Every experiment your agent runs becomes a git commit. If the experiment improves val_bpb (validation bits per byte, where lower is better), the commit stays. If it doesn't, your agent runs git reset to discard the commit and revert the change.
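Here is a minimal Python sketch of the keep-or-revert step. It assumes each experiment is exactly one commit on top of the last kept one, and that a tie counts as no improvement. The function names are hypothetical. `git reset --hard HEAD~1` is one way to drop that single commit, and it also discards any uncommitted changes in the working tree.

```python
import subprocess


def is_improvement(best_val_bpb: float, new_val_bpb: float) -> bool:
    # Lower bits per byte is better; a tie counts as no improvement (assumption).
    return new_val_bpb < best_val_bpb


def keep_or_revert(best_val_bpb: float, new_val_bpb: float, repo: str = ".") -> float:
    """Keep the experiment commit if val_bpb dropped, else reset it away.

    Hypothetical helper. Assumes the experiment is exactly one commit on top of
    the last kept commit. Returns the best val_bpb after the decision.
    """
    if is_improvement(best_val_bpb, new_val_bpb):
        # The commit stays; the branch tip now records the improvement.
        return new_val_bpb
    # Move the branch back one commit and restore the working tree to match.
    subprocess.run(["git", "reset", "--hard", "HEAD~1"], cwd=repo, check=True)
    return best_val_bpb
```

Keeping the comparison in its own function lets you change the acceptance rule, for example by requiring a minimum margin, without touching the git call.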
This means your git history doubles as your experiment log. Each commit message describes what your agent tried. The commits that survive on the branch are the improvements; the ones your agent reset away are the dead ends.
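You can walk through the surviving improvements with git log. The reset commits no longer appear there, because git reset moves the branch back past them, but git reflog still lists them until their entries expire (30 days by default for commits no longer reachable from a branch). You don't need a separate experiment tracking tool. Git is your ledger.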
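To see why reset commits drop out of git log but not out of git reflog, here is a toy in-memory model in Python. It is not real git: the class and its fields are hypothetical. The log is the list of commits reachable from the branch tip, and the reflog is every commit ever made. Real git reflog also records the reset itself, which this toy omits. The val_bpb values in the usage lines are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBranch:
    """Hypothetical model of one branch's bookkeeping; not how git stores data."""

    log: list = field(default_factory=list)     # commits reachable from the tip, like `git log`
    reflog: list = field(default_factory=list)  # every commit HEAD has pointed at, like `git reflog`

    def commit(self, message: str) -> None:
        self.log.append(message)
        self.reflog.append(message)

    def reset_last(self) -> None:
        # Like `git reset --hard HEAD~1`: the tip moves back, so the commit
        # leaves the log, but the reflog keeps its entry.
        self.log.pop()


def run_experiments(experiments):
    """experiments: (commit message, val_bpb) pairs, in the order they ran."""
    branch = ToyBranch()
    best = float("inf")
    for message, val_bpb in experiments:
        branch.commit(message)
        if val_bpb < best:
            best = val_bpb  # improvement: keep the commit
        else:
            branch.reset_last()  # dead end: drop it from the branch
    return branch


# Invented numbers, for illustration only.
branch = run_experiments([
    ("baseline", 1.00),
    ("try A", 0.98),
    ("try B", 1.01),
    ("try C", 0.97),
])
print(branch.log)     # ['baseline', 'try A', 'try C']
print(branch.reflog)  # ['baseline', 'try A', 'try B', 'try C']
```

The dead end "try B" is gone from the log but still in the reflog, which is the same split you see between git log and git reflog after a reset.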