Here's the exact loop your agent follows:
1. Read train.py and program.md.
2. Form a hypothesis about what change might lower val_bpb.
3. Edit train.py.
4. Commit the change with git.
5. Run the training script: uv run train.py > run.log 2>&1
6. Extract the metrics: grep "^val_bpb:\|^peak_vram_mb:" run.log
7. Log the outcome in results.tsv.
8. If val_bpb improved, keep the commit. If not, revert it with git reset.
9. Return to the start of the loop and repeat indefinitely.
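The mechanical half of the loop (run, extract, log, keep or revert) can be sketched in Python. This is only an illustration, not the agent's actual implementation: the helper names (parse_metrics, improved, log_result, run_once), the results.tsv column layout, the assumption that each metric is printed as a plain number after the colon, and the use of git reset --hard HEAD~1 to drop the experiment commit are all assumptions made for the example.

```python
import re
import subprocess
from pathlib import Path

# Mirrors grep "^val_bpb:\|^peak_vram_mb:": the key must start the line.
# Assumes the value after the colon is a plain decimal or scientific number.
METRIC_RE = re.compile(
    r"^(val_bpb|peak_vram_mb):\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)",
    re.MULTILINE,
)


def parse_metrics(log_text: str) -> dict[str, float]:
    """Return the last reported value of each metric found in the log."""
    metrics: dict[str, float] = {}
    for key, value in METRIC_RE.findall(log_text):
        metrics[key] = float(value)
    return metrics


def improved(metrics: dict[str, float], best_val_bpb: float) -> bool:
    """Lower val_bpb is better; a run that reported no val_bpb never wins."""
    val = metrics.get("val_bpb")
    return val is not None and val < best_val_bpb


def log_result(path, commit: str, metrics: dict[str, float], kept: bool, note: str) -> None:
    """Append one row to the results file (hypothetical column layout)."""
    path = Path(path)
    if not path.exists():
        path.write_text("commit\tval_bpb\tpeak_vram_mb\tstatus\tnote\n")
    row = [
        commit,
        str(metrics.get("val_bpb", "")),
        str(metrics.get("peak_vram_mb", "")),
        "keep" if kept else "discard",
        note,
    ]
    with path.open("a") as f:
        f.write("\t".join(row) + "\n")


def run_once(best_val_bpb: float, note: str) -> float:
    """Run one already-committed experiment and return the new best val_bpb.

    Assumes run.log and results.tsv are untracked, so the reset below
    does not touch them.
    """
    # Same effect as: uv run train.py > run.log 2>&1
    with open("run.log", "w") as log:
        subprocess.run(["uv", "run", "train.py"], stdout=log,
                       stderr=subprocess.STDOUT, check=False)
    metrics = parse_metrics(Path("run.log").read_text())
    commit = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                            capture_output=True, text=True, check=True).stdout.strip()
    kept = improved(metrics, best_val_bpb)
    log_result("results.tsv", commit, metrics, kept, note)
    if not kept:
        # Assumed revert strategy: discard the experiment commit entirely.
        subprocess.run(["git", "reset", "--hard", "HEAD~1"], check=True)
        return best_val_bpb
    return metrics["val_bpb"]
```

Under these assumptions, the agent would edit and commit train.py, then call run_once(best, "short description of the change") and carry the returned value into the next iteration. A crashed run prints no val_bpb line, so it is logged with empty metric fields and reverted like any other non-improvement.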