After training finishes, your agent extracts two numbers from run.log:
grep "^val_bpb:\|^peak_vram_mb:" run.log
If grep prints the values, training completed and your agent has a val_bpb score to compare. If grep prints nothing, the run crashed before reporting its metrics, and your agent reads the last lines of run.log to find the stack trace.
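The check above can also be sketched in Python, which may be handy if the agent needs the values as numbers rather than text. This is a minimal sketch, not the tool's actual implementation: the function names `extract_metrics` and `tail`, the `name: value` line format, and the default of 50 tail lines are assumptions for illustration. It is also slightly stricter than the grep check, since it treats a run as complete only when both metrics are present.

```python
import re

# Matches lines that start with "val_bpb:" or "peak_vram_mb:", mirroring the
# anchored grep pattern. The "name: value" layout is an assumption.
METRIC_RE = re.compile(r"^(val_bpb|peak_vram_mb):\s*(\S+)", re.MULTILINE)


def extract_metrics(log_text):
    """Return both metrics as floats, or None if either one is missing."""
    found = {}
    for name, value in METRIC_RE.findall(log_text):
        try:
            found[name] = float(value)
        except ValueError:
            # Skip values that are not parseable as numbers.
            continue
    if "val_bpb" not in found or "peak_vram_mb" not in found:
        return None  # treated as a crashed or incomplete run
    return found


def tail(log_text, n=50):
    """Return the last n lines of the log, where a stack trace usually sits."""
    return "\n".join(log_text.splitlines()[-n:])
```

A caller would read run.log into a string, call `extract_metrics`, and fall back to `tail` when the result is `None`.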
Both numbers are logged in results.tsv. The val_bpb score determines whether the experiment is kept or reverted. The peak VRAM reading helps your agent stay within GPU memory limits in future experiments.
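The logging and keep-or-revert step might look roughly like the sketch below. The column layout of results.tsv is not specified here, so the four columns used (label, val_bpb, peak VRAM, decision) are hypothetical, as are the function names. The comparison assumes that a lower val_bpb (bits per byte) is better and that an experiment is kept only when it beats the previous best.

```python
import csv
import io


def is_improvement(val_bpb, best_val_bpb):
    """Lower bits-per-byte is better; None means there is no previous best."""
    return best_val_bpb is None or val_bpb < best_val_bpb


def append_result(tsv_file, label, val_bpb, peak_vram_mb, kept):
    """Append one tab-separated row to an open file (hypothetical columns)."""
    writer = csv.writer(tsv_file, delimiter="\t", lineterminator="\n")
    writer.writerow([
        label,
        f"{val_bpb:.6f}",
        f"{peak_vram_mb:.1f}",
        "keep" if kept else "revert",
    ])
```

In practice the file would be opened with `open("results.tsv", "a", newline="")` so that each experiment adds one row without rewriting earlier ones.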