After each overnight run, review results.tsv and update program.md for the next run.
If your agent explored unproductive areas, tighten constraints. "CAN modify model architecture freely" becomes "CAN modify DEPTH and HIDDEN_DIM only."
If your agent was too conservative, broaden the keep criteria. "Keep only if val_bpb improves" becomes "Keep if val_bpb is within % of best AND introduces a simpler architecture."
If certain changes worked, add that knowledge. "Prior runs show layers outperform . Focus on variations with layers as the baseline."
Each version of program.md is a hypothesis about what research directions will produce the fastest progress. Your overnight run tests that hypothesis.