A good run shows diverse experiment types in results.tsv: architecture changes, optimizer tuning, schedule adjustments. The keep rate is high in the early hours and tapers off, and val_bpb drops steadily.
A wasted run shows the agent stuck in one area. Twenty experiments tweaking the same learning rate by small increments. Or repeated crashes from configurations that exceed VRAM. Or the agent trying to modify prepare.py and failing.
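To tell which kind of run you had without reading every row, you can summarize results.tsv with a few lines of Python. This is a minimal sketch, not part of the tooling. It assumes the file has a header row with a `status` column (values such as `keep`, `discard`, `crash`) and a `val_bpb` column. Those names are assumptions, and the sample rows are invented for illustration, so adjust both to match your file.

```python
import csv
import io

# Invented sample rows. The column names and status values are assumptions
# about results.tsv, not a documented format.
SAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ["commit", "val_bpb", "status", "description"],
        ["a1", "1.00", "keep", "baseline"],
        ["b2", "0.98", "keep", "wider MLP"],
        ["c3", "0.00", "crash", "batch size too large"],
        ["d4", "0.99", "discard", "lower learning rate"],
    ]
)


def keep_rate(rows):
    """Fraction of experiments in `rows` whose status is 'keep'."""
    if not rows:
        return 0.0
    return sum(r["status"] == "keep" for r in rows) / len(rows)


def summarize_run(tsv_text):
    """Summarize a tab-separated experiment log that has a header row."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    half = len(rows) // 2
    # Only kept experiments count toward the best score.
    kept_bpb = [float(r["val_bpb"]) for r in rows if r["status"] == "keep"]
    return {
        "experiments": len(rows),
        "keep_rate_first_half": keep_rate(rows[:half]),
        "keep_rate_second_half": keep_rate(rows[half:]),
        "crashes": sum(r["status"] == "crash" for r in rows),
        "best_val_bpb": min(kept_bpb) if kept_bpb else None,
    }


if __name__ == "__main__":
    print(summarize_run(SAMPLE))
```

To run it on a real log, pass the file contents instead of the sample, for example `summarize_run(open("results.tsv").read())`. A keep rate that starts high and tapers off fits the good-run pattern. A keep rate near zero throughout, or a large crash count, points at the wasted-run patterns above.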
The difference is almost always program.md: whether it gives the agent a clear objective, well-defined boundaries, and enough room to explore. If your run is wasted, the fix is in your instructions, not in the agent.