The default schedule has no warmup (warmup_ratio = 0.0) and 50% cooldown (warmdown_ratio = 0.5).
Agents have found that adding a small amount of warmup improved val_bpb, while increasing it further degraded performance. Linear cooldown beat cosine cooldown: many ML codebases default to cosine, but in AutoResearch's short, fixed training window, linear works better. Extending cooldown to a larger fraction of training provided the best convergence.
These findings are fragile. One session found a particular warmup ratio optimal, but a follow-up session with different prior changes could not reproduce the gain. Schedule tuning interacts with every other change your agent has made, so re-test it whenever the rest of the configuration changes.
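To make the two knobs concrete, here is a minimal sketch of how a warmup/cooldown schedule of this kind could be computed. It is an illustration, not AutoResearch's actual code: the function name `lr_multiplier`, the `cooldown` argument, the use of a 0-to-1 `progress` value, and the decay all the way to zero are assumptions. Only `warmup_ratio` and `warmdown_ratio`, with their defaults, come from the text above.

```python
import math


def lr_multiplier(progress, warmup_ratio=0.0, warmdown_ratio=0.5, cooldown="linear"):
    """Return a learning-rate multiplier in [0, 1].

    progress: fraction of the training budget already used, in [0, 1]
              (hypothetical; could be steps or wall-clock time).
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if warmup_ratio < 0.0 or warmdown_ratio < 0.0 or warmup_ratio + warmdown_ratio > 1.0:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    if cooldown not in ("linear", "cosine"):
        raise ValueError("cooldown must be 'linear' or 'cosine'")

    # Warmup phase: ramp linearly from 0 up to the peak learning rate.
    if warmup_ratio > 0.0 and progress < warmup_ratio:
        return progress / warmup_ratio

    # Cooldown phase: covers the last `warmdown_ratio` of training.
    cooldown_start = 1.0 - warmdown_ratio
    if warmdown_ratio > 0.0 and progress > cooldown_start:
        # `remaining` goes from 1 at the start of cooldown to 0 at the end.
        remaining = (1.0 - progress) / warmdown_ratio
        if cooldown == "linear":
            return remaining
        # Cosine: same endpoints as linear, but flatter near both ends.
        return 0.5 * (1.0 - math.cos(math.pi * remaining))

    # Constant phase between warmup and cooldown.
    return 1.0


if __name__ == "__main__":
    # Compare the two cooldown shapes under the default ratios.
    for p in (0.0, 0.25, 0.5, 0.6, 0.75, 0.9, 1.0):
        linear = lr_multiplier(p, cooldown="linear")
        cosine = lr_multiplier(p, cooldown="cosine")
        print(f"progress={p:.2f}  linear={linear:.3f}  cosine={cosine:.3f}")
```

Because both shapes share the same start and end points, swapping one for the other is a one-argument change, which makes it cheap to re-run the comparison after other parts of the configuration have changed.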