You've now seen the full picture. In Section , you learned what AutoResearch is: a -line loop that lets an agent run ML experiments overnight. In Section , you followed the agent's decision-making cycle: read, hypothesize, edit, train, measure, keep or revert. In Section , you learned to write program.md to control the loop. In Section , you scaled from GPU to .
In this section, you saw the real results, the failure modes (Goodhart's Law, seed gaming, transfer uncertainty), and the ecosystem (AI Scientist-v, AIDE, Robin, DSPy). The pattern is the same everywhere: propose, test, measure, decide.
Your next step: clone the repository, pick a metric for your own codebase, write your first program.md, and run your first overnight session. When you wake up, open results.tsv.