You saw in Section that agents struggle with creative hypothesis generation. What would it take to fix that?
Your agent would need to read and synthesize papers, not just code. It would need a model of why techniques work, not just whether they improve the metric. It would need to transfer findings across domains, recognizing that a trick from computer vision might apply to NLP.
None of these exist in AutoResearch today. Your agent proposes changes based on pattern matching against code it has seen during training. That's why program.md matters. You supply the theory and research direction. Your agent supplies the execution speed.