Your AutoResearch agent follows a pattern called ReAct: Reason, Act, Observe, Repeat.
In each cycle, the agent reasons by reading train.py, program.md, and results.tsv. It acts by editing train.py and running training. It observes by extracting val_bpb from the logs. Then it repeats.
Standard ReAct involves multi-step planning, tool orchestration, and rollback strategies. AutoResearch strips all of that out. One code edit, one training run, one binary decision. No multi-step plans. No complex tool chains. Just propose, test, decide, repeat.