The AutoResearch pattern works beyond ML training. The requirements are the same:
- A single metric to optimize (replaces val_bpb)
- A fixed time budget per experiment (replaces the -second cap)
- Clear file boundaries (what the agent can and cannot modify)
- An evaluation script that produces a numeric score
Tobi Lutke applied this to Shopify's Liquid template engine, a Ruby codebase with zero ML. His metric was combined_us (parse plus render time in microseconds). His evaluation script ran unit tests, then a benchmark suite against real Shopify theme templates. Same pattern. Different domain.