You learned the requirements for the loop in Section : a numeric metric, a time budget, file boundaries, and an automated evaluation script. Here's where people go wrong when applying them outside ML.
Picking a metric that's cheap to game. If your metric is "test pass rate," your agent can raise it by deleting the tests that fail. Keep your test suite in files the agent can't edit.
Setting a time budget that doesn't match the feedback cycle. A -minute budget works for training runs. A compiler optimization loop might need only seconds per evaluation; an integration test suite might need minutes. Match the budget to how long one evaluation actually takes.
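A minimal sketch of enforcing the budget, assuming the evaluation is an external command launched from Python. subprocess.run with a timeout kills a run that overshoots, and this sketch counts that as a failed iteration. To tie the budget to the real feedback cycle, budget_from_baseline times one baseline evaluation and scales it. Both helper names are hypothetical, and the headroom factor of 2.0 and the 1-second floor are arbitrary illustrative values, not recommendations from the source.

```python
import subprocess
import sys
import time


def run_with_budget(cmd: list[str], budget_s: float) -> tuple[bool, str]:
    """Run one evaluation; a run that exceeds the budget counts as a failure."""
    try:
        # subprocess.run kills the child process if the timeout expires
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=budget_s)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {budget_s}s"
    return result.returncode == 0, result.stdout


def budget_from_baseline(cmd: list[str], headroom: float = 2.0) -> float:
    """Time one baseline evaluation and scale it; headroom=2.0 is an arbitrary choice."""
    start = time.perf_counter()
    subprocess.run(cmd, capture_output=True, check=True)
    return (time.perf_counter() - start) * headroom


if __name__ == "__main__":
    # Stand-in evaluation command; replace it with your real evaluation script.
    eval_cmd = [sys.executable, "-c", "print('score: 0.91')"]
    budget = budget_from_baseline(eval_cmd)
    # Illustrative 1 s floor guards against timing noise on very fast commands.
    ok, output = run_with_budget(eval_cmd, budget_s=max(budget, 1.0))
    print(ok, output.strip())
```

Deriving the budget from a measured baseline keeps it proportional to the evaluation: a compiler benchmark ends up with a budget in seconds, an integration suite with one in minutes.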
Giving the agent too many files. Shopify scoped the agent's edits to lib/liquid/*.rb. If you let your agent edit your entire repository, it will produce changes too broad to review.
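To make a scope like lib/liquid/*.rb enforceable rather than advisory, the harness can compare the files an iteration touched (for example, the paths printed by git diff --name-only) against an allowlist and discard the iteration if anything falls outside it. The following is a minimal sketch assuming a discard-on-any-violation policy. ALLOWED, is_allowed, and out_of_scope are hypothetical names, and the example paths are illustrative. Matching runs segment by segment so that * stays within one directory, following the usual shell-glob reading of the pattern. Plain fnmatch on the full path would let * match across / and accept files in subdirectories.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical allowlist mirroring the lib/liquid/*.rb scope.
ALLOWED = ["lib/liquid/*.rb"]


def is_allowed(path: str, patterns: list[str]) -> bool:
    """Match segment by segment so '*' never crosses a '/' (shell-glob style)."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        pat_parts = PurePosixPath(pattern).parts
        if len(parts) == len(pat_parts) and all(
            fnmatchcase(seg, pat) for seg, pat in zip(parts, pat_parts)
        ):
            return True
    return False


def out_of_scope(changed: list[str], patterns: list[str]) -> list[str]:
    """Return every changed path the agent was not allowed to touch."""
    return [p for p in changed if not is_allowed(p, patterns)]


if __name__ == "__main__":
    # In a real loop these paths might come from `git diff --name-only`.
    changed = ["lib/liquid/parser.rb", "test/unit/parser_test.rb"]
    print(out_of_scope(changed, ALLOWED))  # ['test/unit/parser_test.rb']
```

A non-empty result means the iteration edited something outside its scope, so the harness would revert it without scoring it.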