The Karpathy Loop has hard requirements. If any one is missing, the system breaks down.
A single file the agent can edit (train.py). You need one clear target. Multiple editable files create ambiguity about what changed and why.
A single numeric metric that can be measured objectively (val_bpb). Without a numeric score, the agent has no way to compare one experiment to another.
A fixed time budget per experiment (default: minutes). Every experiment gets the same clock, so a better score reflects a better change rather than a longer run. This makes results comparable across runs.
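The three requirements above can be sketched as a small harness. This is a minimal illustration, not the original implementation: the helper names (`parse_metric`, `run_experiment`, `is_improvement`), the `val_bpb: <number>` log format, and the use of a hard subprocess timeout to enforce the budget are all assumptions made for the example. It also assumes that a lower val_bpb is better.

```python
import re
import subprocess
from typing import List, Optional

# Assumed log format: the training script prints lines such as "val_bpb: 1.234".
METRIC_PATTERN = re.compile(r"val_bpb[:=]\s*([0-9]*\.?[0-9]+)")


def parse_metric(output: str) -> Optional[float]:
    """Return the last val_bpb value reported in the output, or None if absent."""
    matches = METRIC_PATTERN.findall(output)
    return float(matches[-1]) if matches else None


def run_experiment(command: List[str], budget_seconds: float) -> Optional[float]:
    """Run one experiment under a fixed wall-clock budget and return its score.

    A run that exceeds the budget or exits with an error yields None,
    so it can never be mistaken for a valid result.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=budget_seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return parse_metric(result.stdout)


def is_improvement(candidate: Optional[float], best: Optional[float]) -> bool:
    """Decide whether to keep an edit (assumes lower val_bpb is better)."""
    if candidate is None:
        return False
    return best is None or candidate < best
```

In a loop, the agent would edit train.py, call something like `run_experiment(["python", "train.py"], budget_seconds)` with the same budget every time, and keep the edit only when `is_improvement` returns True. Each requirement maps to one piece: the command names the single editable file, `parse_metric` extracts the single number, and the timeout holds every run to the same clock.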