Your agent will hit errors. Two types come up most often.
CUDA errors happen when your agent tries a configuration that exceeds GPU memory or triggers a driver-level fault. NaN losses appear when training becomes numerically unstable, most often from a learning rate that's too high or a model configuration that destabilizes optimization.
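As a sketch of how these two failure modes might be told apart, the function below classifies a run by inspecting the raised exception or the final loss. The `classify_failure` helper and the exact exception wording are assumptions for illustration; PyTorch's out-of-memory messages vary across versions.

```python
import math

def classify_failure(exc=None, loss=None):
    """Return a short label for a failed training run, or None if it succeeded.

    Hypothetical helper: the substring checks below approximate what
    PyTorch-style CUDA failures look like; exact wording is an assumption.
    """
    if exc is not None:
        msg = str(exc)
        if "CUDA out of memory" in msg or "CUDA error" in msg:
            return "cuda_error"
        return "runtime_error"
    if loss is not None and (math.isnan(loss) or math.isinf(loss)):
        return "nan_loss"
    return None

print(classify_failure(exc=RuntimeError("CUDA out of memory. Tried to allocate 2.0 GiB")))
print(classify_failure(loss=float("nan")))
```

Collapsing both failure modes into one label makes the downstream handling uniform: every failure becomes a row in the results log, regardless of cause.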
Both are handled automatically, and you don't need to intervene. Your agent catches the error, logs the failure in results.tsv, reverts the change with git reset, and moves on to the next experiment. In other words, your agent treats an error the same way it treats a failed experiment: log it and try something else.
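The catch-log-revert-continue loop described above could look like the following sketch. The `run_all` and `run_experiment` names are hypothetical, and the demo skips the real `git reset --hard` call (via `dry_run=True`) since it only makes sense inside a git repository.

```python
import csv
import math
import subprocess
import tempfile

def log_failure(results_path, experiment, reason):
    # Append a row to results.tsv so failures sit alongside successes.
    with open(results_path, "a", newline="") as f:
        csv.writer(f, delimiter="\t").writerow([experiment, "failed", reason])

def revert_changes(dry_run=True):
    # git reset --hard discards the failed experiment's uncommitted edits.
    cmd = ["git", "reset", "--hard"]
    if dry_run:
        return cmd  # sketch only: skip the real call outside a repo
    subprocess.run(cmd, check=True)

def run_all(experiments, run_experiment, results_path="results.tsv"):
    # Hypothetical driver loop: each failure is logged and reverted,
    # then the loop simply continues to the next experiment.
    for name, config in experiments:
        try:
            loss = run_experiment(config)
            if math.isnan(loss):
                raise ValueError("NaN loss")
        except Exception as exc:
            log_failure(results_path, name, str(exc))
            revert_changes()
            continue  # move on; no human intervention needed

# Demo with a stub experiment that always hits a CUDA error:
tmp = tempfile.mkstemp(suffix=".tsv")[1]
def stub(config):
    raise RuntimeError("CUDA out of memory")
run_all([("lr_sweep_1", {})], stub, results_path=tmp)
print(open(tmp).read().strip())
```

The key design choice is that the loop never re-raises: a crash in one experiment costs one TSV row and one revert, never the whole run.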