Open results.tsv. Each row has five tab-separated columns: commit hash, val_bpb, memory in GB, status ("keep", "discard", or "crash"), and description.
A healthy run shows "keep" entries mixed with "discard" entries. The val_bpb on kept experiments drops by or more. Descriptions show diverse experiment types: architecture changes, optimizer tuning, schedule adjustments.
A stuck agent shows long "discard" streaks. Descriptions become repetitive: "adjusted LR from to ", then "adjusted LR from to ." Kept experiments show val_bpb deltas smaller than . Crash entries increase as the agent tries increasingly aggressive changes.
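The signals above can be tallied with a short script rather than read by eye. The following is a minimal sketch, not part of the original tooling. It assumes results.tsv starts with a header row naming the columns commit, val_bpb, memory_gb, status, and description; those names and the summarize function are hypothetical and should be adjusted to match the actual file. It sets no thresholds, because what counts as a "long" streak or a "small" delta is left for the reader to judge.

```python
import csv
import io
import re
from collections import Counter

# Matches plain numbers such as 0.01, 3, or 3e-4 so that descriptions
# differing only in their numeric values collapse to one template.
NUMBER = re.compile(r"\d+(?:\.\d+)?(?:e-?\d+)?")


def summarize(tsv_text):
    """Summarize the contents of a results.tsv file.

    Assumes (hypothetically) a header row with the column names
    commit, val_bpb, memory_gb, status, description.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

    status_counts = Counter()
    templates = Counter()
    kept_bpb = []
    longest_discard_streak = 0
    current_streak = 0

    for row in rows:
        status = row["status"].strip()
        status_counts[status] += 1

        # A streak is a run of consecutive "discard" rows;
        # any other status ("keep" or "crash") resets it.
        if status == "discard":
            current_streak += 1
            longest_discard_streak = max(longest_discard_streak, current_streak)
        else:
            current_streak = 0

        if status == "keep":
            kept_bpb.append(float(row["val_bpb"]))

        # Replace numbers with "#" to expose repeated description patterns.
        templates[NUMBER.sub("#", row["description"].strip())] += 1

    # Improvement between consecutive kept experiments (positive = val_bpb fell).
    keep_deltas = [prev - cur for prev, cur in zip(kept_bpb, kept_bpb[1:])]

    return {
        "status_counts": dict(status_counts),
        "longest_discard_streak": longest_discard_streak,
        "keep_deltas": keep_deltas,
        "most_common_description": templates.most_common(1)[0] if templates else None,
    }
```

To use it, read the file and pass its text, for example summarize(open("results.tsv").read()). A long longest_discard_streak, shrinking keep_deltas, a rising "crash" count, and one description template dominating most_common_description correspond to the stuck-agent symptoms described above.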