When you wake up, open results.tsv. Each row records one experiment your agent ran.
Each row has five tab-separated columns: the commit hash, the val_bpb score, peak memory in GB, the status ("keep", "discard", or "crash"), and a description of the experiment. Lower val_bpb is better, so to spot improvements, look for rows where val_bpb dropped below the previous best.
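If you'd rather not scan the file by eye, a short script can pick out the rows that set a new best. This is a minimal sketch, not part of the tool itself: it assumes results.tsv starts with a header row naming the columns commit, val_bpb, memory_gb, status, and description (hypothetical names, adjust them to match your file), and the sample rows are invented purely for illustration.

```python
import csv
import io


def find_improvements(tsv_text):
    """Return the rows whose val_bpb beat the best score seen so far.

    Assumes a header row with (hypothetical) column names
    'commit', 'val_bpb', 'memory_gb', 'status', 'description'.
    The first scored row always counts, since it sets the baseline.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    best = float("inf")
    improvements = []
    for row in reader:
        # Crashed runs have no meaningful score, so skip them.
        if row["status"] == "crash":
            continue
        try:
            score = float(row["val_bpb"])
        except (TypeError, ValueError):
            continue  # Missing or malformed score.
        if score < best:  # Lower val_bpb is better.
            best = score
            improvements.append(row)
    return improvements


# Invented sample data, for illustration only.
sample = "\n".join([
    "commit\tval_bpb\tmemory_gb\tstatus\tdescription",
    "aaa1111\t1.000\t40.0\tkeep\tbaseline",
    "bbb2222\t1.010\t40.2\tdiscard\twider MLP",
    "ccc3333\t0.000\t0.0\tcrash\tdoubled batch size",
    "ddd4444\t0.990\t41.0\tkeep\tlower learning rate",
])

for row in find_improvements(sample):
    print(row["commit"], row["val_bpb"], row["description"])
```

To run it on your own results, pass the file contents instead of the sample, for example find_improvements(open("results.tsv").read()).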
Here's what others have seen: Karpathy got additive improvements and a training speedup on already-optimized nanochat code. Tobias Lütke (Shopify CEO) ran experiments overnight and got a performance gain. A community discussion tracked val_bpb dropping across experiments.