With GPUs, the SkyPilot team ran experiments in hours. Throughput was not the constraint. The agent could test ideas faster than it could generate good ones.
This is the new bottleneck: hypothesis quality. After the obvious changes (batch size, depth, learning rates), your agent needs creative ideas. But your agent tends to repeat patterns. It tries the same category of change with slight variations. It doesn't generate the kind of lateral thinking that leads to novel approaches.
The constraint has shifted. Your GPU can run experiments per hour. Your agent can't propose good hypotheses per hour. The bottleneck is now the quality of ideas, not the speed of testing.