The SkyPilot run spent under \3002.87$% improvement. But the gains weren't evenly distributed.
Phase (hyperparameters) produced a val_bpb drop. That's the highest-impact work and it happened in the first hours, costing roughly \6740.0012$6722$x less improvement.
This is the general pattern. Early scaling pays off. Late scaling burns money. If you're budget-constrained, run GPUs for hours instead of GPUs for . You'll capture most of the gains at a quarter of the cost.