Save checkpoints regularly during training. You'll want them for:
- Resuming if training crashes
- Selecting the best model (not always the final one)
- Experimenting with different stopping points
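Save a checkpoint at a fixed step interval and keep only the most recent few, deleting older ones. Also save a separate copy whenever validation loss improves, so the best model is not deleted when old checkpoints are rotated out. Disk space is cheap compared to retraining.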
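The policy above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not any framework's API. `CheckpointManager`, its `ckpt_<step>.json` / `best.json` file names, and the `save_every` / `keep_last` defaults are all hypothetical. The model state is assumed to be a JSON-serializable dict. With a real framework you would replace the JSON write with that framework's own save call (for example, `torch.save` in PyTorch).

```python
import json
import os
import re
import tempfile


class CheckpointManager:
    """Hypothetical helper: periodic saves with rotation, plus a separate best-model save."""

    def __init__(self, directory, save_every=1000, keep_last=3):
        if keep_last < 1:
            raise ValueError("keep_last must be at least 1")
        self.directory = directory
        self.save_every = save_every  # assumed save interval, in training steps
        self.keep_last = keep_last    # assumed number of periodic checkpoints to retain
        self.best_val_loss = float("inf")
        os.makedirs(directory, exist_ok=True)

    def _write(self, name, payload):
        # Write to a temp file in the same directory, then atomically rename it,
        # so a crash mid-write never leaves a truncated checkpoint behind.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp_path, os.path.join(self.directory, name))

    def _periodic(self):
        # Return periodic checkpoint file names sorted by step number (oldest first).
        found = []
        for name in os.listdir(self.directory):
            m = re.fullmatch(r"ckpt_(\d+)\.json", name)
            if m:
                found.append((int(m.group(1)), name))
        return [name for _, name in sorted(found)]

    def step(self, step, state, val_loss=None):
        # Periodic save, then delete everything except the newest keep_last files.
        if step % self.save_every == 0:
            self._write(f"ckpt_{step}.json", {"step": step, "state": state})
            for old in self._periodic()[:-self.keep_last]:
                os.remove(os.path.join(self.directory, old))
        # Best-model save, kept outside the rotation so it is never deleted.
        if val_loss is not None and val_loss < self.best_val_loss:
            self.best_val_loss = val_loss
            self._write("best.json", {"step": step, "val_loss": val_loss, "state": state})

    def latest(self):
        # Load the newest periodic checkpoint for resuming, or None if there is none.
        names = self._periodic()
        if not names:
            return None
        with open(os.path.join(self.directory, names[-1])) as f:
            return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mgr = CheckpointManager(d, save_every=10, keep_last=2)
        for s in range(1, 51):
            # Pretend validation runs every 10 steps and loss keeps falling.
            mgr.step(s, {"weights": [s]}, val_loss=(1.0 / s if s % 10 == 0 else None))
        print(sorted(os.listdir(d)))  # ['best.json', 'ckpt_40.json', 'ckpt_50.json']
```

Writing through a temporary file and `os.replace` matters for the crash-recovery case: a checkpoint interrupted halfway should not be the one you resume from. A fuller version would also store and restore optimizer state, RNG state, and `best_val_loss` so that a resumed run behaves like an uninterrupted one.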