Learning rate usually changes during training:
- Warmup: Start low and ramp up over the first -% of training steps
- Peak: Reach target learning rate
- Decay: Gradually decrease to near zero
Cosine decay is the most common choice, though linear decay also works well. The warmup prevents instability early in training, while the decay improves convergence at the end.
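The warmup-peak-decay shape above can be sketched as a single function. This is a minimal illustration, not a specific library's API; the peak learning rate, warmup fraction, and floor value are hypothetical defaults chosen for the example.

```python
import math

def lr_schedule(step, total_steps, peak_lr=3e-4, warmup_frac=0.05, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay toward min_lr.

    peak_lr, warmup_frac, and min_lr are illustrative values, not
    prescribed by the text.
    """
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from near zero up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Decay phase: cosine curve from peak_lr down toward min_lr.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Swapping the cosine term for `1 - progress` gives the linear-decay variant with the same warmup.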