Common problems and fixes:
Loss not decreasing:
- Learning rate too low (increase it)
- Data issue (check that the examples are formatted the way the training code expects)
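
A quick way to rule out the data issue above is to scan the training records before starting a run. The following is a minimal sketch, assuming each record is a dict with string fields; the field names "prompt" and "completion" and the helper name find_format_problems are hypothetical and should be replaced with whatever your dataset actually uses.

```python
def find_format_problems(records, required_keys=("prompt", "completion")):
    """Return (index, message) pairs for records that look malformed.

    The default field names are placeholders, not a required schema.
    """
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "not a dict"))
            continue
        for key in required_keys:
            value = rec.get(key)
            if not isinstance(value, str):
                problems.append((i, f"missing or non-string field: {key}"))
            elif not value.strip():
                problems.append((i, f"empty field: {key}"))
    return problems


# Tiny illustrative dataset: the second and third records are malformed.
sample = [
    {"prompt": "2+2=", "completion": "4"},
    {"prompt": "", "completion": "x"},
    {"prompt": "hi"},
]
problems = find_format_problems(sample)
for index, message in problems:
    print(f"record {index}: {message}")
```

If this kind of check comes back clean and the loss is still flat, the learning rate is the more likely culprit.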
Loss spikes or NaN:
- Learning rate too high (reduce it)
- Enable gradient clipping
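
To make the clipping fix concrete, here is a minimal pure-Python sketch of clipping by global norm: if the combined gradient norm exceeds a threshold, every gradient is scaled down by the same factor. The function name clip_by_global_norm and the flat list of floats are simplifications for illustration; in PyTorch the equivalent step is usually done with torch.nn.utils.clip_grad_norm_ between the backward pass and the optimizer step. The non-finite check is an added assumption about how you might want to handle NaN, not something clipping does on its own.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale grads so their combined L2 norm is at most max_norm.

    Returns the (possibly scaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        # Clipping cannot repair NaN or inf gradients; the caller should
        # skip this update rather than apply it.
        raise ValueError("non-finite gradient norm")
    # Small epsilon avoids division by zero when all gradients are zero.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm


# A gradient of norm 5 is scaled down to roughly norm 1.
clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm, clipped)
```

Logging the pre-clipping norm is also useful for diagnosis: a norm that climbs steadily before a spike suggests the learning rate is too high.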
Out of memory:
- Reduce batch size
- Enable gradient checkpointing
- Switch to a parameter-efficient method
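
To give a rough sense of why a parameter-efficient method helps with memory, the sketch below compares trainable parameter counts for a single weight matrix under full fine-tuning and under a low-rank adapter. LoRA is assumed here as the example method (the list above does not name one), and the 4096 x 4096 matrix and rank 8 are illustrative values, not recommendations. Fewer trainable parameters means fewer gradients and optimizer states to keep in memory.

```python
def full_trainable_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def low_rank_trainable_params(d_out, d_in, rank):
    """Trainable parameters for a rank-r adapter: two factors of shape
    (d_out x rank) and (rank x d_in), with the original matrix frozen."""
    return rank * (d_out + d_in)


# Illustrative sizes only (assumed, not taken from any specific model).
d_out, d_in, rank = 4096, 4096, 8
full = full_trainable_params(d_out, d_in)
adapter = low_rank_trainable_params(d_out, d_in, rank)
ratio = adapter / full
print(f"full: {full:,}  adapter: {adapter:,}  ratio: {ratio:.4%}")
```

This counts only trainable weights for one matrix; activation memory is unaffected, which is why reducing the batch size and enabling gradient checkpointing remain separate fixes.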