You now understand how supervised fine-tuning works and how to execute it effectively.
Key takeaways:
- SFT continues next-token prediction on your data
- Learning rates for fine-tuning are typically 10-100x smaller than those used in pre-training
- Monitor validation loss to catch overfitting
- Catastrophic forgetting is real, but mitigable (e.g., with lower learning rates or by mixing in general data)
- Checkpoint often and log everything
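The practices above can be sketched as a minimal training loop. This is illustrative only: `train_step` and `validate` are hypothetical stand-ins that simulate losses so the loop runs end to end, and the thresholds are arbitrary, not recommendations.

```python
# Sketch of an SFT loop applying the takeaways: small learning rate,
# validation-loss monitoring, checkpointing on improvement, and an
# early stop when overfitting appears. All model code is simulated.
import copy

def train_step(params, lr):
    # Hypothetical stand-in for one optimizer step of next-token prediction.
    params["steps"] += 1
    params["train_loss"] = max(0.1, params["train_loss"] - lr * 10)
    return params

def validate(params):
    # Hypothetical stand-in: validation loss tracks training loss at first,
    # then rises once the simulated model starts to overfit.
    return params["train_loss"] + 0.02 * max(0, params["steps"] - 30)

def finetune(epochs=10, steps_per_epoch=10, lr=2e-5):
    params = {"steps": 0, "train_loss": 2.0}
    best_val, best_ckpt = float("inf"), None
    for epoch in range(epochs):
        for _ in range(steps_per_epoch):
            params = train_step(params, lr)   # small fine-tuning LR
        val = validate(params)                # monitor every epoch
        print(f"epoch {epoch}: val_loss={val:.3f}")
        if val < best_val:
            best_val = val                    # checkpoint on improvement
            best_ckpt = copy.deepcopy(params)
        elif val > best_val * 1.05:           # arbitrary overfitting guard
            print("validation loss rising; stopping early")
            break
    return best_ckpt, best_val
```

The key structural point is that the checkpoint you keep is the one with the best *validation* loss, not the final set of weights; in a real run the `copy.deepcopy` would be a `save_checkpoint` call to disk.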
Next, I'll teach you parameter-efficient fine-tuning methods like LoRA, which reduce memory use and training time.