Common mistakes to avoid:
- Aligning a weak SFT model. Fix capability first.
- Using low-quality preference data. Garbage preferences teach garbage behavior.
- Over-aligning. Too much preference training can make models sycophantic or overly cautious.
- Ignoring evaluation. Preference loss doesn't tell the whole story; track held-out win rates and targeted behavioral evals too.
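A toy sketch of the last point, assuming a DPO-style setup where the implicit reward margin (chosen minus rejected) is available per pair. The checkpoints and margins below are hypothetical: two checkpoints can rank every pair identically, yet one gets a much lower preference loss just by being more confident, so the loss alone can't tell you which one behaves better.

```python
import math

def pref_loss(margin, beta=0.1):
    # DPO-style per-pair loss: -log sigmoid(beta * reward margin)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

def accuracy(margins):
    # Fraction of pairs where the chosen response is scored higher
    return sum(m > 0 for m in margins) / len(margins)

# Hypothetical per-pair reward margins (chosen - rejected) from two checkpoints
ckpt_a = [2.0, 2.5, -0.5, 3.0]     # modest margins
ckpt_b = [10.0, 12.0, -8.0, 15.0]  # same rankings, far more confident

loss_a = sum(pref_loss(m) for m in ckpt_a) / len(ckpt_a)
loss_b = sum(pref_loss(m) for m in ckpt_b) / len(ckpt_b)

print(f"A: loss={loss_a:.3f} acc={accuracy(ckpt_a):.2f}")
print(f"B: loss={loss_b:.3f} acc={accuracy(ckpt_b):.2f}")
# B's loss is lower, but preference accuracy is identical;
# neither number measures held-out behavior, so evaluate that directly.
```

The gap between the two numbers is the point: optimize the loss, but judge the checkpoint on independent evaluations.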
Alignment is refinement, not rescue. Start with a capable model.