ORPO (Odds Ratio Preference Optimization) combines supervised fine-tuning (SFT) and preference alignment in a single training step: it adds an odds-ratio preference penalty to the standard language-modeling loss. No reference model is needed, and no separate SFT stage is required.
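A minimal sketch of the combined loss for one preference pair, in plain Python. The function names, the use of average per-token negative log-likelihood as the input, and the weight `lam` are assumptions for illustration, not details from the source; a real implementation would operate on batched model log-probs.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def orpo_loss(nll_chosen, nll_rejected, lam=0.1):
    """ORPO-style loss for a single preference pair (illustrative sketch).

    nll_chosen / nll_rejected: average per-token negative log-likelihood of
    the chosen and rejected responses under the current model (must be > 0).
    lam: weight on the odds-ratio term (hypothetical value, a hyperparameter).
    """
    def log_odds(nll):
        # log p(y|x) = -NLL; odds(y|x) = p / (1 - p), computed in log space.
        logp = -nll
        return logp - math.log1p(-math.exp(logp))

    # Odds-ratio penalty: -log sigmoid(log-odds of chosen minus rejected).
    ratio = log_odds(nll_chosen) - log_odds(nll_rejected)
    l_or = -log_sigmoid(ratio)

    # Total loss: standard SFT NLL on the chosen response plus the penalty.
    return nll_chosen + lam * l_or
```

The preference term shrinks as the model assigns higher odds to the chosen response relative to the rejected one, so a single backward pass pushes the model toward the chosen text (SFT term) and away from the rejected text (odds-ratio term) at once.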
ORPO can be more efficient because it collapses two training runs into one, and its reported quality is competitive with DPO on common alignment benchmarks. Consider ORPO when you want a simpler training pipeline.