IPO (Identity Preference Optimization) modifies DPO's objective to reduce overfitting to the preference data.
With near-deterministic preference labels, DPO's log-sigmoid loss keeps rewarding ever-larger margins between chosen and rejected responses, so the KL regularization toward the reference model becomes ineffective and the policy can drift arbitrarily far from it. IPO replaces the log-sigmoid with a squared loss that regresses the log-likelihood-ratio margin toward a fixed target, which stays bounded and keeps the regularization working.
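The contrast between the two losses can be sketched per preference pair. This is an illustrative scalar version under assumed hyperparameter names (`tau` for IPO's regularization strength, `beta` for DPO's); real implementations operate on batched sequence log-probabilities.

```python
import math

def margin(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l):
    # Log-likelihood-ratio margin between the chosen (w) and rejected (l)
    # responses, measured relative to the reference model.
    return (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    h = margin(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l)
    # Log-sigmoid loss: strictly decreasing in h, so the optimum pushes
    # the margin toward infinity when preferences are deterministic.
    return -math.log(1.0 / (1.0 + math.exp(-beta * h)))

def ipo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, tau=0.1):
    h = margin(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l)
    # Squared loss: regresses the margin toward the finite target 1/(2*tau),
    # penalizing margins that overshoot it as well as ones that undershoot.
    return (h - 1.0 / (2.0 * tau)) ** 2
```

Note that `ipo_loss` is minimized at a finite margin (here 1/(2 * 0.1) = 5), while `dpo_loss` keeps shrinking as the margin grows.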
In practice, IPO and DPO perform similarly on most tasks. Try IPO if DPO shows runaway reward margins or clear overfitting to the preference data.