KTO (Kahneman-Tversky Optimization) doesn't need paired preferences. Each training example is a single response to a prompt, labeled as good (desirable) or bad (undesirable).
This is simpler data to collect. You don't need direct comparisons, just thumbs up or thumbs down.
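To make the data shape concrete, here is a minimal sketch of unpaired records built from thumbs-up/thumbs-down feedback. The names `KTOExample`, `from_feedback`, and `from_pair` are hypothetical and chosen for illustration; they are not taken from any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KTOExample:
    """One unpaired training record (field names are illustrative)."""
    prompt: str
    completion: str
    label: bool  # True = thumbs up (good), False = thumbs down (bad)


def from_feedback(rows):
    """Convert (prompt, completion, thumb) tuples into KTOExample records.

    `thumb` is assumed to be the string "up" or "down".
    """
    examples = []
    for prompt, completion, thumb in rows:
        if thumb not in ("up", "down"):
            raise ValueError(f"unexpected feedback value: {thumb!r}")
        examples.append(KTOExample(prompt, completion, thumb == "up"))
    return examples


def from_pair(prompt, chosen, rejected):
    """Split one pairwise preference into two unpaired records."""
    return [
        KTOExample(prompt, chosen, True),
        KTOExample(prompt, rejected, False),
    ]


feedback_log = [
    ("What is 2 + 2?", "4", "up"),
    ("What is 2 + 2?", "5", "down"),
    ("Name a prime number.", "7", "up"),
]
dataset = from_feedback(feedback_log)
```

Note that the records do not have to come in matched good/bad sets: the third prompt above has only a thumbs-up response. Existing pairwise data can still be reused by splitting each pair, as `from_pair` does.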
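In the experiments reported by its authors, KTO matches or exceeds DPO performance at model scales from 1B to 30B parameters. Use it when you have binary ratings of individual responses but not pairwise comparisons.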
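To see how a single labeled response can produce a training signal without a comparison, here is a rough sketch of the per-example KTO loss as described in the KTO paper. It assumes the sequence log-probabilities under the policy and the reference model have already been computed, and it takes the KL reference point as a plain number (in practice it is estimated per batch and treated as a constant). The function name, argument names, and default values are illustrative assumptions, not a library API.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def kto_example_loss(policy_logp, ref_logp, desirable,
                     kl_ref=0.0, beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Per-example KTO loss (sketch; argument names are hypothetical).

    policy_logp, ref_logp: log-probability of the response under the
        policy and the frozen reference model.
    desirable: True for a good response, False for a bad one.
    kl_ref: estimate of KL(policy || reference), clamped at zero below.
    lambda_d, lambda_u: weights for good and bad examples.
    """
    reward = policy_logp - ref_logp  # implied reward: log-ratio to the reference
    z0 = max(kl_ref, 0.0)            # reference point, never negative
    if desirable:
        # Loss falls as the good response becomes more likely than under the reference.
        return lambda_d * (1.0 - sigmoid(beta * (reward - z0)))
    # Loss falls as the bad response becomes less likely than under the reference.
    return lambda_u * (1.0 - sigmoid(beta * (z0 - reward)))


# Same response statistics, opposite labels: the loss pushes in opposite directions.
loss_if_good = kto_example_loss(-8.0, -10.0, desirable=True)
loss_if_bad = kto_example_loss(-8.0, -10.0, desirable=False)
```

Each example is scored against the reference point rather than against a competing response, which is why no pairing is needed. The two weights give a simple way to rebalance a dataset with far more thumbs-up than thumbs-down labels, or the reverse.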