Options for getting preference data:
- Human annotation: Highest quality, but expensive and slow
- AI feedback: Use a strong model (e.g., GPT-4) to judge or rank weaker models' outputs
- Synthetic generation: Pair outputs for the same prompt from a stronger model (chosen) and a weaker model (rejected)
- Public datasets: UltraFeedback, HH-RLHF, OpenAssistant
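A minimal sketch of the synthetic-generation option above. The two `generate_*` functions are placeholders (assumptions, not real APIs) standing in for calls to a stronger and a weaker model; the output uses the `{prompt, chosen, rejected}` record shape that most preference-tuning trainers expect.

```python
# Hypothetical sketch: build preference pairs by sampling the same prompt
# from a stronger and a weaker model.

def generate_strong(prompt: str) -> str:
    # Stand-in for a call to a stronger model (e.g., an API request).
    return f"[detailed answer to: {prompt}]"

def generate_weak(prompt: str) -> str:
    # Stand-in for a call to a weaker model.
    return f"[short answer to: {prompt}]"

def build_preference_pairs(prompts):
    """Return one {prompt, chosen, rejected} record per prompt,
    treating the stronger model's output as chosen."""
    return [
        {"prompt": p, "chosen": generate_strong(p), "rejected": generate_weak(p)}
        for p in prompts
    ]

pairs = build_preference_pairs(["Explain beam search."])
print(pairs[0])
```

In practice the "chosen is always the strong model" assumption is noisy, so synthetic pairs are often filtered with an AI-feedback judge afterward.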
Start with public datasets. Add custom data for domain-specific alignment.
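Public preference datasets are typically distributed as multiple rated responses per prompt and then "binarized" into pairs. A hedged sketch of that step, assuming responses arrive as `(text, score)` tuples: take the highest-rated response as chosen and the lowest-rated as rejected.

```python
# Hypothetical sketch: binarize rated responses into a preference pair.

def binarize(prompt, rated_responses):
    """rated_responses: list of (response_text, score) tuples.
    Returns a {prompt, chosen, rejected} record using the best- and
    worst-scored responses."""
    ranked = sorted(rated_responses, key=lambda rs: rs[1], reverse=True)
    return {"prompt": prompt, "chosen": ranked[0][0], "rejected": ranked[-1][0]}

example = binarize(
    "What is overfitting?",
    [("Memorizing noise in the training data rather than the signal.", 8.5),
     ("It is bad.", 2.0),
     ("A model fits its training data too closely to generalize.", 7.0)],
)
print(example["chosen"])   # highest-scored response
print(example["rejected"])  # lowest-scored response
```

The same function works for custom domain data: collect a few ratings per prompt, binarize, and append to the public set.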