You can create preference data using AI:
1. Generate multiple responses from your SFT model.
2. Use a judge model (e.g. GPT-4 or Claude) to rank them.
3. Turn the rankings into (chosen, rejected) preference pairs.
This approach is called RLAIF (reinforcement learning from AI feedback). It scales far more cheaply than human annotation, but the quality of the resulting data is bounded by the judge model's capabilities.
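The pipeline above can be sketched in a few lines. This is a minimal, self-contained illustration: `judge_rank` is a hypothetical placeholder that would normally call the judge LLM with a ranking prompt; here it ranks by response length just so the example runs end to end.

```python
from itertools import combinations

def judge_rank(prompt, responses):
    # Placeholder judge: in practice, send the prompt and candidate
    # responses to a judge model (e.g. GPT-4 or Claude) and parse its
    # ranking. Length is a stand-in so this sketch is runnable.
    return sorted(responses, key=len, reverse=True)

def make_preference_pairs(prompt, responses):
    # Convert a judge ranking into (chosen, rejected) training pairs:
    # every higher-ranked response is "chosen" over every lower one.
    ranked = judge_rank(prompt, responses)
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": rejected}
        for chosen, rejected in combinations(ranked, 2)
    ]

pairs = make_preference_pairs(
    "Explain overfitting.",
    [
        "Overfitting is when a model memorizes noise in the training data.",
        "It's bad.",
        "Overfitting: low training error, high test error.",
    ],
)
```

With three candidate responses this yields three pairs, one per ordered combination; the pairs can then feed a preference-optimization step such as DPO or a reward-model fit.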