Filter criteria for instruction data:
- Minimum response length (remove one-word answers unless the instruction calls for one, such as a yes/no question or a classification label)
- Maximum response length (remove overly verbose responses)
- Response quality scores (use an LLM to rate each response and drop those below a threshold)
- Instruction clarity (vague instructions teach the model vague behavior)
Start strict. It's easier to add filtered-out examples back later than to debug model behavior caused by bad data.
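As a rough illustration, the four criteria can be combined into a single pass that also records why each example was dropped, so rejected examples can be added back if the thresholds prove too strict. This is a minimal sketch, not a reference implementation: the field names (`instruction`, `response`, `allow_short`), the threshold defaults, the 1-5 score scale, and the `score_fn` callable standing in for an LLM rater are all assumptions made for the example. The word-count check on instructions is only a crude proxy for clarity.

```python
from typing import Callable, Dict, List, Optional, Tuple

Example = Dict[str, object]
# Hypothetical rater: takes (instruction, response), returns a score on a 1-5 scale.
ScoreFn = Callable[[str, str], float]


def rejection_reason(
    ex: Example,
    score_fn: ScoreFn,
    min_words: int = 3,
    max_words: int = 400,
    min_instruction_words: int = 4,
    min_score: float = 4.0,
) -> Optional[str]:
    """Return the first failed criterion, or None if the example passes."""
    instruction = str(ex["instruction"])
    response = str(ex["response"])
    n_words = len(response.split())

    # Minimum length, unless the example is marked as legitimately short
    # (assumed "allow_short" flag, e.g. for yes/no questions).
    if n_words < min_words and not ex.get("allow_short", False):
        return "response_too_short"
    # Maximum length.
    if n_words > max_words:
        return "response_too_long"
    # Crude clarity proxy: very short instructions are treated as vague.
    if len(instruction.split()) < min_instruction_words:
        return "instruction_unclear"
    # Quality score last, since calling an LLM is the expensive step.
    if score_fn(instruction, response) < min_score:
        return "low_quality_score"
    return None


def filter_examples(
    examples: List[Example], score_fn: ScoreFn, **thresholds
) -> Tuple[List[Example], List[Tuple[Example, str]]]:
    """Split examples into kept ones and (example, reason) rejections."""
    kept: List[Example] = []
    rejected: List[Tuple[Example, str]] = []
    for ex in examples:
        reason = rejection_reason(ex, score_fn, **thresholds)
        if reason is None:
            kept.append(ex)
        else:
            rejected.append((ex, reason))
    return kept, rejected
```

Keeping the rejected examples alongside their reasons makes it straightforward to relax one threshold at a time, for instance by re-running `filter_examples` with a lower `min_score`, and to inspect what each criterion is actually removing.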