TRL (Transformer Reinforcement Learning) handles training loops:
- SFTTrainer: Supervised fine-tuning with best practices built in
- DPOTrainer: Direct Preference Optimization (DPO) alignment, including reference model handling
- ORPOTrainer, KTOTrainer: Other preference-alignment methods (ORPO, KTO)
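
To make "reference model handling" concrete, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair, written in plain Python. It is not TRL's code: the function name `dpo_loss` and the value `beta=0.1` are illustrative assumptions, and the real trainer works on batched tensors and supports other loss variants. The point is that the loss needs log-probabilities from both the policy being trained and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # How strongly each model prefers the chosen answer over the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp

    # The policy is rewarded for preferring the chosen answer MORE than the
    # reference does; beta scales how sharply that difference is penalized.
    logits = beta * (policy_margin - ref_margin)

    # -log(sigmoid(logits)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference agree exactly, the loss is log 2. It falls below that as the policy's preference for the chosen answer grows beyond the reference's.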
TRL manages the complexity: data collation, loss computation, logging, and checkpointing. You focus on the configuration; TRL handles execution.
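
As a rough illustration of that split, the sketch below keeps the configuration in one small function and hands everything else to `SFTTrainer`. It assumes a recent TRL release that provides `SFTConfig`, plus the `datasets` library. The helper names `sft_config_kwargs` and `build_sft_trainer` and all the setting values are hypothetical, chosen only for the example.

```python
def sft_config_kwargs(output_dir):
    """The part you write: a handful of training settings (illustrative values)."""
    return {
        "output_dir": output_dir,            # where checkpoints are written
        "num_train_epochs": 1,
        "per_device_train_batch_size": 2,
        "logging_steps": 10,                 # log every 10 update steps
        "save_steps": 500,                   # checkpoint every 500 update steps
    }


def build_sft_trainer(model_id, dataset_id, output_dir):
    """The part TRL runs: collation, loss, logging, and checkpointing."""
    # Imported lazily so this file can be loaded without TRL installed.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    train_dataset = load_dataset(dataset_id, split="train")
    config = SFTConfig(**sft_config_kwargs(output_dir))
    # SFTTrainer accepts a model identifier string and loads the model itself.
    return SFTTrainer(model=model_id, args=config, train_dataset=train_dataset)
```

Calling `build_sft_trainer(model_id, dataset_id, "./sft-output").train()` with a model identifier and a dataset in a format `SFTTrainer` understands would then start the run. No training loop appears in user code.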