Fine-tuning isn't free. Expect:
- QLoRA on 7B: 1-4 hours on an RTX 4090, ~$5-20 in cloud compute
- LoRA on 13B: 4-8 hours on an A100, ~$20-50 in cloud compute
- Full fine-tune of 70B: 24+ hours on 8x A100, ~$500+ in cloud compute
Budget 3-5x your first estimate for data prep and iteration.
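
As a rough illustration of the 3-5x rule, the sketch below scales a single-run cost range into a project budget range. It assumes the cloud figures above are in US dollars; `RunEstimate` and `project_budget` are hypothetical names made up for this example, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunEstimate:
    """Cost range for one training run (hypothetical helper type)."""
    name: str
    low_cost: float   # lower bound for a single run, assumed to be USD
    high_cost: float  # upper bound for a single run, assumed to be USD


def project_budget(run, low_multiplier=3.0, high_multiplier=5.0):
    """Scale a single-run cost range by the 3-5x rule of thumb.

    Returns (low, high): the cheapest run times the low multiplier,
    and the most expensive run times the high multiplier.
    """
    return (run.low_cost * low_multiplier, run.high_cost * high_multiplier)


# Single-run figures taken from the QLoRA bullet above.
qlora_7b = RunEstimate("QLoRA on 7B", low_cost=5, high_cost=20)
low, high = project_budget(qlora_7b)
print(f"{qlora_7b.name}: plan for roughly {low:.0f}-{high:.0f} in total")
```

The multipliers are parameters so you can tighten or widen the range once you have real numbers from your own first few runs.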
Fine-tuned models also require maintenance. Base models update. Your data evolves. Performance degrades over time. RAG systems update easily; fine-tuned models need full retraining. Consider whether RAG solves your problem with less ongoing cost.