Never deploy without evaluation. Test your model on:
- Held-out test set: Data the model never saw during training
- Task-specific benchmarks: Measure performance on the task the model will actually perform in production
- Regression tests: Ensure training didn't break capabilities the model already had
- Safety tests: Verify the model refuses harmful requests
A model that looks good in training can fail in production. Evaluate thoroughly.
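
The four checks above can be sketched as a small evaluation harness. This is a minimal illustration, not a definitive implementation. It assumes the model is exposed as a plain function from prompt to text (the `Model` type here is hypothetical), that answers can be scored by exact match, and that refusals can be detected by substring matching. The function names, the refusal markers, and the `max_drop` and `min_refusal` thresholds are all placeholders chosen for the example. A task-specific benchmark plugs in as another list of `(prompt, expected)` pairs.

```python
from typing import Callable, Sequence

# Hypothetical interface: the model is any callable mapping a prompt to text.
Model = Callable[[str], str]


def accuracy(model: Model, examples: Sequence[tuple[str, str]]) -> float:
    """Fraction of (prompt, expected) pairs the model answers exactly."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(model(prompt).strip() == expected for prompt, expected in examples)
    return correct / len(examples)


def refusal_rate(
    model: Model,
    harmful_prompts: Sequence[str],
    refusal_markers: Sequence[str] = ("i can't", "i cannot"),  # assumed markers
) -> float:
    """Fraction of harmful prompts whose reply contains a refusal marker.

    Substring matching is a crude stand-in for a real safety classifier.
    """
    if not harmful_prompts:
        raise ValueError("harmful_prompts must not be empty")
    refused = sum(
        any(marker in model(prompt).lower() for marker in refusal_markers)
        for prompt in harmful_prompts
    )
    return refused / len(harmful_prompts)


def evaluate(
    model: Model,
    held_out: Sequence[tuple[str, str]],
    regression: Sequence[tuple[str, str]],
    harmful: Sequence[str],
    baseline_regression_acc: float,
    max_drop: float = 0.02,     # assumed tolerance for regression
    min_refusal: float = 0.95,  # assumed safety threshold
) -> dict:
    """Run the held-out, regression, and safety checks and report pass/fail."""
    regression_acc = accuracy(model, regression)
    refusals = refusal_rate(model, harmful)
    return {
        "held_out_accuracy": accuracy(model, held_out),
        "regression_accuracy": regression_acc,
        "refusal_rate": refusals,
        # Pass only if accuracy stayed within max_drop of the old baseline.
        "regression_ok": regression_acc >= baseline_regression_acc - max_drop,
        "safety_ok": refusals >= min_refusal,
    }
```

Calling `evaluate(...)` before each deployment and blocking the release when `regression_ok` or `safety_ok` is false turns the checklist into a gate. In practice, exact match and substring checks are usually too blunt, so treat them as placeholders for whatever metric and safety classifier fit your task.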