TGI (Text Generation Inference) is Hugging Face's production inference server.
Advantages:
- Tight Hugging Face integration: models load directly from the Hugging Face Hub by model ID
- Tensor parallelism to shard large models across multiple GPUs
- Flash Attention built in
- Production-ready, with official Docker images
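To make the Docker and tensor-parallelism points concrete, here is a minimal sketch that assembles a `docker run` command for the TGI container. The helper name `tgi_docker_command`, the model ID, the `latest` image tag, and the host port are illustrative assumptions; `--model-id` and `--num-shard` are TGI launcher options, and the container is assumed to listen on port 80 as in the official image.

```python
import shlex


def tgi_docker_command(
    model_id,
    num_shard=1,
    host_port=8080,
    image="ghcr.io/huggingface/text-generation-inference:latest",  # assumed tag
):
    """Build a `docker run` argument list for a TGI container (hypothetical helper)."""
    return [
        "docker", "run",
        "--gpus", "all",           # expose all GPUs to the container
        "--shm-size", "1g",        # shared memory used for inter-GPU communication
        "-p", f"{host_port}:80",   # map the host port to the container's port 80
        image,
        "--model-id", model_id,    # model to pull from the Hugging Face Hub
        "--num-shard", str(num_shard),  # number of GPUs to shard the model across
    ]


# Example: shard an (illustrative) model across two GPUs.
cmd = tgi_docker_command("mistralai/Mistral-7B-Instruct-v0.2", num_shard=2)
print(shlex.join(cmd))
```

In practice you would likely also mount a volume for the model cache and pin a specific image version rather than `latest`; both are left out to keep the sketch short.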
Choose TGI if you're already in the Hugging Face ecosystem and want straightforward deployment.
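Once a server is running, querying it is a plain HTTP call. The sketch below uses only the Python standard library and assumes a TGI instance reachable at `http://localhost:8080` (an assumed address); the function names are hypothetical. It targets TGI's `/generate` endpoint, which accepts a JSON body with `inputs` and `parameters` and returns a `generated_text` field.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=64, base_url="http://localhost:8080"):
    """Build a POST request for TGI's /generate endpoint (base_url is an assumption)."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text; requires a running server."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]
```

With a server up, `generate("What is tensor parallelism?", max_new_tokens=32)` would return the completion as a string. Hugging Face's own client libraries wrap the same endpoint if you prefer not to build requests by hand.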