Latency and throughput often trade off:
- Small batches: low latency per request, but low throughput (the GPU is underutilized)
- Large batches: high throughput, but high latency (requests wait for the batch to fill and drain)
Continuous batching (also called iteration-level scheduling) helps balance both: scheduling happens per decode step, so new requests join the in-flight batch as soon as a slot frees up, rather than waiting for the entire batch to complete.
vLLM and TGI both support continuous batching out of the box.
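The scheduling idea can be illustrated with a toy simulation (pure Python, not vLLM's or TGI's actual scheduler; the request lengths, `max_batch` limit, and step counting are all illustrative assumptions):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    tokens_left: int   # decode steps this request still needs
    finished_at: int = -1

def continuous_batching(requests, max_batch=2):
    """Toy continuous-batching loop: on every decode step, finished
    requests free their slot and queued requests join immediately,
    instead of waiting for the whole batch to drain."""
    queue = deque(requests)
    active, step = [], 0
    while queue or active:
        # Admit waiting requests into any free batch slots.
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        # One decode step for every active request.
        for r in active:
            r.tokens_left -= 1
        # Retire finished requests so their slots free up next step.
        for r in [r for r in active if r.tokens_left == 0]:
            r.finished_at = step
            active.remove(r)
        step += 1
    return step  # total decode steps taken

reqs = [Request(i, n) for i, n in enumerate([3, 1, 5, 2, 2])]
total = continuous_batching(reqs, max_batch=2)  # 7 steps
```

With static batching the same workload takes 10 steps (max(3,1) + max(5,2) + 2), since short requests are held hostage by the longest request in their batch; the per-step admission above is what recovers that lost capacity.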