Many requests share the same system prompt or few-shot examples. Prefix caching stores the KV cache for common prefixes and reuses it across requests.
If your system prompt is long, every request normally recomputes the KV entries for those prompt tokens before generating. With prefix caching, they are computed once and reused by every request that shares the prefix.
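The reuse idea can be sketched with a toy cache keyed by token prefixes. This is not how a real inference engine stores KV blocks; the "KV state" here is a stand-in string, and the counter just makes the savings visible. All names (`kv_cache`, `compute_kv`, `compute_calls`) are illustrative.

```python
kv_cache = {}          # maps a token-tuple prefix -> its computed "KV state"
compute_calls = 0      # counts per-token computations, to show the savings

def compute_kv(tokens):
    """Return the KV state for `tokens`, reusing the longest cached prefix."""
    global compute_calls
    tokens = tuple(tokens)
    # Find the longest prefix already in the cache.
    best = 0
    for i in range(len(tokens), 0, -1):
        if tokens[:i] in kv_cache:
            best = i
            break
    state = list(kv_cache.get(tokens[:best], ()))
    # Compute only the uncached suffix, caching each new prefix as we go.
    for i in range(best, len(tokens)):
        compute_calls += 1
        state.append(f"kv({tokens[i]})")   # stand-in for real attention math
        kv_cache[tokens[:i + 1]] = tuple(state)
    return state

system = ["You", "are", "helpful"]
compute_kv(system + ["Hi"])     # computes all 4 tokens
compute_kv(system + ["Bye"])    # reuses the 3-token system prefix
print(compute_calls)            # 5 per-token computations instead of 8
```

The second request pays only for its unique suffix, which is exactly the behavior prefix caching buys you when many requests share a system prompt.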
vLLM and SGLang both support prefix caching. Enable it whenever requests share a system prompt or few-shot examples; the prefill savings grow with the length of the shared prefix relative to the full prompt.
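With vLLM's offline `LLM` API, enabling it is a single constructor flag. This is a configuration sketch, not a benchmark: the model name is illustrative, and running it requires a GPU with vLLM installed.

```python
from vllm import LLM, SamplingParams

# enable_prefix_caching tells vLLM to hash KV-cache blocks and reuse them
# whenever a new request's prompt shares a prefix with an earlier one.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
    enable_prefix_caching=True,
)

system = "You are a helpful assistant.\n"
params = SamplingParams(max_tokens=64)

# Both prompts start with `system`, so the second request's prefill
# skips recomputing the cached system-prompt blocks.
out1 = llm.generate(system + "Summarize prefix caching.", params)
out2 = llm.generate(system + "What is a KV cache?", params)
```

The same option is exposed on the server as the `--enable-prefix-caching` flag to `vllm serve`.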