In distributed systems, tail latency gets amplified:
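If a request touches N services, each with a P99 latency of T ms (meaning 99% of calls complete within T ms), and the services' latencies are independent:
- Probability all respond within T ms: 0.99^N
- The remaining 1 - 0.99^N of requests wait on at least one service that is in its slowest 1%
As N grows, 0.99^N keeps shrinking, so a larger share of requests hits the tail.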
This is why microservices architectures obsess over P99 latency. One slow service affects every request that touches it.
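The amplification can be checked with a few lines of arithmetic. The sketch below assumes each service's latency is independent of the others. The function names and the service counts in the loop (1, 10, 100) are illustrative choices, not values from the text above.

```python
def prob_all_within_quantile(n_services: int, quantile: float = 0.99) -> float:
    """Probability that all n independent services respond within
    their own `quantile` latency (0.99 corresponds to P99)."""
    return quantile ** n_services


def prob_hit_tail(n_services: int, quantile: float = 0.99) -> float:
    """Probability that at least one service lands in its slow tail."""
    return 1.0 - prob_all_within_quantile(n_services, quantile)


# Illustrative fan-out sizes (assumed for this example).
for n in (1, 10, 100):
    ok = prob_all_within_quantile(n)
    slow = prob_hit_tail(n)
    print(f"{n:>3} services: all within P99 = {ok:.1%}, hit the tail = {slow:.1%}")
```

If latencies are correlated (for example, several services share an overloaded dependency), the real numbers will differ from this independent-case estimate.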
Mitigations:
- Hedged requests (send to multiple replicas, take fastest)
- Timeouts and circuit breakers
- Caching to avoid slow paths
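As a rough illustration of the first mitigation, here is a minimal hedged-request sketch using Python's `asyncio`. It sends the same call to every replica at once, returns the first response, and cancels the rest. `call_replica`, the replica names, and the delays are hypothetical stand-ins for real RPCs, and the sketch does not handle the case where the fastest replica fails with an error.

```python
import asyncio


async def call_replica(name: str, delay_s: float) -> str:
    """Hypothetical stand-in for an RPC: sleeps, then returns the replica name."""
    await asyncio.sleep(delay_s)
    return name


async def hedged_request(calls):
    """Run all calls concurrently and return the first result to arrive."""
    tasks = [asyncio.ensure_future(c) for c in calls]
    try:
        done, _pending = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED
        )
        # Take one finished task; result() re-raises if that call failed.
        return next(iter(done)).result()
    finally:
        # Cancel the slower replicas so they stop consuming resources.
        for task in tasks:
            task.cancel()


async def main() -> str:
    # Simulated delays (assumed values): replica-b is the fastest.
    return await hedged_request([
        call_replica("replica-a", 0.20),
        call_replica("replica-b", 0.01),
        call_replica("replica-c", 0.10),
    ])


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Sending every request to all replicas multiplies load, so a common variant waits a short time before issuing the second request and only hedges the calls that are already slow.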