Transient failures are temporary. Retries often succeed:
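Exponential backoff: Wait longer after each failed attempt, doubling the delay each time (base, then 2x base, then 4x base). This eases pressure on a recovering service and, combined with jitter, helps prevent a thundering herd.
Jitter: Add randomness to each delay. If every client retries at exactly the same moment, you get a traffic spike. Random jitter spreads the load over time.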
wait_time = min(base * 2^attempt + random(0, 1000ms), max_wait)
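The formula above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `backoff_delay`, the default values (1 s base, 30 s cap), and the use of seconds rather than milliseconds are assumptions made for the example.

```python
import random


def backoff_delay(attempt, base=1.0, max_wait=30.0, max_jitter=1.0):
    """Return the seconds to wait before retry number `attempt` (0-based).

    All defaults are illustrative assumptions, not values from the text.
    """
    exponential = base * (2 ** attempt)        # base, 2*base, 4*base, ...
    jitter = random.uniform(0, max_jitter)     # random(0, 1000ms), in seconds
    return min(exponential + jitter, max_wait) # never exceed the cap


if __name__ == "__main__":
    for attempt in range(6):
        print(f"attempt {attempt}: wait {backoff_delay(attempt):.2f}s")
```

Because the cap is applied after the jitter is added, delays at the cap are all identical. If that matters for your workload, one option is to apply the jitter after capping instead.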
When to retry: Network timeouts, 503 errors, and rate limits. Don't retry 400 errors or business logic failures: the same request will fail the same way every time.
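One way to encode this rule is a small classifier that the retry loop consults before trying again. The sketch below is hypothetical: the name `is_retryable` is invented for illustration, and it assumes rate limiting is signalled with HTTP 429 and that timeouts and connection problems surface as Python's built-in `TimeoutError` and `ConnectionError`.

```python
# Status codes treated as transient. 429 (Too Many Requests) is an assumption
# about how the server reports rate limiting.
RETRYABLE_STATUS = {429, 503}


def is_retryable(status_code=None, exc=None):
    """Decide whether a failed call is worth retrying.

    Pass either the HTTP status code of the response or the exception raised.
    """
    if exc is not None:
        # Network-level problems are usually transient.
        return isinstance(exc, (TimeoutError, ConnectionError))
    # Anything else (400, validation or business logic failures) is permanent.
    return status_code in RETRYABLE_STATUS


if __name__ == "__main__":
    print(is_retryable(status_code=503))
    print(is_retryable(status_code=400))
    print(is_retryable(exc=TimeoutError("read timed out")))
```

Keeping the decision in one function makes it easy to adjust the policy later, for example to treat other 5xx responses as transient.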
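Circuit breaker integration: Stop retrying if the circuit is open. An open circuit means the dependency is already known to be failing, so further retries only add load.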
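A rough sketch of how a retry loop might check a circuit breaker before each attempt is shown below. Everything here is an assumption made for illustration: the `CircuitBreaker` class is a deliberately simplified stand-in (no real half-open state), and the names `CircuitOpenError` and `call_with_retries`, the thresholds, and the delay function are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """Simplified breaker: opens after N consecutive failures, then cools down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: close the circuit and allow calls again.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_retries(operation, breaker, max_attempts=5, sleep=time.sleep,
                      delay=lambda attempt: 0.1 * 2 ** attempt):
    """Retry `operation` on transient errors, but never while the circuit is open."""
    for attempt in range(max_attempts):
        if breaker.is_open():
            raise CircuitOpenError("circuit is open; not retrying")
        try:
            result = operation()
        except (TimeoutError, ConnectionError):
            breaker.record_failure()
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            if not breaker.is_open():
                sleep(delay(attempt))  # skip the wait if we are about to give up
        else:
            breaker.record_success()
            return result
```

With a threshold of three, a dependency that keeps timing out is called three times and then the loop fails fast with `CircuitOpenError`, even if `max_attempts` would allow more tries.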