Key metrics to monitor:
Request rate and latency (p50, p95, p99)
Error rates (4xx client, 5xx server)
Active connections and connection rate
Backend health (healthy vs unhealthy count)
Spillover count (requests rejected due to capacity)
Set alerts on error-rate spikes and latency degradation, such as a sustained rise in the 5xx rate or in p99 latency.
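As a rough illustration of how these metrics and alerts fit together, the sketch below computes latency percentiles and error rates from one window of request samples and checks them against thresholds. The names Request, summarize, and check_alerts are hypothetical, and the thresholds (1% for 5xx, 500 ms for p99) are placeholder assumptions, not recommended values. In practice the load balancer or monitoring system usually does this aggregation itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    latency_ms: float
    status: int


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(requests):
    """Aggregate one window of requests into the monitored metrics."""
    latencies = [r.latency_ms for r in requests]
    total = len(requests)
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
        "rate_4xx": sum(400 <= r.status < 500 for r in requests) / total,
        "rate_5xx": sum(500 <= r.status < 600 for r in requests) / total,
    }


def check_alerts(summary, max_5xx_rate=0.01, max_p99_ms=500.0):
    """Return alert messages; the default thresholds are assumed placeholders."""
    alerts = []
    if summary["rate_5xx"] > max_5xx_rate:
        alerts.append("5xx error rate above threshold")
    if summary["p99"] > max_p99_ms:
        alerts.append("p99 latency above threshold")
    return alerts
```

Calling check_alerts(summarize(window)) once per evaluation period, and firing only when the same alert appears in several consecutive periods, is one way to avoid paging on a single noisy window.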