Thresholds define when metrics become problems. Set them wrong and you get alert storms or silent failures.
Static thresholds:
Fixed values like "alert when CPU exceeds %." Simple but may not fit variable workloads.
Dynamic thresholds:
Based on historical patterns. Adapts to baseline behavior.
Duration requirements:
"Alert when CPU exceeds % for minutes." Filters momentary spikes.
Starting points:
- Interface utilization: warn at %, critical at %
- Packet loss: warn at %, critical at %
- Response time: warn at x baseline, critical at x