You now understand observability at the level expected in SRE interviews.
What to remember:
- Three pillars: metrics (trends), logs (details), traces (request flow)
- Prometheus pulls (scrapes) metrics from targets, stores them as time series, and is queried with PromQL
- Design dashboards with the USE method for resources (utilization, saturation, errors) and the RED method for services (rate, errors, duration)
- Structured logs are searchable by field, which makes structured logging worth the effort
- Tracing helps debug latency in distributed systems. In production, sample traces rather than recording every request
- Alert on symptoms, not causes. Every alert needs a runbook
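
As a quick refresher on the structured logging point above, here is a minimal sketch using only Python's standard library. It emits one JSON object per log line so that fields can be filtered later. The logger name `checkout`, the `fields` key passed through `extra`, and the example field names are assumptions made for illustration, not a convention from any particular logging library.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Assumed convention for this sketch: callers attach structured
        # fields with extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("payment failed", extra={"fields": {"order_id": "A123", "latency_ms": 840}})

# Each line parses back into a dict, so fields are queryable by name.
entry = json.loads(buffer.getvalue())
```

Because every line is valid JSON, a log backend can index `order_id` or `latency_ms` directly instead of relying on regular expressions over free text.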
Next, you'll learn reliability engineering principles: SLOs, error budgets, and toil.