Interview question: "Design a monitoring system for M metrics per second."
Components:
- Collection: An agent on each host either pushes samples to the ingestion tier or exposes them for the server to pull (scrape)
- Ingestion: Buffer and batch writes
- Storage: Time-series database (Prometheus, InfluxDB)
- Query: PromQL or SQL-like interface
- Alerting: Evaluate rules, deduplicate, route
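The ingestion step ("buffer and batch writes") can be sketched as a writer that holds samples in memory and hands them to storage in a single call, either when the batch is full or when the oldest buffered sample has waited too long. This is a minimal single-threaded sketch. `BatchingWriter`, its `max_batch`/`max_delay` parameters, and the `(series key, timestamp, value)` sample shape are hypothetical. A production ingestion tier would typically also put a durable queue in front of a buffer like this so that samples survive a crash.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical sample shape: (series key, timestamp, value).
Sample = Tuple[str, float, float]

class BatchingWriter:
    """Buffers samples and flushes them as one batch when the buffer reaches
    max_batch, or when tick() sees the oldest sample is max_delay seconds old.
    Not thread-safe: a real ingester would guard the buffer with a lock."""

    def __init__(self, flush: Callable[[List[Sample]], None],
                 max_batch: int = 1000, max_delay: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._flush = flush          # e.g. a bulk write to the time-series DB
        self._max_batch = max_batch
        self._max_delay = max_delay
        self._clock = clock          # injectable so tests can control time
        self._buf: List[Sample] = []
        self._first_at: Optional[float] = None

    def write(self, sample: Sample) -> None:
        if not self._buf:
            self._first_at = self._clock()  # age of the batch starts here
        self._buf.append(sample)
        if len(self._buf) >= self._max_batch:
            self.flush()                    # size-triggered flush

    def tick(self) -> None:
        """Call periodically; flushes a partial batch that has waited too long."""
        if self._buf and self._clock() - self._first_at >= self._max_delay:
            self.flush()                    # time-triggered flush

    def flush(self) -> None:
        if self._buf:
            batch, self._buf = self._buf, []
            self._first_at = None
            self._flush(batch)

# Usage: collect flushed batches in a list instead of writing to a database.
batches: List[List[Sample]] = []
w = BatchingWriter(batches.append, max_batch=3)
for i in range(7):
    w.write(("cpu{host=a}", float(i), 0.5))
w.flush()  # drain the final partial batch
print([len(b) for b in batches])  # [3, 3, 1]
```

The size trigger keeps the number of storage writes roughly proportional to M / `max_batch` at high rates, while the time trigger bounds how stale buffered data can get at low rates; the caller is assumed to invoke `tick()` from a timer.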
Scale considerations:
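- Cardinality limits (each unique label combination is a separate series, so an unbounded label such as a user ID multiplies series count, index size, and memory)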
- Retention policies (hot/warm/cold storage)
- Query performance (pre-aggregate frequently queried results so dashboards do not scan raw series)
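Cardinality is worth making concrete: a series is identified by its metric name plus its full label set, so the worst case for one metric is the product of the distinct values of each label (for example, 5 methods × 10 status codes × 100 hosts = 5,000 series). One way to enforce a limit is to track known series and refuse to create new ones past a cap. This is a minimal in-memory sketch; `series_key`, `CardinalityLimiter`, and the single global `max_series` cap are hypothetical (real systems often apply limits per tenant or per scrape target instead).

```python
from typing import Dict, Set, Tuple

SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]

def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    # Sort labels so {a=1, b=2} and {b=2, a=1} identify the same series.
    return (name, tuple(sorted(labels.items())))

class CardinalityLimiter:
    """Always admits samples for series it has already seen; admits a new
    series only while the number of distinct series is below max_series."""

    def __init__(self, max_series: int):
        self.max_series = max_series
        self.series: Set[SeriesKey] = set()
        self.rejected = 0  # count of samples dropped for creating new series

    def admit(self, name: str, labels: Dict[str, str]) -> bool:
        key = series_key(name, labels)
        if key in self.series:
            return True
        if len(self.series) >= self.max_series:
            self.rejected += 1
            return False
        self.series.add(key)
        return True

lim = CardinalityLimiter(max_series=2)
print(lim.admit("http_requests", {"method": "GET", "code": "200"}))   # True
print(lim.admit("http_requests", {"code": "200", "method": "GET"}))   # True (same series)
print(lim.admit("http_requests", {"method": "POST", "code": "500"}))  # True
print(lim.admit("http_requests", {"method": "GET", "code": "404"}))   # False (cap reached)
```

Rejecting only new series, rather than all samples, keeps existing dashboards and alerts working when a misbehaving client starts emitting unbounded label values.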
Design approach: Start with requirements. Estimate storage (ingest rate × bytes per sample × retention). Design for horizontal scaling. Plan for component failures.
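The storage estimate is simple arithmetic: samples per second × seconds of retention × bytes per sample, multiplied by the replication factor if data is stored on more than one node. The sketch below assumes M counts samples ingested per second and uses 2 bytes per compressed sample as a default, which matches the rule of thumb of roughly 1 to 2 bytes per sample given in Prometheus's storage documentation. The function name and the replication parameter are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def estimate_storage_bytes(samples_per_sec: float, retention_days: float,
                           bytes_per_sample: float = 2.0,
                           replication: int = 1) -> float:
    """Raw disk needed for the hot tier, ignoring index and WAL overhead."""
    retention_seconds = retention_days * SECONDS_PER_DAY
    return samples_per_sec * retention_seconds * bytes_per_sample * replication

# Example: M = 1,000,000 samples/s kept for 15 days on a single replica.
est = estimate_storage_bytes(1_000_000, retention_days=15)
print(f"{est / 1e12:.2f} TB")  # 2.59 TB
```

Running the same function with a longer retention or `replication=3` shows quickly why long retention usually moves to cheaper warm/cold tiers or downsampled data.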