Services must report their health:
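Liveness checks: Is the process running? Kubernetes restarts the container if its liveness probe fails.
Readiness checks: Can the service handle requests? The load balancer stops routing traffic to instances that are not ready.
Deep health checks: Verify dependencies. Is the database connection working? Is the cache reachable? Use sparingly to avoid cascading failures: when a shared dependency goes down, every instance that checks it reports unhealthy at the same time.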
Implementation:
GET /health
{ "status": "healthy", "db": "ok", "cache": "ok" }
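One way the endpoint above might be served is sketched here using only the Python standard library. Several names are assumptions for illustration and do not come from the notes: the `/livez` path, the `check_dependencies` and `build_health_response` helpers, the `"failed"` and `"unhealthy"` values, and the choice of HTTP 503 for an unhealthy result. The dependency probes are plain callables that a real service would replace with its own database and cache checks.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

# A probe is any zero-argument callable returning True when the dependency is usable.
Probe = Callable[[], bool]


def check_dependencies(checks: Dict[str, Probe]) -> Dict[str, str]:
    """Run each probe; a probe that raises is reported as failed."""
    results: Dict[str, str] = {}
    for name, probe in checks.items():
        try:
            results[name] = "ok" if probe() else "failed"
        except Exception:
            results[name] = "failed"
    return results


def build_health_response(checks: Dict[str, Probe]) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) for the deep health check."""
    results = check_dependencies(checks)
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "healthy" if healthy else "unhealthy", **results}
    # 503 is an assumed convention so that callers can react to the status code alone.
    return (200 if healthy else 503), json.dumps(body)


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical placeholder probes; swap in real dependency checks.
    checks: Dict[str, Probe] = {"db": lambda: True, "cache": lambda: True}

    def do_GET(self) -> None:
        if self.path == "/livez":
            # Liveness: answers as long as the process can serve HTTP at all.
            status, body = 200, json.dumps({"status": "alive"})
        elif self.path == "/health":
            # Deep check: includes dependencies, so poll it sparingly.
            status, body = build_health_response(self.checks)
        else:
            status, body = 404, json.dumps({"error": "not found"})
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the server; blocks until interrupted."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the server. Keeping the liveness path free of dependency checks means an outage in the database or cache does not cause the orchestrator to restart otherwise working processes.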
Heartbeats: Services periodically report to a coordinator. A service whose heartbeats stop arriving within the expected interval is presumed failed, which triggers failover.
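The coordinator side of heartbeat tracking could look roughly like the sketch below. The `HeartbeatMonitor` class, its method names, and the timeout parameter are hypothetical, and the sketch only detects missed heartbeats; what failover actually does is left to the caller. The clock is injectable so that the timeout logic can be exercised without sleeping.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per service and reports those that went silent."""

    def __init__(
        self,
        timeout_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout = timeout_seconds
        self._clock = clock  # monotonic by default, so wall-clock jumps are ignored
        self._last_seen: Dict[str, float] = {}

    def record(self, service_id: str) -> None:
        """Called whenever a heartbeat arrives from a service."""
        self._last_seen[service_id] = self._clock()

    def expired(self) -> List[str]:
        """Return the sorted IDs of services silent for longer than the timeout."""
        now = self._clock()
        return sorted(
            service_id
            for service_id, seen in self._last_seen.items()
            if now - seen > self._timeout
        )
```

A coordinator would call `record` on each incoming heartbeat and poll `expired` on a timer, starting failover for whatever it returns. Choosing a timeout of several heartbeat intervals, rather than one, reduces false failovers caused by a single delayed message.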