Reliable systems keep serving requests when individual components fail. Design on the assumption that every component will eventually fail.
Redundancy patterns:
- Active-active: Multiple instances serve traffic simultaneously
- Active-passive: Standby takes over on primary failure
- N+1/N+2: One or two instances beyond the N needed to carry the load, so a failure does not reduce capacity
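
To make the value of these patterns concrete, here is a minimal sketch that estimates the availability of a group of redundant instances. It assumes instances fail independently, which is optimistic in practice because shared power, network, or deployments correlate failures. The function name and the sample figures are illustrative assumptions, not measurements.

```python
from math import comb


def k_of_n_availability(instance_availability: float, needed: int, total: int) -> float:
    """Probability that at least `needed` of `total` instances are up.

    Assumption: each instance is up with the same probability and
    instances fail independently of each other.
    """
    a = instance_availability
    return sum(
        comb(total, up) * a**up * (1 - a) ** (total - up)
        for up in range(needed, total + 1)
    )


# Illustrative figures only: each instance is assumed to be 99% available.
single = k_of_n_availability(0.99, needed=1, total=1)       # no redundancy
pair = k_of_n_availability(0.99, needed=1, total=2)         # either of two instances suffices, roughly 0.9999
n_without_spare = k_of_n_availability(0.99, needed=3, total=3)  # N = 3, no spare
n_plus_one = k_of_n_availability(0.99, needed=3, total=4)       # N+1 with N = 3
```

With these assumed numbers, adding one spare to a three-instance group raises the estimate from about 0.970 to about 0.999, because the group now tolerates any single failure.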
Failover considerations:
- Detection time: How quickly do you know something failed?
- Failover time: How long to switch to backup?
- Data consistency: What happens to in-flight requests?
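
The first two considerations can be sketched as a heartbeat monitor. This is a simplified illustration, not a production failure detector: the class name, the missed-heartbeat rule, and the timing values are all assumptions chosen for the example. Time is passed in explicitly so the logic is easy to follow and test.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Declares the primary failed after several missed heartbeats (hypothetical helper)."""

    interval_s: float        # how often the primary is expected to send a heartbeat
    missed_threshold: int    # how many missed heartbeats count as a failure
    last_heartbeat_s: float = 0.0

    def record_heartbeat(self, now_s: float) -> None:
        self.last_heartbeat_s = now_s

    def is_failed(self, now_s: float) -> bool:
        silence_s = now_s - self.last_heartbeat_s
        return silence_s >= self.interval_s * self.missed_threshold

    def worst_case_detection_s(self) -> float:
        # Worst case: the primary dies right after sending a heartbeat.
        return self.interval_s * self.missed_threshold


def outage_window_s(monitor: HeartbeatMonitor, failover_s: float) -> float:
    """Rough outage estimate: detection time plus time to switch to the backup."""
    return monitor.worst_case_detection_s() + failover_s


# Assumed values: heartbeat every 2 s, fail after 3 misses, 30 s to promote the standby.
monitor = HeartbeatMonitor(interval_s=2.0, missed_threshold=3)
monitor.record_heartbeat(now_s=10.0)
still_healthy = monitor.is_failed(now_s=15.0)    # False: only 5 s of silence
declared_failed = monitor.is_failed(now_s=16.0)  # True: 6 s of silence
estimated_outage = outage_window_s(monitor, failover_s=30.0)  # 36.0 seconds
```

A shorter interval or lower threshold detects failures sooner but raises the risk of failing over on a transient network blip. Requests in flight during the outage window are where the data consistency question arises.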
Interview question: "Design a system with % availability."
Key points for the answer: redundancy at every layer, no single points of failure, automated failover, and multi-AZ (availability zone) deployment.
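
When answering, it often helps to turn the availability target into a downtime budget and to check how layers combine. The sketch below does both. The target in the question above is not specified, so the 0.999 figures here are purely illustrative assumptions, as are the function names and the three-layer example.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Downtime allowed per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(layer_availabilities: list[float]) -> float:
    """Availability of a request path that needs every layer to be up.

    Assumption: layers fail independently, so their availabilities multiply.
    """
    return prod(layer_availabilities)


# Illustrative only: three layers (say load balancer, app tier, database),
# each assumed to be 99.9% available on its own.
budget = downtime_minutes_per_year(0.999)                # about 525.6 minutes per year
end_to_end = serial_availability([0.999, 0.999, 0.999])  # about 0.997
end_to_end_budget = downtime_minutes_per_year(end_to_end)
```

Because layers in series multiply, the end-to-end figure is lower than any single layer. This is why redundancy is needed at every layer rather than only at the one that looks most fragile.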