You now understand fault tolerance.
Identify and eliminate single points of failure.
Add redundancy: active-passive or active-active.
Replicate across availability zones and regions.
Automate failover with health checks.
Know your RPO (how much data you can afford to lose) and RTO (how quickly you must recover) for disaster recovery.
Monitor with metrics, logs, and traces. Alert on anomalies.
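To make the failover point concrete, here is a minimal sketch of health-check-driven failover in an active-passive setup. The names `Node`, `FailoverMonitor`, `check`, and `threshold` are hypothetical and chosen for illustration only; a real system would probe over the network and would also need to guard against split-brain, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    name: str
    check: Callable[[], bool]  # hypothetical health probe; True means healthy
    failures: int = 0          # consecutive failed probes


@dataclass
class FailoverMonitor:
    nodes: list[Node]          # ordered by priority: primary first, then standbys
    threshold: int = 3         # consecutive failures before the active node counts as down
    active: Optional[Node] = None

    def __post_init__(self):
        self.active = self.nodes[0]

    def tick(self) -> Node:
        """Probe every node once, then fail over if the active node is down."""
        for node in self.nodes:
            node.failures = 0 if node.check() else node.failures + 1
        if self.active.failures >= self.threshold:
            # Promote the highest-priority node whose latest probe succeeded.
            for candidate in self.nodes:
                if candidate.failures == 0:
                    self.active = candidate
                    break
        return self.active


# Usage: simulate the primary going down for three consecutive probes.
health = {"primary": True, "standby": True}
monitor = FailoverMonitor([
    Node("primary", lambda: health["primary"]),
    Node("standby", lambda: health["standby"]),
])
health["primary"] = False
for _ in range(3):
    monitor.tick()
print(monitor.active.name)  # standby
```

Requiring several consecutive failures before switching is a deliberate choice: a single missed probe is often a transient blip, and failing over on it would cause needless flapping. The cost is a slower reaction, which counts against your RTO.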
Next, you'll practice medium-difficulty system design problems.