You now have a systematic approach to troubleshooting scenarios.
What to remember:
- Use OODA (Observe, Orient, Decide, Act) and think out loud
- "Site is down": Check recent changes first, verify the problem, check dependencies
- Performance: Identify the bottleneck (CPU, memory, disk, network) before optimizing
- Database: Check slow queries, indexes, connection pools
- Distributed: Use tracing, look for retry storms and cascading failures
- Kubernetes: `kubectl describe` and events are your starting point
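The retry-storm point is easier to remember with the arithmetic in front of you. The sketch below uses a hypothetical helper, `worst_case_attempts`, and assumes every hop in a call chain retries a failed downstream call up to a fixed number of times, with no backoff, retry budget, or circuit breaker. Under that assumption, one user request can multiply into many requests at the deepest service.

```python
def worst_case_attempts(layers: int, retries_per_layer: int) -> int:
    """Hypothetical helper: upper bound on attempts reaching the deepest
    service for ONE user request, assuming each of `layers` retrying hops
    retries a failed call up to `retries_per_layer` times (no backoff,
    no retry budget, no circuit breaker)."""
    # Each hop makes 1 original attempt plus its retries...
    attempts_per_hop = 1 + retries_per_layer
    # ...and every one of those attempts can trigger the same pattern
    # at the next hop down, so the counts multiply per hop.
    return attempts_per_hop ** layers


if __name__ == "__main__":
    # Example chain: client -> A -> B -> C, where the client, A, and B each retry 3 times.
    for depth in range(1, 5):
        print(depth, worst_case_attempts(depth, retries_per_layer=3))
```

With three retrying hops and 3 retries each, the deepest service can see 4 ** 3 = 64 attempts for a single request, which is why a small slowdown at the bottom of the stack can cascade into an outage.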
Next, you'll prepare for behavioral interviews.