Toil is operational work tied to running a production service that is manual, repetitive, automatable, and tactical, and that grows roughly linearly with service size.
Characteristics of toil:
- Manual (human does it)
- Repetitive (done more than once)
- Automatable (could be scripted)
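- Tactical (interrupt-driven and reactive rather than planned)
- No enduring value (the service is no better off after the task is done)
- Scales linearly with service growth (more traffic or users means more of the same work)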
Examples:
- Manually restarting services
- Responding to the same alert repeatedly
- Manual deployments
- Copying data between environments
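As a rough illustration of automating the first example, the sketch below replaces a manual restart with a bounded retry loop. The names `ensure_healthy`, `is_healthy`, and `restart` are hypothetical; the two callables stand in for whatever health check and restart mechanism a given service actually provides.

```python
from typing import Callable


def ensure_healthy(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    max_restarts: int = 3,
) -> int:
    """Restart a service until it reports healthy; return the restart count.

    Raises RuntimeError if the service is still unhealthy after
    max_restarts attempts, so a human is involved only when the
    automation has given up.
    """
    restarts = 0
    while not is_healthy():
        if restarts >= max_restarts:
            raise RuntimeError(f"still unhealthy after {restarts} restarts")
        restart()
        restarts += 1
    return restarts
```

Capping the number of restarts is a deliberate choice in this sketch: an unbounded loop would hide a real outage behind endless automated restarts.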
Google's target: SREs should spend less than 50% of their time on toil.
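A toil target like this can be checked with simple arithmetic over tracked hours. The sketch below is a minimal example, assuming a 50% budget (the figure commonly cited from Google's SRE book) and hypothetical function names; how hours get classified as toil is left to the team.

```python
def toil_fraction(toil_hours: float, total_hours: float) -> float:
    """Return the share of working time spent on toil, between 0 and 1."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return toil_hours / total_hours


def exceeds_toil_budget(
    toil_hours: float, total_hours: float, budget: float = 0.5
) -> bool:
    """True when toil reaches or passes the budget (assumed default: 50%)."""
    return toil_fraction(toil_hours, total_hours) >= budget
```

For example, 22 hours of toil in a 40-hour week gives a fraction of 0.55, which is over a 50% budget, while 10 hours gives 0.25, which is within it.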
Interview question: "Give an example of toil you've eliminated."
A strong answer describes the repetitive task, how you automated it, and how much time the automation saved.
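To put a number on the time saved, a back-of-the-envelope calculation is usually enough. The sketch below assumes the task takes a fixed number of minutes per run and happens a fixed number of times per week; the function names and the 52-week year are illustrative assumptions.

```python
def hours_saved_per_year(
    minutes_per_run: float, runs_per_week: float, weeks_per_year: int = 52
) -> float:
    """Hours of manual work removed per year once the task is automated."""
    return minutes_per_run * runs_per_week * weeks_per_year / 60


def break_even_weeks(
    automation_hours: float, minutes_per_run: float, runs_per_week: float
) -> float:
    """Weeks until the time spent building the automation is paid back."""
    weekly_hours = minutes_per_run * runs_per_week / 60
    if weekly_hours <= 0:
        raise ValueError("the task must take some time each week")
    return automation_hours / weekly_hours
```

For instance, a 15-minute task done 10 times a week costs 130 hours a year, and 20 hours of automation work pays for itself in 8 weeks.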