A good runbook is written for a particular service and answers several questions: What is this service, and what does it do? Who is responsible for it? What dependencies does it have? What does the infrastructure for it look like? What metrics and logs does it emit, and what do they mean? What alerts are set up for it, and why? For every alert, include a link to your runbook for that service. When someone responds to the alert, they will open the runbook and understand what’s going on, what the alert means, and potential remediation steps.
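One way to enforce the "every alert links to its runbook" rule is a small check that runs over your alert definitions. The sketch below is illustrative only: the `Alert` class, its `runbook_url` field, the alert names, and the `wiki.example.com` URL are hypothetical and not taken from any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    """Hypothetical alert definition; real tools use their own schema."""
    name: str
    service: str
    runbook_url: Optional[str] = None


def alerts_missing_runbook(alerts: List[Alert]) -> List[str]:
    """Return the names of alerts that have no runbook link (or a blank one)."""
    return [a.name for a in alerts if not (a.runbook_url or "").strip()]


# Example data (assumed for illustration).
alerts = [
    Alert(
        name="HighErrorRate",
        service="checkout",
        runbook_url="https://wiki.example.com/runbooks/checkout#high-error-rate",
    ),
    Alert(name="DiskAlmostFull", service="checkout"),
]

missing = alerts_missing_runbook(alerts)
print(missing)  # → ['DiskAlmostFull']
```

A check like this could run in continuous integration, so that an alert without a runbook link is caught before it ever pages someone.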