Will La - December 15, 2015
The Goal: To put a minimum viable runbook (MVR) at the fingertips of the IT or DevOps team member the moment a critical alert or incident occurs.
There doesn’t need to be that much information, but enough to help the teams start looking in the right place and thinking about the right actions. Dealing with an alert at 4am is hard enough. Even if you do not have a URL link to a mature runbook, an annotated alert containing a few sentences with basic instructions beats having nothing at all.
Here’s how to get started…
Step 1: Choose a platform that allows you to send all alerts and chats into a single incident timeline. This makes it easier to capture all the information necessary to have a successful post-mortem and begin work on the MVR.
Step 2: Implement best practices. Ensure that teams are collaborating in the platform so that their actions and knowledge are captured. Keep all chat conversations in the appropriate channel, and if a conference call comes into play, assign a scribe to add contextual notes to the timeline.
Step 3: After an incident, conduct a post-mortem. This is a time to review the timeline of alerts, chats, etc., and begin to decipher what worked and what didn’t. It is not a time to point fingers or assign blame.
Step 4: Use your findings. This is valuable information that can be used to build basic action plans with clear steps on how to resolve the issue, who to contact, what to refer to, where to find documentation, etc. Examples of useful documentation include links to graphs, runbooks or internal wikis.
Step 5: Append the MVR to incoming alerts. Make that action plan available to the team member right when they need it most…when the same incident/alert happens again.
Step 6: Inspect and adapt. Review and improve. Measure and share. (Or insert any other two words that remind you that we are here to learn and grow…oooh - those are good, too.)
Step 7: Repeat. Proceed with steps 1-6 for other teams, services, tools, etc.
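To make Step 5 concrete, here is a minimal sketch in Python of appending an MVR to an incoming alert. It is not tied to any particular alerting platform: the `Alert` fields, the `RUNBOOKS` lookup table, the service names and the wiki URL are all hypothetical, chosen only to illustrate the idea of matching an alert to the action plan written after the last post-mortem.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """An incoming alert. The field names are hypothetical."""
    service: str
    check: str
    message: str
    annotations: dict = field(default_factory=dict)


# Hypothetical MVR store keyed by (service, check). In practice the
# entries would come out of post-mortem findings (Step 4).
RUNBOOKS = {
    ("checkout-api", "high_latency"): {
        "first_steps": "Check the latency graph, then recent deploys.",
        "contact": "payments on-call",
        "runbook_url": "https://wiki.example.com/runbooks/checkout-latency",
    },
}


def annotate(alert: Alert, runbooks: dict = RUNBOOKS) -> Alert:
    """Attach the matching MVR to the alert, if one exists."""
    mvr = runbooks.get((alert.service, alert.check))
    if mvr is not None:
        alert.annotations.update(mvr)
    return alert


# An alert with a known MVR picks up its action plan.
known = annotate(Alert("checkout-api", "high_latency", "p99 > 2s"))
# An alert without one passes through unchanged.
unknown = annotate(Alert("search", "disk_full", "disk at 95%"))

print(known.annotations["first_steps"])
print(unknown.annotations)
```

An alert that comes through with no annotations is a useful signal in its own right: it marks an incident type that still needs its first few sentences of instructions.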
[Need help with appending runbooks to an alert? Our transmogrifier makes it easy!]
**For those more visually inclined, here’s an infographic to help better illustrate the process.**