We started this blog series by defining what an MVR is. Next up: what to include in your Minimum Viable Runbook.
Develop your Incident Management Strategy – Lean Style
When using the minimum viable approach to building your runbooks, you want to leverage digital automation to capture your team's activity, collaboration, and actions.
By having a record of what actions and activities have worked, you can then leverage the recorded actions (that fixed the problem) to build your Minimum Viable Runbook. Simply review the conversations, actions or collaborations that were previously captured, and restate them into an “action-plan” as clear steps to resolution.
What does it look like?
Your runbook template should include, at the very least…
— The what: The first thing your team member should review in order to confirm the problem (which monitoring tools, status sites, charts). We want to save brain cycles and reduce confusion by focusing on the very first THOUGHT that should be made.
— The how: Next comes the very first ACTION that should be taken to remediate the problem (e.g. server resets, QoS policy adjustments, etc.).
— The who (no, not the band): In the case of needing to include additional team members or different teams, your runbook should include the RIGHT PEOPLE to contact if you need to escalate the incident.
— The where: These are the TOOLS and LOCATIONS of where to record notes, update statuses, post questions, record activities, etc.
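As a rough illustration, the four template parts above could be captured in a small data structure. This is only a sketch: the class name, field names, and sample values are hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MinimumViableRunbook:
    """Hypothetical container for the four parts of the template."""
    title: str
    what: str   # first thing to review to confirm the problem
    how: str    # first action to take to remediate the problem
    who: List[str] = field(default_factory=list)    # people to contact when escalating
    where: List[str] = field(default_factory=list)  # tools and locations for notes and statuses

    def render(self) -> str:
        """Return the runbook as a short plain-text action plan."""
        lines = [
            self.title,
            f"The what: {self.what}",
            f"The how: {self.how}",
            "The who: " + ", ".join(self.who),
            "The where: " + ", ".join(self.where),
        ]
        return "\n".join(lines)


# Sample values are invented for illustration only.
runbook = MinimumViableRunbook(
    title="High API latency",
    what="Check the API latency chart and the provider status site",
    how="Reset the affected application server",
    who=["on-call database engineer", "network team lead"],
    where=["incident chat channel", "incident ticket"],
)
print(runbook.render())
```

Keeping each part to a single line or a short list is deliberate here: the runbook stays small enough to read in full when the alert arrives.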
Making the Runbooks Work
This is where the rubber meets the road. These mini runbooks now need to be front and center, right in your team's face when the incident happens. This reduces or eliminates the time it takes for your team member to “dig” for your runbooks in a file repository or a wiki link. It also reduces the stress that comes with solving a problem at 4 a.m.
Solution: Implement a “rules engine” using an “if this, then that” approach to your incoming alerts.
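A minimal sketch of what such an “if this, then that” rules engine might look like follows. The alert fields (`service`, `check`, `state`), the rule names, and the runbook titles are all assumptions made for the example; a real alerting system will have its own payload format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Alert = Dict[str, str]  # assumed shape: a flat dict of payload fields


@dataclass
class Rule:
    name: str
    condition: Callable[[Alert], bool]  # the "if this"
    runbook: str                        # the "then that"


def match_runbook(alert: Alert, rules: List[Rule]) -> Optional[str]:
    """Return the runbook of the first rule whose condition matches, else None."""
    for rule in rules:
        if rule.condition(alert):
            return rule.runbook
    return None


# Hypothetical rules; more specific rules are listed first because the first match wins.
RULES = [
    Rule(
        name="disk-critical",
        condition=lambda a: a.get("check") == "disk" and a.get("state") == "CRITICAL",
        runbook="MVR: disk space",
    ),
    Rule(
        name="any-database-alert",
        condition=lambda a: a.get("service", "").startswith("db-"),
        runbook="MVR: database",
    ),
]

alert = {"service": "db-primary", "check": "disk", "state": "CRITICAL"}
print(match_runbook(alert, RULES))
```

First-match-wins ordering is just one possible design choice; an engine could equally collect every matching runbook and show them all.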
Your runbooks then need to give call-to-action instructions, clickable links, viewable graphs (live) and static images of graphs for reference. This gives the incident team member instant context on not only how to fix the problem, but how to better understand what to look for while investigating.
Solution: Have your runbooks or alerting system display contextual actions based on payload information, then use your rules engine to give your teams a URL they can open alongside the alert. You can also add that specific URL to an existing reference document or a more detailed runbook. Note: These URLs must point to a single point of reference, which simplifies version control and keeps messaging consistent.
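To make the idea concrete, here is a small sketch of attaching a runbook URL and a call to action to an alert based on its payload. The base URL, the payload fields (`check`, `host`), and the function name are hypothetical; the point is only that every alert of a given type resolves to the same single URL.

```python
from typing import Dict
from urllib.parse import quote

# Hypothetical single point of reference for all runbooks.
RUNBOOK_BASE_URL = "https://wiki.example.com/runbooks"


def annotate_alert(alert: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the alert with a runbook URL and a call to action added."""
    check = alert.get("check", "general")
    host = alert.get("host", "unknown host")
    annotated = dict(alert)  # leave the original payload untouched
    # Percent-encode the check name so it is safe to use as a URL path segment.
    annotated["runbook_url"] = f"{RUNBOOK_BASE_URL}/{quote(check, safe='')}"
    annotated["call_to_action"] = f"Open the runbook for '{check}' on {host}"
    return annotated


annotated = annotate_alert({"check": "disk space", "host": "web-01"})
print(annotated["call_to_action"])
print(annotated["runbook_url"])
```

Because the URL is derived from the payload rather than pasted into each alert, updating the runbook in one place updates what every future alert points to.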
Now that you know the why and what of MVRs, our next blog post will feature seven specific steps that will help you build a successful MVR.