Will La - February 25, 2016
It’s still February. And even though Valentine’s Day has come and gone, our love for all things Atlassian continues on. Catch up on part 1 of this series, where we discuss our JIRA integration, and enjoy part 2.
If you’re looking to reduce time to resolution & instill confidence in your team, read on because we’re talking about an easy way to integrate VictorOps with Atlassian’s Confluence.
(If you don’t know what Atlassian Confluence is, you can learn more here.)
Runbooks help the on-call engineer solve problems faster by answering the questions that come up while resolving a specific incident. (Learn more about the importance of runbooks.)
In Confluence, you can create, discuss, and organize documents (like runbooks!) for your teams and, most importantly, those documents are CENTRALIZED.
Why is that a good thing?
– Centralization gives the team a single source of relevant information.
– Centralization enables documents to be detached from individuals and teams, which can increase accuracy.
– Centralization also lets your teams know where to look when they are working on incident resolution, saving them brain cycles at 3 a.m. when a server goes down.
It’s a best practice to have a single referenceable document for your teams. It gives them a common process that everyone can agree upon. If your teams are on the same page (literally & metaphorically), there is far less confusion when an incident occurs.
This single source of truth also helps the on-call team member feel confident that the centralized runbook covers the right actions, instead of having to dig up an outdated email, conversation or downloaded doc that was shared earlier.
We’ll never account for every unknown unknown, but what makes any document trustworthy is that it is precise, relevant and timely. Using a centralized runbook allows the team to push for accuracy through collaboration. When you have multiple sets of eyes on a single document, you capture input from multiple people, who can decide together whether the document’s contents are correct. Now you’re leveraging more minds and gathering feedback from the folks who are in the trenches, thus making your runbooks more relevant and, hopefully, more helpful.
Because the team needs to feel confident that the runbook is correct, the runbook practically begs to be updated and maintained. Once it is centralized, your teams will start asking “Can you update that in Confluence?” every time the document needs a minor update. This keeps your runbooks up to date and ready for when the next incident occurs.
I’m a huge fan of life hacks and pro tips. A favorite trick is one that we use in our house quite often. When I find a misplaced object around the house (“A”) and I do not know where it belongs, I ask someone, “Hey! Where’s the first place you would look for A?”
They answer with “B” and I then put item A at place B. This saves us a lot of headaches when scrambling around looking for gloves, scissors, keys, spare keys, important mail, emergency items, etc. Over time, this leads to a wildly convenient home where all things are placed where they logically belong or in odd places where they’re always found instantly. It’s been a lifesaver in many cases.
The goal is to be very specific and consistent when storing your runbooks. You also want to save your team members a few brain cycles when they need to figure out where to look. Confluence should be the first place they go. They should not be digging for incident resolution instructions in their emails, locally saved files or Dave’s inception of folders within folders on his shared drive somewhere. If you ask your team members where the correct runbooks are located, is the first answer always Confluence?
Let’s take this one step further and imagine that your house is magic. Whenever you are looking for something, your phone alerts you with a button that teleports you straight to the item’s specific location in your home. That would be amazing and I’m still trying to figure out how to pull that off in my home today.
With a rules-based engine that can enhance alerts (like our Transmogrifier, for instance), you can quickly do this in your incident management process today.
Using the Transmogrifier, you can think up rules in an “If this, then that” mindset, where IF an alert comes in for HostX from Monitoring ServiceY, THEN attach the Confluence URL of the runbook written specifically for HostX. This helps your team by putting a link to the fix on the incident itself as it comes into the timeline.
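To make the “If this, then that” idea concrete, here is a minimal Python sketch of the matching logic. It is not the Transmogrifier’s actual rule format or API. The field names (`host`, `monitoring_tool`, `runbook_url`), the `Rule` class, the `annotate` function and the Confluence URL are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, List


@dataclass
class Rule:
    # Hypothetical rule shape: every condition must match for the rule to fire.
    conditions: Dict[str, str]  # alert field name -> wildcard pattern
    runbook_url: str            # link to attach when the rule fires


def annotate(alert: Dict[str, str], rules: List[Rule]) -> Dict[str, str]:
    """Return a copy of the alert with the first matching rule's runbook URL attached."""
    enriched = dict(alert)  # copy, so the incoming alert is left untouched
    for rule in rules:
        matches = all(
            fnmatchcase(str(alert.get(field, "")), pattern)
            for field, pattern in rule.conditions.items()
        )
        if matches:
            enriched["runbook_url"] = rule.runbook_url
            break  # first matching rule wins
    return enriched


# IF the alert is for HostX from Monitoring ServiceY,
# THEN attach the runbook for HostX (placeholder URL, not a real page).
RULES = [
    Rule(
        conditions={"host": "HostX", "monitoring_tool": "ServiceY"},
        runbook_url="https://confluence.example.com/display/OPS/HostX+Runbook",
    ),
]

incoming = {"host": "HostX", "monitoring_tool": "ServiceY", "message": "disk full"}
enriched_alert = annotate(incoming, RULES)
```

In this sketch an alert that matches no rule passes through unchanged, so a missing rule never blocks the alert itself. Because conditions are wildcard patterns, a single rule with a pattern such as `web-*` could cover a whole family of hosts that share one runbook.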
Just imagine it: the right information in the right hands at the right time. The on-call engineers don’t even have to think about where to look, and since the URL links to an updated runbook in Confluence, your team can trust that the info is correct.
That’s how you decrease time to resolution. Pretty nifty, eh?
Stay tuned for Part 3: HipChat!