Samantha Haltman - May 05, 2015
We are big believers in using the tool we’re building. That being the case, we sat down with our Senior Software Engineer, Dan Hopkins, to conduct a Control Call case study and learn about how VictorOps has been leveraging the feature internally.
Sometimes when an issue arises, you need a level of synchronous communication that you can’t quite get with chat, texting, etc. With notifications popping up in your browser and alerts hitting your phone, it’s very easy to get distracted from the problem at hand. When you need to coordinate changes and multiple tasks need to be completed simultaneously, that’s when a Control Call really becomes necessary to solve the problem.
At VictorOps, we’re constantly coordinating across multiple departments. So when it’s time to jump on a Control Call, we make sure all key stakeholders are represented. That way we can communicate and collaborate cross-functionally in real time.
A member of our exec team will take on the role of “incident commander”. Our COO or CTO will be in charge of starting the call, delegating tasks, and keeping everyone informed about what is going on. This gives those tasked with fixing the problem time to do so without also having to keep everyone in the loop.
It’s definitely important to have the key highlights from the conversation documented within the timeline. Ideally, we’d love a cockpit recorder, but until then we need someone taking notes. When everyone convenes for the retrospective, these notes will help facilitate the conversation and jog the memory of team members who were on the call. While the notes may not be detailed enough for a post-mortem report, leveraging that information will lead to a higher-fidelity recap of the situation.
When a call is in session, the conversation should revolve only around the issue at hand and not include any extraneous topics. It is important to limit the number of people on the call and, if possible, include only those directly associated with the issue. Cue the Chicken and Pig Story.
Solid runbooks will definitely incorporate structured escalation plans. Everyone on the team should be able to identify when it’s time to initiate a Control Call based on the documentation we’ve put together. Luckily, with our alert annotation capabilities, when certain alerts fire, we can ensure that this information renders automatically, letting the on-call engineer know that it is time to initiate a Control Call.
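To make the idea of annotating alerts a little more concrete, here is a minimal sketch in Python of how a rule could attach a runbook note to a matching alert. It is only an illustration and not how VictorOps implements annotations: the `AnnotationRule` class, the `annotate` function, and the alert field names and values are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class AnnotationRule:
    """Hypothetical rule: attach a note when an alert field matches a pattern."""
    field: str    # name of the alert field to inspect (assumed for illustration)
    pattern: str  # shell-style wildcard, e.g. "db-primary*"
    note: str     # text shown to the on-call engineer when the rule matches


def annotate(alert, rules):
    """Return the notes of every rule whose pattern matches the alert."""
    notes = []
    for rule in rules:
        value = str(alert.get(rule.field, ""))
        # fnmatchcase does a case-sensitive wildcard match on every platform.
        if fnmatchcase(value, rule.pattern):
            notes.append(rule.note)
    return notes


# Example rules; the field names and values are made up for this sketch.
RULES = [
    AnnotationRule("entity_id", "db-primary*",
                   "Runbook: initiate a Control Call and page the incident commander."),
    AnnotationRule("message_type", "WARNING",
                   "Runbook: keep monitoring; no Control Call needed yet."),
]

alert = {"entity_id": "db-primary-01", "message_type": "CRITICAL"}
for note in annotate(alert, RULES):
    print(note)
```

With rules like these, the decision to escalate lives in the documentation attached to the alert rather than in the on-call engineer's memory at 3 a.m.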