Reducing MTTA: Response

Drew Abernethy March 20, 2019


Part three of our Reducing MTTA series focuses on incident response. After optimizing alerts and notification policies, your team needs to be prepared to respond when incidents occur. In this post, we’ll discuss how you can improve real-time incident response with the capabilities of purpose-built on-call tools like VictorOps.

If you missed any of the previous articles or want to skip ahead, check out the other articles in the Reducing MTTA series to improve each facet of on-call incident management.

Below, I’ll walk you through ways to find the right information when you need it and to improve real-time collaboration, both of which lead to more rapid incident response.

Empowering on-call users

Better alerts and notifications make on-call responders more comfortable with what’s coming into the system. With annotations and runbooks attached to an alert, alongside informative payload data, the on-call user should feel knowledgeable enough to acknowledge the alert themselves. Whether or not the initial responder is the right person to fix the problem, every alert should carry enough context for them to acknowledge it with confidence.

Increased autonomy in on-call operations helps the people behind your product stay composed when incidents occur. More people on your team will have the knowledge and confidence to acknowledge incidents and, potentially, remediate problems without escalation. Investing time and effort in educating your on-call team will drastically reduce MTTA over time.

By helping people understand what to watch for when they’re on call, you enable them to break down an alert quickly and reduce acknowledgment and resolution times. And as more users become comfortable with on-call responsibilities, and as you embrace a DevOps culture, both developers and operations teams get deeper exposure to systems in staging and production. With a better understanding of how your services and tools work together, on-call users will be more likely to acknowledge an incident and know how to handle it.


Driving collaboration

As more organizations embrace CI/CD, and as complex microservice and multi-cloud infrastructures become more common, incidents and outages will occur. Reducing downtime and lowering MTTA rely on educated teams and proactive incident response plans. Once these are in place, you’ll need to do whatever it takes to improve cross-functional collaboration.

In VictorOps, you can centralize all of your actionable alert data into a single timeline, improving visibility into incidents and allowing teams to collaborate with full alert context. On-call collaboration services integrated with escalation automation and intelligent alert routing help teams surface incidents faster and work together toward a resolution. Automation in on-call software speeds up incident response and adds context to every alert.

And with a mobile-friendly solution, DevOps and IT teams can communicate in real time and diagnose issues as soon as they happen. When incident response teams can work together in a robust, flexible application backed by automation and transparency, everyone wins. Get the most from your on-call incident management software by continuously improving the efficiency of interactions between people, processes and tools.

Reducing MTTA/MTTR with rapid incident response

In the previous two posts of this series, we talked a lot about on-call autonomy, flexibility and automation. At this point, you should see how those values drive better collaboration and operational transparency, helping you reduce MTTA/MTTR and build a system for rapid incident response. And as your on-call team gains experience with incident response, they will improve the reliability of the overall system through updated runbooks, continuous process improvements and better collaboration practices.

With a well-built incident response plan and the right resources, on-call teams will be empowered with the information and context required to reduce MTTA and drive speedy incident response. Check out the other posts in our Reducing MTTA series.

Integrated on-call scheduling, intelligent alerting and visibility into monitoring metrics lead to a robust system for collaborative incident response. Sign up for a 14-day free trial of VictorOps or request a demo to start reducing MTTA/MTTR and make on-call suck less for your own team.
