Welcome to Reducing MTTA: Notification, the second part of our blog series designed to help DevOps and IT teams improve incident management and make on-call suck less. Creating a process for actionable notifications and letting users customize the way they’re paged will improve employee morale and reduce MTTA (mean time to acknowledge).
To improve the other steps in the incident lifecycle, check out the rest of the series.
The way on-call teams are notified of an incident matters more than you might think. In part one of the Reducing MTTA blog series, we talked about optimizing your alerts through automation to lower MTTA and MTTR. But the notification is the first human interaction with an incident. Giving users more autonomy in the notification process helps them feel empowered while on-call, surface helpful resources faster and take in incident details more easily.
Customizing your notifications helps ensure on-call responders become aware of issues faster. For instance, if you only check email a few times a day, you shouldn’t be notified of critical incidents via email. In VictorOps, you can also change the way you’re notified based on time of day and day of the week, so you can be alerted differently on weekends vs. weekdays, night vs. day, etc.
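To make the idea concrete, here is a minimal sketch of a time-based paging policy in Python. It is not the VictorOps API or configuration format; the `PagingRule` class, the rule windows and the method names ("push", "sms", "phone") are all hypothetical, chosen only to show how a day-of-week and time-of-day lookup might work.

```python
from dataclasses import dataclass
from datetime import datetime, time

@dataclass(frozen=True)
class PagingRule:
    """Hypothetical rule: the days and hours it covers, and how to notify."""
    days: frozenset   # weekday numbers, Monday=0 ... Sunday=6
    start: time       # inclusive start of the window
    end: time         # exclusive end of the window (same day, no midnight wrap)
    method: str       # illustrative channel name, e.g. "push" or "sms"

WEEKDAYS = frozenset(range(0, 5))
WEEKEND = frozenset({5, 6})

# Example policy (assumed values): app push during working hours,
# SMS during weekend daytime, and a phone call for everything else.
RULES = [
    PagingRule(WEEKDAYS, time(9, 0), time(18, 0), "push"),
    PagingRule(WEEKEND, time(8, 0), time(22, 0), "sms"),
]
DEFAULT_METHOD = "phone"

def notification_method(moment, rules=RULES, default=DEFAULT_METHOD):
    """Return the method of the first rule covering `moment`, else the default."""
    for rule in rules:
        in_day = moment.weekday() in rule.days
        in_window = rule.start <= moment.time() < rule.end
        if in_day and in_window:
            return rule.method
    return default

# Wednesday morning, Saturday noon, and 3 AM on a Wednesday.
print(notification_method(datetime(2024, 1, 3, 10, 0)))
print(notification_method(datetime(2024, 1, 6, 12, 0)))
print(notification_method(datetime(2024, 1, 3, 3, 0)))
```

Keeping the rules in an ordered list means the first match wins, so a user can put their most specific windows first and let the default catch the overnight hours.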
With customizable paging policies, fewer incidents need to go through team escalation policies because the initial on-call responder didn’t acknowledge them in time. You can also change the sounds the mobile app plays for different alerts and notifications to help you recognize the priority of a notification.
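The relationship between acknowledgment and escalation can be sketched in a few lines of Python. This is an illustration under assumed values, not how VictorOps implements escalation: the `EscalationStep` class, the step names and the 15 and 30 minute delays are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EscalationStep:
    """Hypothetical escalation step: who gets paged and how long after the trigger."""
    target: str
    wait_minutes: int

# Assumed example policy: each step fires only if nobody has acknowledged yet.
STEPS = [
    EscalationStep("primary on-call", 0),
    EscalationStep("secondary on-call", 15),
    EscalationStep("team lead", 30),
]

def targets_paged(minutes_since_trigger, acked_after_minutes=None, steps=STEPS):
    """Return the targets paged so far; paging stops once the incident is acknowledged."""
    paged = []
    for step in steps:
        due = step.wait_minutes <= minutes_since_trigger
        still_unacked = (
            acked_after_minutes is None or step.wait_minutes < acked_after_minutes
        )
        if due and still_unacked:
            paged.append(step.target)
    return paged

# Nobody acknowledges for 20 minutes: the secondary gets pulled in.
print(targets_paged(20))
# The primary acknowledges after 5 minutes: nobody else is paged.
print(targets_paged(20, acked_after_minutes=5))
```

The second call shows the payoff of a paging policy that actually reaches the primary responder: a fast acknowledgment keeps the rest of the team from being paged.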
While effective monitoring and alerting are the first steps in the overall incident lifecycle, notifications are the first step that humans encounter while on-call. A combination of optimized alerts and customizable paging policies helps DevOps and IT teams surface the right issues at the right time. By detecting issues, surfacing machine data and notifying on-call engineers faster, you’ll reduce MTTA and make incident management suck less for your on-call teams.
Flexible paging policies not only let users know when and how they will be notified, they also let users choose the notification method that is most helpful to them. For instance, receiving critical notifications through the VictorOps mobile app instead of SMS could help you see more of the incident details and annotations faster, so you can respond to the problem or reroute it immediately if you’re not the best person to remediate the incident.
If the notification wakes you up at 3 AM but doesn’t require immediate attention, just snooze it through the mobile app. If it’s the weekend and you don’t need to be notified of non-critical alerts, you can set those to notify you by email while critical incidents will page you via SMS. Well-adjusted paging policies will allow you to quiet alert noise, improve visibility into critical issues and make on-call suck less.
Here at VictorOps, our on-call team believes autonomy and transparency are essential to rapid incident response that doesn’t suck. With primary paging policies and the ability to customize them based on time-of-day and day of the week, on-call teams have the power to determine how and when they’re notified. And, in VictorOps’s central timeline, anyone across the team can see all of the alerts, how on-call teammates were notified and any collaboration that took place – no matter how each user was notified or interacted with the incident (SMS, email, phone call or mobile app). Also, the timeline allows you to maintain a record for every incident in the system.
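If you keep a record of when each incident was triggered and acknowledged, MTTA is simple to compute. The sketch below assumes you have exported those timestamps yourself as (triggered, acknowledged) pairs; the data and the `mtta_minutes` function are made up for illustration and are not part of any VictorOps export format.

```python
from datetime import datetime
from statistics import mean

def mtta_minutes(incidents):
    """Mean minutes from trigger to acknowledgment, ignoring unacknowledged incidents."""
    waits = [
        (acked - triggered).total_seconds() / 60
        for triggered, acked in incidents
        if acked is not None
    ]
    if not waits:
        raise ValueError("no acknowledged incidents to average")
    return mean(waits)

# Made-up sample data: two acknowledged incidents and one still waiting.
incidents = [
    (datetime(2024, 1, 3, 3, 0), datetime(2024, 1, 3, 3, 4)),
    (datetime(2024, 1, 3, 14, 30), datetime(2024, 1, 3, 14, 40)),
    (datetime(2024, 1, 4, 9, 0), None),
]

print(mtta_minutes(incidents))
```

Tracking this number before and after a change to your paging policies is one way to check whether the change actually helped.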
Now that you see how flexibility and customization in paging policies can make on-call suck less, check out the rest of the series to continue lowering MTTA and create value faster.