Avoiding On-Call Alert Fatigue and the Corresponding Effects


Alert fatigue is like when a child puts a finger an inch from your face and repeatedly says, “I’m not touching you.” It doesn’t annoy you at first but, over time, it wears you down.

On-call alert fatigue creeps up on you. Below, we’ll help you understand early warning signs of alert fatigue and why actionable alert processes and incident management solutions help you avoid it.

Sneaky Alert Fatigue

In our previous webinar with Threat Stack on this topic, “Ending Alert Fatigue with Modern Security & Incident Management,” we discussed how easy it is to miss the signs of alert fatigue. Chris Gervais, VP of Engineering at Threat Stack, goes deep into the subject and talks about the normalization of deviance.

Normalization of Deviance

The normalization of deviance refers to deviations from accepted behaviors or procedures gradually becoming the standard. As an example, Chris references the old adage of a frog gradually boiling in water. If the frog were simply dropped into boiling water, it would jump out. But, when the frog is placed in room-temperature water that gradually heats up, it won’t notice the water is too hot until it’s too late.

This story reminds us that you can’t allow the gradual, incremental erosion of standard, actionable alerting processes. When unactionable alerts become normal, it’s harder for team members to identify actionable alerts and coordinate responses, resulting in slow incident remediation.

As your company grows and you continuously deliver on more complex, interconnected systems, the likelihood of chaos rises. In fact, in a previous article, we wrote about the importance of planned chaos engineering efforts for building resilient systems. You need to be cognizant that chaos and agility can create vicious cycles that you’ll need to actively avoid. Alert noise, and the desensitization to critical issues that comes with it, keeps people from handling incidents quickly.

Effects of Alert Fatigue and Burnout

According to our 2014 State of On-Call Report, 63% of IT pros said alert fatigue is an issue and 64% believe up to a quarter of all alerts are false alarms. These statistics are unacceptable.

Alert fatigue creates organizational and individual behavioral issues, leading to ineffective incident management. A constant stream of unactionable, non-contextual alerts can cause many issues for your team:

  1. Anxiety
  2. Sleep Deprivation
  3. Negative Physical Effects
  4. Cognitive Impairment
  5. Team and Individual Job Dissatisfaction
  6. Longer Response Time and Lack of Context

Clearly, you don’t want these problems. Waking employees at 3 am for alerts on which they can take no action just makes employees tired, cranky, and unproductive. Well-rested, unstressed team members think more quickly and resolve incidents in a timely manner.

Some Tips for Avoiding On-Call Alert Fatigue

Make All Alerts Actionable

Categorize and prioritize your alerts to make them actionable. Do your alerts provide proper context? Must they be acknowledged and worked on immediately? By providing your team with proper context and prioritization, they’re empowered to take action on alerts. Don’t alert on system successes—this just creates noise and fatigue. Instead, alert on critical system failures and anomalies that require immediate attention.
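As a rough illustration of categorizing and prioritizing, here is a minimal Python sketch. The Severity levels, the Alert fields, and the "page", "ticket", and "drop" outcomes are all hypothetical names chosen for this example, not part of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = "info"          # successes and routine events
    WARNING = "warning"    # worth a look during working hours
    CRITICAL = "critical"  # failure or anomaly needing immediate attention


@dataclass
class Alert:
    name: str
    severity: Severity
    runbook: str = ""  # hypothetical field: context telling the responder what to do


def disposition(alert: Alert) -> str:
    """Decide what to do with an alert: page someone, file a ticket, or drop it."""
    if alert.severity is Severity.CRITICAL:
        return "page"
    if alert.severity is Severity.WARNING:
        return "ticket"
    return "drop"  # don't alert on system successes


def missing_context(alerts):
    """List the names of paging alerts that would wake someone without a runbook."""
    return [a.name for a in alerts if disposition(a) == "page" and not a.runbook]
```

Running something like missing_context over your alert definitions is one way to spot pages that arrive without the context a responder needs.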

Reduce Redundant Alerts

If you find yourself receiving multiple alerts about the same thing, find a way to reduce these alerts. Help teammates reduce alert redundancies by making thresholds adjustable, so they can be tuned after the fact once you’ve seen which alerts repeat.
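One common way to cut redundant alerts is to suppress repeats of the same alert within a time window. The sketch below assumes each alert can be reduced to a key such as "host:check"; the Deduplicator class and its five-minute default window are illustrative choices, not a recommendation for every system.

```python
class Deduplicator:
    """Suppress repeat notifications for the same key within a time window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        self._last_sent = {}  # dedup key -> timestamp of the last notification sent

    def should_notify(self, key: str, now: float) -> bool:
        """Return True if a notification for this key should go out at time `now`."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # same alert notified recently: stay quiet
        self._last_sent[key] = now
        return True
```

Because the window is a constructor argument, the team can adjust it as they learn how noisy a given check is.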

Isolate Alerts to Single Source or Timeline

Centralize your incident and alert information, wherever you decide to do it. Whether it’s in Slack, HipChat, or VictorOps, team members should have a single source of truth where they can see how their entire system is behaving. Isolating alerts to one location provides greater system observability and deeper team collaboration.

Adjust Anomaly Detection Thresholds

Because you’re constantly iterating on your systems, you need monitoring solutions and incident detection methods that evolve with them. What you’re monitoring for today may be completely different tomorrow. Constantly re-evaluate your anomaly detection thresholds and monitoring solutions to ensure they are still applicable to your current system.
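To make "re-evaluate your thresholds" concrete, here is one simple approach, sketched in Python: recompute the threshold from a recent window of samples as the mean plus a few standard deviations. The function names and the default multiplier of 3 are assumptions for illustration; real anomaly detection often needs something more robust.

```python
import statistics


def recompute_threshold(recent_samples, k: float = 3.0) -> float:
    """Derive a threshold from recent data: mean plus k population standard deviations."""
    mean = statistics.mean(recent_samples)
    spread = statistics.pstdev(recent_samples)
    return mean + k * spread


def is_anomalous(value: float, recent_samples, k: float = 3.0) -> bool:
    """Flag a value that exceeds the threshold implied by the recent samples."""
    return value > recompute_threshold(recent_samples, k)
```

Feeding this a rolling window of recent measurements means the threshold moves as the system's normal behavior changes, instead of staying fixed at a value that made sense months ago.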

Ensure Correct Individuals/Teams are Alerted

Of course, sending alerts to the wrong people creates alert fatigue. Keep checking that alerts are being routed to the proper individuals or teams. (VictorOps can help with that.) Don’t get caught in a cycle of re-routing alerts to the proper people. Establish alert routing rules and organize teams properly to avoid alert fatigue and keep alerts pertinent to the people receiving them.
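Alert routing rules can be as simple as an ordered list of match conditions, each paired with a team. The following Python sketch is illustrative only: the rule format, the team names, and the alert fields are hypothetical and do not reflect VictorOps configuration.

```python
# Each rule pairs the fields an alert must match with the team to notify.
# Rules are checked in order; the first match wins.
ROUTING_RULES = [
    ({"service": "checkout", "severity": "critical"}, "payments-oncall"),
    ({"service": "database"}, "dba-team"),
]
DEFAULT_TEAM = "platform-oncall"  # catch-all so no alert goes unrouted


def route_alert(alert: dict) -> str:
    """Return the team that should receive this alert."""
    for match, team in ROUTING_RULES:
        if all(alert.get(field) == value for field, value in match.items()):
            return team
    return DEFAULT_TEAM
```

Keeping the rules in one place makes it easier to review them whenever teams or services change, rather than re-routing alerts by hand each time.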

Customize Personal Notification Policies

A great way to limit alert noise and fatigue is to offer customizable personal notification policies. Allow your team members to escalate alerts and get notified of issues through the channels that best suit them.
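A personal notification policy can be modeled as a list of steps, each saying which channel to use after an alert has gone unacknowledged for some number of minutes. This Python sketch uses hypothetical names (Step, channels_due) and example channels; it is not how any specific product stores policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    after_minutes: int  # how long the alert has gone unacknowledged
    channel: str        # e.g. "push", "sms", "phone"


def channels_due(policy, minutes_unacknowledged: int):
    """Return the channels that should have fired by now, in escalation order."""
    ordered = sorted(policy, key=lambda step: step.after_minutes)
    return [s.channel for s in ordered if s.after_minutes <= minutes_unacknowledged]
```

For example, someone who prefers a quiet push notification first might choose Step(0, "push"), Step(5, "sms"), and Step(15, "phone"), so a phone call only happens if the gentler channels go unanswered.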

Provide Contextual Alerts

Rather than simply showing that there’s an issue, alerts that arrive with detailed charts, runbooks, and logs give your team the information they need to quickly understand it. Knowing which thresholds have been crossed, along with some system performance data, as soon as you’ve been alerted will drastically improve incident remediation efforts.

Continuously Improve

Don’t get complacent with the way you’ve done alerting, monitoring, or incident management in the past. Continuously revisit your procedures and tools. Don’t forget to ask team members how they’re feeling. Do they feel alert fatigue? How can you help them avoid fatigue and make alerts more actionable? Everything in your organization should be open to re-evaluation and improvement.

VictorOps helps you avoid alert fatigue by providing effective alert context, routing, escalation, and notifications in one single location. Check out even more details by downloading our recorded webinar, “Ending Alert Fatigue with Modern Security & Incident Management.”
