VictorOps is now Splunk On-Call!
Software engineers and IT professionals know the pains of being on-call during the holidays all too well. While many parents are woken up at the crack of dawn with kids jumping on their bed, on-call engineers also have to worry about those critical notifications. While the holidays are a great time for family and friends, IT professionals and DevOps engineers also know how stressful they can be.
That’s the reason VictorOps was originally founded. We try our best to make on-call suck less for everyone involved, from customer support engineers to system administrators. But service reliability and consistent uptime are the keys to success for a large number of modern businesses. So, there’s no real way to stay off the grid, turn a blind eye toward applications and infrastructure, and ignore potential problems throughout the holidays.
But, new tools and technology in DevOps and IT are giving IT professionals and software developers more autonomy, flexibility and context – leading to holiday on-call schedules that suck less. So, we wanted to go over a few tips and tricks for making the most of the holidays while you’re on-call – all without creating gaps in coverage or risking downtime. And, we’ll even show you some ways to use the excuse of being on-call as a way to get out of certain, un-fun holiday chores.
When holidays and on-call schedules coincide, it often leads to anxiety and stress for on-call engineers. On-call is stressful enough under normal circumstances, let alone when you’re trying to spend time with family and friends. So, we’re laying out a few simple solutions (and VictorOps features) that any on-call team can use to reduce alert fatigue and stress during the holiday season.
DevOps and IT operations teams that use static spreadsheets for on-call scheduling can’t easily make dynamic changes and ensure consistent coverage. A fully integrated system for on-call schedules, alert rules, escalation policies and real-time communication is the only holistic solution to incident management. Not only is this flexibility helpful during the holidays, but it’s also helpful when employees need to take sick days or paid time off. Whether or not you’re using VictorOps’ manual take on-call or scheduled override functionality, the on-call manager needs to find ways to create flexible on-call work environments for employees.
This works especially well for distributed teams where someone in New York City could trade on-call shifts with someone in San Francisco in order to cover each other’s shifts – allowing them both to eat uninterrupted meals with their families. A creative approach to on-call schedules throughout the holidays can help ensure more team members get adequate time to enjoy the company of their family and friends.
VictorOps offers time-based, customizable paging policies. So, for instance, you could set up a paging policy for the holidays that would allow you to only receive critical alerts via SMS during family time and other non-urgent notifications could go straight into your email inbox.
Or, you could take advantage of Snooze – a function designed to help on-call responders acknowledge non-critical incidents as they come in but delay incident response actions until a later time. The autonomy to customize the way you’re notified will help you prioritize alerts before they even get to you – leading to on-call holiday schedules that seem less invasive.
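To make the time-based paging idea concrete, here is a minimal sketch of the decision a holiday paging policy encodes. It is plain Python, not VictorOps configuration or the VictorOps API: the `Alert` class, the `choose_channel` function, the channel names and the 4 p.m. to 9 p.m. "family time" window are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Alert:
    summary: str
    severity: str  # assumed values: "critical", "warning", "info"


# Hypothetical "family time" window; adjust to your own holiday plans.
FAMILY_TIME_START = time(16, 0)
FAMILY_TIME_END = time(21, 0)


def choose_channel(alert: Alert, now: datetime) -> str:
    """Pick a notification channel based on severity and time of day."""
    in_family_time = FAMILY_TIME_START <= now.time() < FAMILY_TIME_END
    if not in_family_time:
        # Outside the protected window, notify as usual.
        return "push"
    if alert.severity == "critical":
        # Only critical alerts are allowed to interrupt dinner.
        return "sms"
    # Everything else waits in the inbox.
    return "email"
```

With this sketch, a critical alert at 6 p.m. on Thanksgiving would go out by SMS, while a warning at the same time would land in email. In a real on-call tool you would express the same logic through its paging policy settings rather than in code.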
Of course, you can’t control when critical incidents pop up. But, a robust mobile app for real-time collaboration and incident response can surface applicable context faster and reduce MTTA/MTTR. In fact, you could get a push notification for an incident in the VictorOps mobile app, look into the incident details right then and there, determine that you’re not the correct person to fix this issue and reroute the alert to the right person.
If you don’t know exactly where you should route the alert, you can use one of our new machine learning features, Suggested Responders, to help you determine where you should send the alert. With an on-call incident management tool like VictorOps, everything from initial notification to critical monitoring information can be found in one place.
Alert automation can help mitigate alert fatigue and improve the efficiency of incident response in any context. But, incident automation is especially helpful during the holidays. Before an alert even notifies an on-call responder, VictorOps can route the alert through its rules engine to ensure the notification reaches the right person the first time. This means fewer false notifications while you’re sitting around the dinner table with your family.
As you improve alert automation over time, you begin to silence more and more unactionable alerts and create an incident response system focused only on the most important problems.
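As a rough illustration of how rule-based routing and silencing can work, the sketch below runs each alert through an ordered list of rules before anyone is notified. It is not the VictorOps rules engine or its syntax: the `Rule` class, the `route` function, the field names and the team names are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    field: str               # alert field to inspect (hypothetical names)
    contains: str            # substring that must appear in that field
    route_to: Optional[str]  # team to notify; None means silence the alert


# Rules are checked in order and the first match wins.
RULES = [
    Rule("summary", "disk usage 70%", None),       # known noise: silence it
    Rule("service", "database", "dba-team"),
    Rule("service", "checkout", "payments-team"),
]

DEFAULT_ROUTE = "primary-on-call"


def route(alert: dict[str, str]) -> Optional[str]:
    """Return the team to notify, or None if the alert is silenced."""
    for rule in RULES:
        if rule.contains in alert.get(rule.field, ""):
            return rule.route_to
    # No rule matched, so fall back to whoever is on-call.
    return DEFAULT_ROUTE
```

Putting the silencing rule first means known noise never reaches a person, and the default route keeps unmatched alerts from being dropped. Each unactionable alert you identify after the holidays becomes one more rule near the top of the list.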
When a major incident inevitably strikes, with VictorOps, at least you have all the context and the tools at your disposal to collaborate quickly and find a resolution. In one centralized location, you can retrieve alert context and collaborate with other teammates.
You can easily spin up a conference call or open up a Slack channel and begin to communicate about the problem at hand. Then, all of this information – alert data and communication history – can flow seamlessly between VictorOps, Slack and even your ticketing tool, ServiceNow, etc. This streamlined integration and connectivity helps you maintain documentation and transparency without slowing down the real-time firefight.
Alert fatigue in DevOps and IT can lead to anxiety, sleep deprivation, poor physical health, cognitive impairment, job dissatisfaction and even longer incident response and remediation times. As much as anyone, DevOps engineers and IT professionals need to find time to rest and clear their minds. The holidays are a great time to do this. And, while it will take more time and effort to organize an on-call holiday schedule that works for everyone, it will benefit the entire team in the long run.
Compromise and strategic alerting will be an on-call team’s savior during the holidays. If an engineer simply can’t avoid being on-call during the holidays, make sure you give them some additional time off either before or after so they can recuperate and see their loved ones. Over time, alerts can build up and an abundance of unactionable notifications will lead to alert fatigue. So, the best thing you can do for on-call employees over the holidays is to clean up non-critical and unnecessary alerts on a consistent basis and ensure the notifications on-call engineers receive are always important ones.
Now, let’s dive into some more fun ways that you can use being on-call as an excuse to help you get more out of your holiday season:
The oven is on, pans are roasting miscellaneous vegetables and the kitchen is in full chaos-mode. But, what if you’re one of the lucky ones who’ve been honored with the task of ensuring the crudité platter stays stocked while you watch the Thanksgiving football games? What do you do when a family member runs out of the kitchen and says, “Hey, can you run to the store and grab some thyme? We just ran out.” And your beloved Detroit Lions are about to score a touchdown in a close game. Well, look at that: it looks like you just got paged. Being on-call is a great excuse for avoiding the cold weather and not missing any important parts of a big football game.
Need to step away from the TV to enjoy some quality time with the family? Well, maybe you can use personal paging policies and your on-call alerting tool to help you stay updated on football scores. While this might be (is definitely) a little too much work for little reward – all we’re saying is that you could do it.
Everyone has been in a situation where they want to get out of a conversation. Have you ever been sitting around the table getting frustrated with a conversation you’re having with a family member? Well – feign an alert. “Oh no! I just got paged. I have to go deal with this alert real quick – I’ll be right back to talk more about this.” Now, you can go sit in the living room and take a second to cool off and collect your thoughts – or check your fantasy football matchup – whatever you want to do.
If you’re part of a particularly stressful on-call operation, try turning the notifications into a game of some kind. Every time you get paged with a critical notification, get another helping of stuffing before heading off to start working on it. Try to set a target for yourself as to how quickly and effectively you can resolve incidents over the holiday season. Ideally, you could get a number of coworkers to join in on this game. Then, you could all work together to drive MTTA/MTTR as low as possible and hopefully enjoy a few laughs throughout the holidays.
Whoops! Another alert came in just as we were taking dishes to the kitchen. “Unfortunately, I’ll just have to grab an extra slice of pumpkin pie and head into the living room to work on this incident.” Nobody likes cleaning dishes – but everybody loves pie. However, make sure you think strategically about your use of on-call excuses throughout the holidays because it’s just a matter of time before your family and friends figure you out.
All jokes aside, being present with your family and friends, and giving thanks for everything great in your life, is an opportunity we’re not always afforded. So, we’re hoping these tips can brighten your day and help you make the most of the holidays, even if you’re on-call. For many people in DevOps and IT, being amenable to on-call responsibilities is a large part of what’s led to a happy holiday season for them and their families in the first place. But, as the DevOps mindset states, there’s always room for improvement. So, whether you’re already a VictorOps customer or simply a curious reader, we hope these techniques for managing on-call during the holidays were helpful.
On behalf of myself and the rest of the Splunk + VictorOps team, I want to wish you and your loved ones a very happy holiday season full of joy and relaxation. :)
See for yourself how VictorOps’ on-call scheduling, alert automation and real-time incident collaboration make on-call suck less. Sign up for a 14-day free trial or request a free personalized demo today.