Tara Calihman - October 24, 2014
There are lots of different ways to handle on-call scheduling for IT and DevOps teams. You can pull your hair out, curse the gods or flip multiple coins to find out who is responsible for site & app uptime this week. But it’s 2014 and it seems there should be a better way of doing things.
When it comes to best practices, what is the most effective way to schedule IT on-call rotations without putting too much stress on the people who have to do the hard job?
For those of you dealing with IT teams that are spread around the globe, you already know the particular headaches involved with scheduling on-call rotations that cover multiple time zones. In the VictorOps on-call software, you can choose between a traditional scheduling setup, a follow-the-sun choice for global teams and then, of course, a completely customizable version of scheduling that allows you to create your own on-call schedule. There are lots of different inputs that need to be considered for an on-call rotation so be sure to pick a tool that is flexible enough to accommodate various setups.
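To make the follow-the-sun idea concrete, here is a minimal Python sketch. It is not how VictorOps implements scheduling; the three team names and the 8-hour UTC windows are assumptions made up for illustration. The point is simply that each region covers its own daytime, so nobody is paged at 3 a.m. local time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical regional teams, each covering an 8-hour window in UTC.
# Format: (start_hour inclusive, end_hour exclusive, team name)
SHIFTS = [
    (0, 8, "APAC"),
    (8, 16, "EMEA"),
    (16, 24, "Americas"),
]

def team_on_call(moment: datetime) -> str:
    """Return the team whose UTC window covers a timezone-aware moment."""
    hour = moment.astimezone(timezone.utc).hour
    for start, end, team in SHIFTS:
        if start <= hour < end:
            return team
    raise ValueError(f"no shift covers UTC hour {hour}")

# Example: 8 p.m. at a UTC-6 offset is 02:00 UTC, which falls in the first window.
evening = datetime(2014, 10, 24, 20, 0, tzinfo=timezone(timedelta(hours=-6)))
print(team_on_call(evening))  # APAC
```

Real schedules also have to deal with holidays, daylight saving changes and uneven team sizes, which is exactly why a flexible tool matters.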
If you work in an organization with lots of different teams, consider having someone from each of the disciplines (Development, Infrastructure, Database, etc.) on-call at the same time. You can take turns ACKing alerts based on who should be responding and everyone on-call knows they can contact their on-call partners if they need help out of a jam or are unsure about an alert outside of their expertise.
With customizable escalation policies and the ability to reroute alerts, it’s possible to send alerts to specific DevOps teams while also setting an aggressive escalation policy around who gets alert notifications if the on-call team member hasn’t responded in a given time period.
“_Escalation policies in VictorOps are really useful, and we leverage a pretty sophisticated escalation process. We escalate from one team to another and one person to another. It’s peace of mind to know that the reliability is there and know that if someone is driving in the mountains out of range, someone else will get the alert._” -- Nick Goodman, Bunchball
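If you have never set up an escalation policy, the logic is simple enough to sketch in a few lines of Python. Everything here is hypothetical: the step names and the 10- and 25-minute thresholds are illustrative assumptions, not VictorOps defaults or its API.

```python
from dataclasses import dataclass

@dataclass
class Step:
    notify: str          # person or team to alert (hypothetical names)
    after_minutes: int   # minutes the alert has gone unacknowledged

# An illustrative policy: start with the primary, then widen the circle.
POLICY = [
    Step("primary on-call", 0),
    Step("secondary on-call", 10),
    Step("team lead", 25),
]

def who_to_notify(policy, minutes_unacked):
    """Return everyone who should have been notified by this point."""
    return [s.notify for s in policy if s.after_minutes <= minutes_unacked]
```

With this policy, an alert that has sat unacknowledged for 12 minutes has reached both the primary and the secondary. An "aggressive" policy just means shorter gaps between steps.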
IT on-call best practice calls for a proper handoff day, even if you’re not actually passing a physical pager around. (Let’s hope you’re not still using a pager!) Doing this on a Wednesday means that everyone will be in the office, you shouldn’t have the distractions of the previous week interfering with an orderly process and most importantly: no one gets stuck with an extra weekend.
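A weekly rotation with a Wednesday handoff is easy to reason about in code. The sketch below assumes a made-up three-person rotation and picks Wednesday, October 22, 2014 as the first handoff; it ignores the time of day at which the handoff happens.

```python
from datetime import date

ROTATION = ["alice", "bob", "carol"]  # hypothetical team members
FIRST_HANDOFF = date(2014, 10, 22)    # a Wednesday, chosen for illustration

def on_call_for(day: date) -> str:
    """Return who is on call on a given day, switching every Wednesday."""
    weeks_elapsed = (day - FIRST_HANDOFF).days // 7
    return ROTATION[weeks_elapsed % len(ROTATION)]
```

Because each shift runs Wednesday to Wednesday, every person covers exactly one weekend per rotation.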
While you’re at it, why not have a little chat during the on-call handoff to ease the transition? You can discuss big events of the past week, and whoever was on-call can mention anything trending in the wrong direction. Early warnings and a chance to get everyone on the same page? Priceless.
Using smart on-call scheduling and alert software means that you always have the correct contact info for your team members right where you can find it. Knowing where to find help if you need it is one of the best ways to decrease stress, especially if a newer team member is taking on-call for the first time. Additionally, being able to ACK, chat with others and reroute alerts from your smartphone means you don’t have to find a place to open your laptop, which lets you solve the problem faster and improves your work/life balance considerably.
Within the VictorOps on-call software, it’s easy to see any missed chats or plan for the future by knowing exactly when you’re on-call again. The Personal Pane summary view gives you a glimpse of your own on-call schedule, taking the mystery out of your upcoming on-call rotations. Add to that the ability to create your own notification policies, which tell the software how you like to receive alerts, and it would seem that being on-call is a selfish pursuit. We all know that’s not true, and it is important to keep the entire team in mind when scheduling on-call rotations. As a team, you should decide the length of rotations, but almost everything else can be customized through the flexible on-call scheduling process we’ve created.
What do you do when something fun falls during your on-call rotation? Cancel that fun - get back to work. Just kidding. If you use VictorOps, there’s a wonderfully simple button called ‘Take On-Call’ that allows you to still have fun while not losing your job. You send a chat message to someone on your team and ask them nicely if they’ll swap with you for a few hours. With our easy one-touch ‘Take On-Call’ button, a simple swipe means that everyone on the team is notified of the change, it’s documented in your timeline and all those changes are made on the backend.
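Conceptually, a swap like this is a temporary override layered on top of the regular schedule. The sketch below is not the VictorOps backend; the function name, the tuple layout and the people in the example are assumptions used only to illustrate the idea.

```python
from datetime import datetime

def effective_on_call(scheduled, overrides, moment):
    """Return who is actually on call at a given moment.

    overrides is a list of (start, end, person) tuples; the first
    override whose window contains the moment wins. Outside every
    override window, the regularly scheduled person stays on call.
    """
    for start, end, person in overrides:
        if start <= moment < end:
            return person
    return scheduled

# Example: bob takes on-call from alice for one Friday evening.
swap = [(datetime(2014, 10, 24, 18, 0), datetime(2014, 10, 24, 23, 0), "bob")]
```

Once the window ends, the schedule falls back to the original person without anyone having to remember to undo the change.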
These are just a few of the best practices that we recommend. Since we use our own tool 24 / 7 here at VictorOps to manage our on-call rotations, many of these suggestions come from lessons we learned the hard way in years past. Fortunately, we happen to know of an amazing service that can help your team to prevent on-call burnout.