There is more to being on-call than knowing how to type the latest ChatOps commands, reboot EC2 instances and print out Java stack traces. There are life skills that come from being on-call for a while, and fortunately those lessons can be taught.
Here at VictorOps we’re currently adding six new engineers to our on-call roster, so I’ve been thinking about the experience of being on-call and how to make the best of it.
The first day you go on-call can be frightening. The most important thing to remember is that you’ve already passed the first test. You have the trust and respect of your teammates and are providing them with a valuable commodity: peace of mind. No one wants to be on-call, so stepping up to the plate and taking shifts helps to improve the lives of everyone on your team.
1.) Make sure you have the tools you need to do your job and understand how to use them. If you don’t know how to use them while you’re at work, there is no way you’ll remember at 2am. Here’s a list, though your particular job will obviously vary:
* SSH credentials
* sudo privileges
* RSA key fob
* Credentials to your support portal
* Phone numbers and escalation policies for components of the system that you’re responsible for
* Links to the runbooks or ChatOps commands
2.) Understand the expectations for being on-call, both implicit and explicit. Hopefully your company has taken the time to document how you’re supposed to behave when you’re on-call. It’s always best to have things explicit, but looking through your chat rooms or timeline might show you implicit rules that different team members follow. Some examples of questions these rules, implicit or explicit, should answer:
* “How fast should you be responding to pages?”
* “When should you escalate incidents to more senior team members, other teams or customer support?”
* “How should you handle short periods of time where you need to be away from your computer, such as going out to dinner or a movie?”
3.) Remember to communicate. This is often a tricky one for people in our field, but communicating between teams (both engineering and non-engineering) is one of the key skills of an on-call engineer. In addition to fixing or diagnosing issues, you’re expected to keep the rest of your team(s) informed. There is definitely finesse in knowing when an issue needs to be run up the flagpole, so take care to learn from how others on your team communicate.
4.) Manage your life. If you’re not a full-time on-call engineer, you’re going to spend a lot of time balancing your “real duties” with being on-call and, most importantly, with having a life. This is a tricky balance to get good at. If you’re on-call for extended periods (longer than a few days), you’re going to notice a precipitous drop-off in “vigilance.” There are behaviors and a level of focus that you can only sustain for so long while being on-call.
5.) What about sleeping? When you’re on-call overnight and expect to sleep during the shift, there is a quick “pre-sleep” checklist worth learning:
* Your “pager” should be set to “make lots of noise”
* Check your timeline for any warnings that could become incidents overnight (better to catch them early)
* Keep your computer at hand (close to your bed) so you don’t have to run through the house in your skivvies
6.) You’re not actually on house arrest. If you still want to have a life while on-call, you might, on occasion, leave the house. When you do, consider a few of the following:
* Take your laptop and a phone that can tether
* Let your teammates know
* Trade on-call with a teammate for a couple of hours
Hopefully your first night on-call won’t be the shitstorm you fear, and you’ll go on to become an integral part of the on-call team. If you’re looking for other helpful tips, check out our On-Call Firefight Survival Guide. Here’s to making on-call suck less!