VictorOps is now Splunk On-Call.
At Iotum, we understand the importance of tools that strengthen cohesiveness among geographically distributed teams and remote workers. Our team builds tools of this nature, with a brand portfolio that includes innovative conference calling platforms like Callbridge. We have offices in Los Angeles and Toronto; remote workers in London, Chicago, and Connecticut; and wherever I happen to be at the moment. So, we frequently use our own conferencing software; it is instrumental in leading a team spread across distances like these.
Long gone are the days of traditional on-call rotation tools. Today’s generation of developers and Ops folks are self-organized and want control over their on-call shifts. They don’t want to carry a pager or have to remember to hand off a phone every week. Developers and Ops people would rather depend on an app that follows them on their device of choice. They want to know if they’re going to be on-call during special occasions and be able to trade shifts seamlessly without involving management.
Let’s take it back to 2013, when two rag-tag teams of engineers from previously independent companies were merged into a cross-border team. At this volatile time, tools were needed to facilitate monitoring, alerting, and on-call rotation management.
Luckily, we knew what we were looking for. Our own Callbridge virtual conferencing software already sends out automated alerts for ongoing and upcoming meetings, as well as detailed transcriptions after the fact.
We were already really good at alerting. For everything. To everyone. That wasn’t the issue…
Later that year, VictorOps hosted a booth at the Southern California Linux Expo (SCALE), featuring wolves howling at the moon who promised to “Make On Call Suck Less.”
Convincing our devs that on-call didn’t suck seemed impossible, but a tool that enabled smart incident routing, filtering, acknowledgement, and scheduling was a good start. (The incentive of half-day Fridays following a given rotation week didn’t hurt either.)
Within weeks of starting a free trial, our team embraced VictorOps as a vital component in our toolbox for maintaining maximum uptime of our conferencing services, and minimizing the time needed for resolution of reported incidents.
Many of our legacy systems have alert functions hardcoded in very permanent ways. One of our first goals was taming known false alarms so that our team could sleep without interruption. One of VictorOps’ advanced tools, the Transmogrifier, allowed us to do that, and much more.
We redirected any alerts deemed ignorable to a “quiet queue”, which sent us emails instead of active notifications. We found that at least half of the triggered alerts weren’t urgent; they could wait until the morning.
The Transmogrifier has allowed us to add annotations to alerts, offering tips and “checklists” for on-call engineers to resolve issues without needing to endlessly consult others.
Some of our monitoring solutions were generating less-than-desirable alerts, resulting in a personal hell where Up notifications didn’t match the Down notifications. The Transmogrifier has allowed us to tame them, transforming vague Up notifications into Resolution Alerts that automatically resolve incidents.
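To make the three kinds of rules above more concrete, here is a minimal Python sketch of the idea. It is not the Transmogrifier's actual rule syntax or our real configuration: the `Alert` fields, the `quiet-queue` routing key, the match patterns, and the checklist text are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Alert:
    # All field names are illustrative assumptions, not a real product schema.
    source: str
    message: str
    message_type: str = "CRITICAL"
    routing_key: str = "on-call"
    annotations: dict = field(default_factory=dict)

# Hypothetical patterns for alerts we already know can wait until morning.
KNOWN_FALSE_ALARMS = ("disk usage on backup host",)

# Hypothetical keyword -> checklist text shown to the on-call engineer.
RUNBOOKS = {
    "conference bridge": "1. Check media server health. 2. Review recent deploys.",
}

def apply_rules(alert: Alert) -> Alert:
    """Apply the three kinds of rules described above to one incoming alert."""
    text = alert.message.lower()

    # Rule 1: send known false alarms to the quiet queue (email only, no page).
    if any(pattern in text for pattern in KNOWN_FALSE_ALARMS):
        alert.routing_key = "quiet-queue"

    # Rule 2: attach a checklist so the engineer does not have to ask around.
    for keyword, checklist in RUNBOOKS.items():
        if keyword in text:
            alert.annotations["checklist"] = checklist

    # Rule 3: turn a vague "is up" message into an explicit recovery event,
    # so that the matching incident can be resolved automatically.
    if text.endswith(" is up"):
        alert.message_type = "RECOVERY"

    return alert
```

In this sketch, a message such as "Conference bridge is UP" would come out tagged as a recovery with a checklist attached, while a backup-host disk warning would be routed to the quiet queue and left for the morning.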
I spent a year traveling abroad with Remote Year, which presented a number of challenges and opportunities for managing our on-call rotations. Suddenly, our team’s “online” hours expanded from 8 hours to 16 hours on any given business day. We had an opportunity to take advantage of my “online” time while the rest of the team slept, with the challenge of minimal overlap and direct communication.
Because time zones make my head spin, I reached out to the VictorOps team, and they helped me build custom rotation schedules to match our requirements. Their team welcomed our challenge and was incredibly helpful.
In the end, we were able to configure a set of rotation schedules that ensured nobody was ever woken up by an alert, and all of a sudden, on-call sucked EVEN LESS.
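For the curious, the reasoning behind a schedule like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our actual rotation: the team members, their fixed UTC offsets (daylight saving is ignored), and the 08:00 to 22:00 waking window are all hypothetical.

```python
from typing import Dict, List

# Hypothetical team: name -> fixed UTC offset in hours (daylight saving ignored).
TEAM_UTC_OFFSETS: Dict[str, int] = {
    "toronto_engineer": -5,
    "los_angeles_engineer": -8,
    "traveling_engineer": 8,
}

# Assumed local hours during which someone may be paged.
WAKING_START = 8
WAKING_END = 22

def local_hour(utc_hour: int, offset: int) -> int:
    """Convert an hour of the day in UTC to local time for a given offset."""
    return (utc_hour + offset) % 24

def awake_engineers(utc_hour: int) -> List[str]:
    """Return everyone whose local time falls inside the waking window."""
    return [
        name
        for name, offset in TEAM_UTC_OFFSETS.items()
        if WAKING_START <= local_hour(utc_hour, offset) < WAKING_END
    ]

def uncovered_hours() -> List[int]:
    """List the UTC hours during which nobody on the team is awake."""
    return [hour for hour in range(24) if not awake_engineers(hour)]
```

If `uncovered_hours()` comes back empty, every hour of the day can be assigned to someone who is already awake, which is the property we were after. With the made-up offsets above, the traveling engineer covers the hours when both North American engineers are asleep.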
Considering the modern challenges of our business world, which include physical and communicative barriers, developing software that allows you to connect is a powerful asset. Even better is finding software that allows you to connect and still get a good night’s sleep, no matter your shift. That’s why Callbridge and VictorOps work so well together.
Yes, our on-call strategy is still far from perfect. But these days, our team doesn’t mind being on call. And that’s good enough for me!
Sign up for your own 14-day free trial to see how VictorOps makes on call suck less for centralized and decentralized teams.
Michael is a program manager at Iotum, where he manages operations for several web-based video conferencing products, including Callbridge.com. A Remote Year citizen/alum, he is a strong advocate for remote workers and can often be found at the nearest taco stand.