Let’s face it. Being on-call is never a neutral experience. We asked five VictorOps employees to share their “on-call horror stories” or the worst outage story of their tech career. Most of them responded with, “Which story do you want to hear?” Our interviewees were haunted by memories of getting paged in the middle of the night, driving into the office, and spending hours, or even days, trying to fix a problem.
If you’ve worked in technology or engineering, you can probably relate to these stories and even share a few experiences yourself.
At the worst possible moment
While listening to each story, I noticed a pattern: most of these events were so bad and so memorable because they happened at the most inconvenient times. For example, one person was snuggled in bed on Christmas night when he got the page; another was sitting with his newborn in the NICU; and another was out of town and unable to do anything to help fix a broken system that he had built himself.
This is the first in a series of on-call horror stories, and it wins the award for most embarrassing. Watch Bryce Ambraziunas describe his experience in this video, or skip the video and keep reading for his story.
The situation: supporting the first reservationless conferencing system
Bryce was the VP of Operations at Raindance, a company that provided reservationless voice conferencing. By working diligently with a hardware provider and a network provider, the company crafted the first automated experience. Raindance was using cutting-edge technology from MCI, and many other businesses were using this audio technology as well.
An embarrassing tech support call
Bryce got a call one night at about 2:00 a.m. from his support desk. The complaint came from one of Raindance’s biggest customers, a Fortune 100 company. They had heard some inappropriate language on their conference call that they couldn’t trace back to anyone on the call.
Bryce realized that the inappropriate language was actually extremely inappropriate. It turned out that the conference calls for their business customers were being crossed over with conferencing technology that was being used for 976 sex chat lines. Raindance’s corporate customers were hearing sexually explicit conversations, but the sex chat participants weren’t hearing the corporate customers on their calls.
Solving the problem while cringing
Bryce immediately dressed, drove to the office, and started troubleshooting with the people at MCI, their hardware provider. They had no idea what was going on. For three days he stayed in the office, for three days he tried to troubleshoot this problem, and worst of all, for three days their customers heard phone sex in their conference calls.
Bryce said it was the worst position he’s ever been in throughout his professional career. The hardest part was getting a customer to stay on the line while crossed with the sex chat channel in order to troubleshoot the problem. That was the only way to figure out what was wrong.
Rebooting a server
In the end, the problem came down to one server on the telephone network side that just needed to be rebooted. The server had gone haywire, was out of sync, and was putting people into the wrong calls. “It still makes me sweat when I talk about it,” Bryce told us, cringing.
Several of our storytellers had the same reaction after sharing their worst on-call experiences. It shows that working in the technology industry can be a 24/7 job, and that good on-call practices and teams that support each other can make or break your experience.