VictorOps Founders Know On-call Sagas

Tara Calihman - August 19, 2014

In case any of you need help with submitting your story for our current On-call Saga Giveaway, here are a few to get you thinking. Maybe they will help to spark a memory… just in case you repressed the emotional trauma inherent in some of these scenarios.

The first story comes from our COO, Bryce Ambrazunas, who is almost as scary as he looks, and is not eligible for any of the giveaway prizes. Fortunately.


His story takes place on the night of Feb 13th, 2001. At the time, Bryce was the SVP of Operations at Raindance Communications (RNDC). The team had spent the better part of three years creating one of the first on-demand audio and web conferencing platforms in the industry. Raindance was then delivering over three million minutes of international conferencing each day to the biggest names in the Fortune 500.

On that fateful night, something went horribly wrong inside the call routing portion of the conferencing system, and the NOC lit up with calls from customers complaining of inappropriate and dirty cross-talk in their conferences. Over the next 24 hours, all hands were on deck troubleshooting the issue, from the hardware vendor to the network carrier to the internal development team. The issue was resolved promptly once it was identified as cross-talk within the network switching platform, but the damage was done and customers were irate. Coincidentally, call duration during the outage went up from 42 minutes to 63 minutes… apparently there was some interesting cross-talk during those conferences!

Need more inspiration? Here’s our CTO, Dan Jones, on his worst outage experience:


The horror story begins with a 3 a.m. alert for slow database response times during the early days at Lijit. After a few hours of troubleshooting with the DBA and sys admin teams, an issue was discovered with the iSCSI file system that ran the entire company infrastructure. The vendor was contacted and confirmed a problem with the iSCSI firmware. Major upgrades were required, and large portions of the infrastructure had to be taken down for them to take place. Through the heroic, round-the-clock efforts of the IT and engineering teams, a tricky chess game took place to move critical VMs around while upgrading the iSCSI servers. The team worked non-stop for almost two days, but downtime was successfully averted.

Fine. We’ll give you one more example, straight from our CEO, Todd Vernon, about his awful on-call saga from 14 years ago…


At the time, Raindance had one major customer, Wells Fargo, and the weekend of Super Bowl XXXIV was one Todd will never forget. Late that Friday night in 2000, the small development and ops teams were working through a routine network upgrade when, much to the team’s dismay, the platform wouldn’t come back up. Over the next 72 hours, Todd learned an important lesson: don’t activate the entire technical team at once. Some problems take longer to fix, and after 24 hours of debugging, fallen soldiers need to be replaced. The corrupt boot file was finally located, but nobody was able to watch the St. Louis Rams beat the Tennessee Titans. Raindance went public later that year (RNDC), and Todd remained Founder and CTO until it sold to West Corporation in 2006 for over $170M.

See? That wasn’t hard. Just send us an email detailing your story (we don’t need complete sentences! or grammar!) and you’re entered.

Does writing your story down in an email STILL sound too hard? Email us and schedule a time to tell us your story over the phone. Like they did in the old days.

Share your pain and let us give you a chance to make up for that awful experience with $500 towards an amazing adventure with Cloud 9 Living.