Is it possible to gamify on-call? We didn’t think so until we spoke with Nick Goodman, Director of Platform Engineering at Bunchball. Nick used VictorOps to put developers on-call and got them racing to be the first to ack-back. We talked with Nick, and he told us how it works.
“Our front-end devs are really excited when an alert goes off.”
Production system issues used to be a black box to most people on our team. Only a few developers had any insight whatsoever. Now that we use VictorOps, the entire team is on-call, but in a very flexible way. Our front-end devs are really excited when an alert goes off because they want to be the first one to ack it.
With VictorOps, our teams provide a high level of service to the organization and still get to have their lives. The team is a lot happier now. And it is making our devs better devs because they have a holistic view of our system.
No more alerting fatigue
Our team did a good job before we signed up with VictorOps. We didn’t miss any outages, but it was a real pain.
Before VictorOps, we used Nagios as our base on-call system. It sent a text message to one person and an email to everyone else. If the primary on-call person didn’t respond after a few minutes, it escalated to everyone. The whole team got alerted a lot. The system was pretty inflexible. Only an operations person could make configuration changes, so it only made sense to change on-call on an infrequent basis, like when someone went on vacation.
One of the biggest changes since we started using VictorOps is that it feels like we can take a break and not be on-call. Escalation policies in VictorOps are really useful, and we leverage a pretty sophisticated escalation process. We escalate from one team to another and one person to another. It’s peace of mind to know that the reliability is there and that if someone is driving in the mountains out of range, someone else will get the alert.
VictorOps’ mobile functionality is great too. Push notifications are much more reliable than SMS, and VictorOps’ mobile app itself is definitely superior.
Improving the knowledge base across teams: putting devs on-call
I’ve built on-call systems myself before, and I’m a firm believer in developers being on-call. A lot of companies aren’t big enough to have dedicated on-call teams of 10 or 20 people, and devs are smart and very capable of doing the work.
Bunchball is an agile development shop, so all teams are self-managed. The trick was convincing everyone that putting devs on-call was the right idea. VictorOps helped make that happen because it’s so flexible.
We are a very transparent and open team, and we encourage people to trade on-call duty. The ‘take on-call’ button is key, and a huge improvement over our previous process. One of our devs is a New York Giants season ticket holder. It’s easy for one of us to take on-call from him during the game and then swap back once he’s available again.
I really like the self-administration of notifications in VictorOps. Each person on our team can customize alerts to their own situation and what works best for them. That makes our teams much happier. Now that we’re on VictorOps, our devs have an understanding of how the system functions on a day-to-day basis. It feels like everyone is more engaged in the ongoing success of our product.
VictorOps is the best that’s on the market for what a tech shop like ours needs for on-call.