As I travel around the country and meet folks who are interested in what VictorOps is all about, I’m often pressed with the question, “How are you different from other alerting services?” My responses vary, but they more or less echo the sentiment that the initial alert is just step one in the incident management lifecycle.
The responsibility of being on-call is a much larger role than simply “ACK’ing” an alert pushed to your mobile device. Now that you’ve received the page, it’s time to get busy.
Would you prefer to have a ton of valuable context delivered to you automatically with the alert, or would you rather go look for that stuff on your own? If you’ve ever been woken up in the middle of the night and been responsible for diagnosing and resolving a huge problem, I think you know the answer to that.
Having useful information and insight into what’s taking place during an outage baked right into the alerts you receive can make a world of difference during the early stages of an incident. What if all of the things you were going to look for to diagnose what’s happening (logs, graphs, notes, runbooks, etc.) were provided immediately with the alert? When that context is placed in-line with related conversations from your team and everything else that’s taking place regarding the problem that woke you up in the middle of the night, that’s a huge win for the on-call team.
When I first met Ryan Frantz at the Velocity conference in New York, to my great surprise this is exactly what he was advocating for and presenting to a room full of agreeing attendees.
Later that afternoon, I was able to catch up with him and pick his brain on additional thoughts regarding this subject. VictorOps was just about to release the Transmogrifier feature, a powerful rules engine that allows users to customize their alerts to not only provide useful content, but also modify fields, and create automated actions. The timing couldn’t have been more perfect.
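To make the idea of a rules engine a little more concrete, here is a minimal Python sketch of the general pattern: match an incoming alert on a field, then modify fields and attach context. This is not the Transmogrifier’s actual rule syntax or API. The `Rule` class, the `apply_rules` function, the field names, and the example.com URLs are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List


@dataclass
class Rule:
    """Hypothetical enrichment rule: when `match_field` matches `pattern`, apply changes."""
    match_field: str
    pattern: str  # shell-style wildcard, e.g. "db-*"
    set_fields: Dict[str, str] = field(default_factory=dict)   # fields to overwrite
    annotations: Dict[str, str] = field(default_factory=dict)  # context to attach


def apply_rules(alert: Dict[str, str], rules: List[Rule]) -> Dict[str, str]:
    """Return a copy of the alert with every matching rule applied in order."""
    enriched = dict(alert)  # leave the caller's alert untouched
    for rule in rules:
        # Match against the enriched copy, so later rules see earlier changes.
        value = enriched.get(rule.match_field, "")
        if fnmatchcase(value, rule.pattern):
            enriched.update(rule.set_fields)
            for key, text in rule.annotations.items():
                enriched[f"annotation.{key}"] = text
    return enriched


# Assumed example rule: route database alerts and attach a runbook and a graph.
rules = [
    Rule(
        match_field="host",
        pattern="db-*",
        set_fields={"routing_key": "database-team"},
        annotations={
            "runbook": "https://wiki.example.com/runbooks/db-disk",
            "graph": "https://graphs.example.com/render?target=db.disk.used",
        },
    ),
]

alert = {"host": "db-07", "service": "disk", "state": "CRITICAL"}
enriched = apply_rules(alert, rules)
for key in sorted(enriched):
    print(f"{key}: {enriched[key]}")
```

With a rule like this in place, the person who gets paged would see the runbook and graph links alongside the alert itself instead of hunting for them at 3 a.m.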
Since then, Ryan and I have stayed in touch and I was ecstatic when I learned he’d be presenting the same talk he gave at Velocity for the upcoming DevOpsDays – Rockies event.
Etsy is at the forefront of the DevOps movement, advocating best practice concepts such as Ryan’s “Contextual Alerting” and John Allspaw’s “Blameless Postmortems” at nearly every large tech conference in the country. It only made sense to invite Ryan to take part in my DevOps Interview series.
Q: For those out there who are unfamiliar with who you are, can you give a quick intro?
A: I’m a Senior Operations Engineer at Etsy. I’ve currently got an itch to help solve advanced monitoring problems we all are facing, including improving signal detection, contextualizing alerts, and correlating events. I blog at http://ryanfrantz.com
Q: At what point did the subject of contextual alerting become something of great interest to you? What drove you to give talks on it and advocate for this type of stuff?
A: The first time I was on call and got paged! The information in the alert was minimal and I hadn’t been on the job very long so I was lacking valuable tribal knowledge and context that would have helped me address the problem. I’ve always tried to craft alerts to make some sense to the recipient. This really ramped up for me when I came to Etsy and was looking for a project to work on during our annual Hack Week. I was chatting with John Allspaw and he excitedly suggested it would be great if we could get graphs in our alerts. After cobbling together a proof-of-concept script to make that happen, I got into several conversations with like-minded folks that wanted to work on adding valuable, actionable context to alerts. Nagios-Herald is the result of that effort. Since then, I’ve been blathering about alert context to anyone that will listen!
Q: Are there any projects or events that you’re really excited about and want to share with us?
A: I have lots of ideas for several tools Etsy uses (and has open sourced) such as Nagios-Herald, Morgue (adding ways to surface information, and hopefully surprises), and NagDash (helping operators focus attention). I’ve implemented some of these ideas and am working on others.
Q: Where do you feel DevOps (the big picture) is now, compared to this time last year?
A: I think DevOps is gaining traction in older or more traditional organizations. I’ve seen it take root in single teams and portions of large companies. In each case, people are reaching out to others in their organizations to understand what the business’ needs are and to better communicate how their work contributes to that. There’s a strong recognition that the work isn’t about maintaining fiefdoms, but about providing value to the customers. Value can come from anywhere in the organization.
The U.S. Patent and Trademark Office recently invited me to speak at a DevOps event they hosted for employees and vendors. What struck me was how honest and open they were about their collective progress. Several teams were able to specifically describe the improvements they’d made, and how that meant improvements in other parts of the organization, all of which meant their customers were better served. Even more, they were open about where they still needed to improve and how they planned to address those issues. To me, the essence of DevOps is honesty. Honesty about one’s abilities, honesty about progress, and being truthful to the ideas and goals each of our teammates works toward every day.
Q: What are your thoughts on the biggest impacts or benefits teams can gain from adopting DevOps practices?
A: I remember early on in my career thinking, “If I just had a seat at the table, I sure could show those folks a thing or two!” I learned the hard way that lack of humility meant I wasn’t focusing on how I could best demonstrate that my work was relevant to the business and its goals. DevOps means one needs to have an honest conversation about what the work is and how it fits into the larger organization’s purpose. Every company exists to serve people; collaborating with other teams to meet the needs of those customers demonstrates that you already have a seat at the table. I think that DevOps has captured that idea and helped to spread it more than any other I’ve seen in my career.
Q: Are there any tools or services that you’ve heard a little about and want to learn more about?
A: Docker! I’m generally curious to learn more, partly because containers are so hot at the moment, and partly because, as an engineer, I always want to stay up on the latest trends. Unfortunately, time is always a factor and I find it easier to tuck into new technology when I have a compelling use case for it. At the moment, I don’t have a use case for Docker, but I’ve been curious about containerizing several monitoring tools for easy deployment as a single stack.
Q: What conferences or events are you looking forward to attending this year?
Q: How or where do you consume the latest ideas and topics regarding DevOps and/or Agile?
A: A copy of Devops Weekly rolled up under my arm, in my smoking jacket, in the study while two large wolfhounds warm themselves by the large fire. With a coffee.
In all seriousness, Devops Weekly is Gareth Rushgrove’s weekly compendium of ideas, presentations, and tools. He does an amazing job not only of sourcing new material, but keeping an eye on large and small trends.
Q: Of all the main DevOps topics, which one(s) take up most of your cycles?
A: Monitoring probably takes up most of the free cycles in my head. I think about issues and possible solutions all the time. I’m currently working to lay those problems down so that I can systematically research some ideas in the space and possibly develop some tools (and improvements to tools) that can help.
Q: Do you have any good books or resources that folks just starting to explore DevOps should check out?
Follow Ryan on Twitter and check out his blog where he recently published a post related to Alert Design… and if you live in the Denver/Boulder area, check out DevOpsDays – Rockies (April 23rd & 24th) to catch his talk and chat about all things DevOps.