Jason Hand - September 19, 2014
The VictorOps team just wrapped up another great trip to Velocity New York. While the East Coast version of the conference is quite a bit smaller than its Santa Clara counterpart, the event was a huge success all the way around. We had so many great conversations and learned a great deal about what others in the web and operations space are doing, as well as how we can better serve our “on-call siblings”.
Perhaps I was just naturally drawn to a specific theme of presentations, but to me a big part of the overall tone of the conference seemed to lean towards monitoring and distinguishing between the good and bad alerts. This is a topic near and dear to us here at VictorOps. We are continuously assisting on-call teams with methods and best practices for monitoring, filtering, and understanding only the most important alerts generated by complex infrastructure.
Additionally, “Blameless Postmortems” is a subject I’ve grown passionate about. With a webinar on the horizon on this subject, of course I had to check out John Allspaw’s (@allspaw) 2-part talk on Theory and Practice of ‘New View’ Debriefings. Allspaw of Etsy is, in my opinion, THE leader in the space of postmortems with regard to complex systems and the human condition.
_"The point of a postmortem is to LEARN."_
He provided a wealth of information over the course of his 3-hour presentation, and I can’t wait to dissect it and work it into my content and consultations with others looking to better understand, conduct, and engage in truly “blameless” postmortems.
Later that evening the IGNITE talks kicked off with 15 presenters blasting through a variety of topics, including VictorOps’ own Tara Calihman (@tarable) and her experience, expertise, and humor on long-distance backpacking.
When the US government’s biggest tech project runs into problems, who do you call? Well, the Site Reliability Manager for Google is a pretty good place to start. Mikey shared an amazing and humorous story about the challenges surrounding the HealthCare.gov launch and how he helped bring basic DevOps concepts to the table in order to turn the massive project around from the brink of utter failure to a success beyond expectation.
Ryan spoke to a packed room with two very clear requests for takeaways:
As I stated earlier, a recurring theme in many of the presentations I was able to catch was making sense of alerts and reducing noise from your infrastructure.
_"We want alerts to help us focus our attention."_
Frantz focused on this subject throughout his 40-minute talk, highlighting everything from alert fatigue (as a result of high-volume, low-value alerts, heightened anxiety, distraction, sleep loss, and lack of context) to John Boyd’s OODA Loop and how it can shape the way we observe, the way we decide, and the way we act.
I was able to speak with Ryan later that afternoon, and we had a great conversation about the specifics of what makes for “good” context, all of which falls perfectly in line with what you can expect to see in upcoming feature releases from VictorOps.
Tuesday night, we hosted an exclusive all-inclusive event for those who came by our booth to chat about the real-life struggles of on-call teams. The number of attendees and conversations was beyond our expectations.
We are humbled and excited to be a valuable tool and partner to so many out there looking for a better way to manage their incidents and on-call teams. Thank you to everyone who came out, and congrats to Vishak Ramaiah (@vishakseshadri) of Enova for winning our raffle drawing for the custom VictorOps Bluetooth headphones.
Wednesday morning’s keynote included an interesting argument on monitoring without alerts. While we agree with much of what Reitbauer stated in his presentation, the title might have been a bit misleading. In our opinion, alerts are a necessary part of monitoring.
Perhaps the more important takeaway from the talk was to do away with alerts that are NOT necessary. By adjusting thresholds and removing notifications that provide little to no value, you make your monitoring far more useful to you and your team. Doing away with alerts entirely, however, isn’t something we suggest.
While I sat and absorbed the morning’s keynotes, I had the pleasure of sitting next to Natalia Talkowska (@NatiTal) and enjoying her creative process as she created sketch notes in real time during each presenter’s time on stage.
The final keynote was, of course, provided by Tim O’Reilly himself. Touching on a number of key points and presentations from throughout the Velocity conference, Tim delivered a primary message of empathy. He reminded us that people should be at the heart of everything we build, and that getting the “people” part right is much of what those of us in attendance need to understand and embrace.
The rest of my Velocity experience was filled with catching Dave Josephsen (@DaveJosephsen) hit another home run with his presentation on best practices for alerting, as well as the latest iteration of DevOps in a Post-DevOps World by J. Paul Reed (@soberbuildeng).
Later, as I shared beers, burritos, stories, and laughs with Jason Dixon (@obfuscurity), Mike Julian (@mike_julian), James Meickle (@jmeickle), Dave Josephsen, Lindsay Holmwood (@auxesis), and Jesse Reynolds (@jessereynolds), I realized what an amazing couple of days it has been here in Manhattan at Velocity. Seeing so many brilliant individuals and teams come together in one place, and being a part of that, is such a great experience.
At VictorOps, we love helping others make the responsibility of being on-call suck less, and getting feedback on how we are doing exactly that is very rewarding. I’m looking forward to getting back to our Boulder base camp to share and implement everything I’ve somehow squeezed into my head. But first, I’m off to San Francisco for PuppetConf. See you there!