Focus on Detection:
Prometheus, and the case for time series analysis

Detection, in the Incident Lifecycle, is the observation of a metric, at certain intervals, and the comparison of that observation against an expected value. Monitoring systems then trigger notifications and alerts based on the observation of those metrics. For many teams, on-call is primarily about detection. Monitor everything and make sure we don’t miss out! In organizations with legacy monitoring configurations, getting better at Detection is tough. Environments are configured with broadly applied, arbitrarily set thresholds. Sometimes this is due to limitations in the monitoring…

Don’t Listen to Me, I’m a Fraud:
What You Need to Know About Impostor Syndrome

Denzel Washington. Sheryl Sandberg. Tina Fey. These are just a few well-known people who admit to bouts of Impostor Syndrome: when you view yourself as a fraud, despite evidence of your success and achievements. I feel this way sometimes. Maybe you do too. Impostor Syndrome often hits us in moments of vulnerability: when we are pitching an idea, turning our talent into a business, or in my case, giving a talk in front of hundreds of people (Come see my talk on Impostor Syndrome at…

U mad bro? Disaster planning for on-call

Disaster. That word gets used a lot in our circles–it’s a trigger to the deepest FUD argument a vendor or colleague can make. A disaster can be defined in any number of ways: the number of customers impacted, revenue loss, or the number of systems impacted. There are many metrics by which a disaster will be judged. For an on-call team, however, the tale of a disaster is told in the minutes and the hours. Much like a security breach, the reality of a systems disaster…

How to Continuously Improve Your Incident Management Practices

Continuous improvement is central to all DevOps efforts. At VictorOps, we’re seeing businesses of all sizes and resource constraints seek ways to help drive innovation in their organizations. When we empower our teams to continuously learn, their ability to adapt and grow becomes a differentiating factor, contributing to their organization’s success.

Cultivating a Growth Mindset

A cultural characteristic of any organization currently embarking on their own DevOps journey is one of a “growth mindset.” Without the ability to leverage feedback loops to learn, improve, and innovate, market leaders…

You Never Forget Your First Time at GDC

Although I worked in QA for Electronic Arts, I never had an opportunity to go to the Game Developers Conference (GDC), though I always wanted to. It’s a really big deal and I’ve heard stories for years from friends and past coworkers about how much fun it can be. And whether they met new people or heard outstanding talks, everybody remembers the first time they got to go. It’s the kind of event where students will save up all year to pay the admissions fee…

Talking to the Hand: Jason Hand, DevOps.com Award Winner, on How to Thrive as an Industry Evangelist

With 33 events, 27 presentations, 14 podcasts, 11 tech articles, countless trips, and one ChatOps book written and published, Jason Hand had a busy 2016. As the newly-minted winner of the DevOps.com Top DevOps Evangelist award, Jason spoke with me about what it’s like to be an evangelist, what he has learned, and who is motivated to do this work.

JK: So, my sense of an evangelist is that you are an ambassador for an industry or a concept, and you’re out there speaking and…

The State of On-Call Report: This is the Top Takeaway

Let’s get right to the point. On-call people fall into three equal categories: happy, neutral, and miserable. There are specific, consistent reasons why. By reading the State of On-Call 2016-2017 report, you will be armed with methods to reduce the misery and make on-call suck less. Introducing the State of On-Call 2016-2017 report, in which over 800 respondents shared insights about life on-call, infrastructure, culture, costs of downtime, incident management maturity, and DevOps practices.

The Miserable Third

Extremely unhappy on-call respondents suffer from powerlessness to solve problems,…

The State of On-Call 2016-2017 — Kicking off Results Season

We collected the results, crunched the numbers, and are on the verge of launching the State of On-Call 2016-2017 Report. Big thanks to the 800+ people who participated. This Thursday, you’ll get a first look at the findings in a webinar we’re conducting with Alan Shimel and DevOps.com. Please join us. Todd Vernon, Joni Klippert, and I will discuss the survey results, including:

• The factors that correlate with on-call satisfaction versus on-call misery
• Structural and tooling trends
• How DevOps practices impact the on-call…

Automating Developer Environment Setup on OSX Using Ansible, Homebrew, and Docker

We software developers generally love automation. We like our Puppets, Chefs, Ansibles, etc. It is considered bad form to have bespoke, provisioned machines these days–for lots of very good reasons. So here at VictorOps, we definitely use automation. A lot. Puppet provisions our infrastructure and Jenkins handles our automated tests and our build and deploy pipelines. And we have a hubot or two poking around in our Slack channels and in our VictorOps service. All of that is more or less standard fare these days…

5 Ways Testing Games Trained Me for Testing DevOps Applications

It might seem strange to go from testing games to testing DevOps applications. But I assure you: there are a lot of similarities. After spending the late ’90s as a systems administrator, I began testing video games at EA Sports Tiburon in Orlando, FL. After a brief time off to get a degree in theoretical mathematics, I’ve returned to the QA scene and am now a tester at VictorOps. Here are five ways that testing games trained me for testing DevOps applications.

1. In gaming…