Context through coupling: JIRA for on-call teams

Some of the most visible artifacts of organizational silos in engineering are tools. Visibility into dashboards, workflows, or documentation is cordoned off in separate and often redundant systems. Conway’s Law manifests in tool choices and integrations as much as in application development. As a team matures an Incident Management or DevOps practice, breaking down these tool walls becomes necessary. In this post I’ll explore a low-effort, high-value way to extend the integration between VictorOps and JIRA to break down those silos, and empower your…

Firefighting is a Team Sport

Organizing an on-call roster is tough. Aligning skills, experience, and availability with specific application technologies is difficult. In most cases you settle for “close enough” and hope smart people make good decisions. Skills and scheduling are really only the beginning; effective incident management requires focus on how your on-call team operates. I like to think of team dynamics along two dimensions. First, there is the structural organization of the team: people playing roles, workflows, and escalation paths. This is important to on-call teams because it impacts…

On-Call Ways and Means: A Developer’s Guide

Bringing non-traditional Ops folks, including developers, on-call can be a tricky process. Initial reactions tend to be highly polarized: either total pushback and refusal, or a meek acceptance coupled with fear of the unknown. For the former, understanding the root of the refusal is useful. For the latter, providing clarity and training is important. For those unfamiliar with Incident Management, there are some common misconceptions that fuel a fear of accepting on-call responsibilities. Chief among those are:
– I’m going to be woken up for…

On-Call Handoffs: Empowering Adaptability in Incident Response

Managing on-call teams has always been a challenge in complex environments. With the continued adoption of Continuous Delivery, the challenges are squared. Now, not only do you have to manage a complex environment, but that environment is also changing dozens of times per day. On-call today has to be less about strict execution of predefined procedures, and more about adaptability. Smart people, acting with good situational context, tend to make the best decisions. Those same smart people must be empowered with necessary skills and tools, but…

The State of On-Call Report: This is the Top Takeaway

Let’s get right to the point. On-call people fall into three equal categories: happy, neutral, and miserable. There are specific, consistent reasons why. By reading the State of On-Call 2016-2017 report, you will be armed with methods to reduce the misery and make on-call suck less. Introducing the State of On-Call 2016-2017 report, in which over 800 respondents shared insights about life on-call, infrastructure, culture, costs of downtime, incident management maturity, and DevOps practices. The Miserable Third: Extremely unhappy on-call respondents suffer from powerlessness to solve problems,…

2016-2017 State of On-Call

Over 800 professionals shared a comprehensive picture of life on-call. Their on-call experiences range from highly evolved to completely dysfunctional. Where does your organization stack up?

The State of On-Call 2016-2017 — Kicking off Results Season

We collected the results, crunched the numbers, and are on the verge of launching the State of On-Call 2016-2017 Report. Big thanks to the 800+ people who participated. This Thursday, you’ll get a first look at the findings in a webinar we’re conducting with Alan Shimel and DevOps.com. Please join us. Todd Vernon, Joni Klippert, and I will discuss the survey results, including:
• The factors that correlate with on-call satisfaction versus on-call misery
• Structural and tooling trends
• How DevOps practices impact the on-call…

Success Stories for Engineers On-Call

Real-time monitoring and alerting are critical to maintaining the performance and security of your infrastructure. But with today’s astounding access to data, it is important to use the right technology to manage alerting in a way that’s customized to your environment. Otherwise, alert fatigue will take hold and your teams will lose their ability to respond to incidents quickly and effectively.


5 Ways Testing Games Trained Me for Testing DevOps Applications

It might seem strange to go from testing games to testing DevOps applications. But I assure you: there are a lot of similarities. After spending the late ’90s as a systems administrator, I began testing video games at EA Sports Tiburon in Orlando, FL. After a brief time off to get a degree in theoretical mathematics, I’ve returned to the QA scene and am now a tester at VictorOps. Here are five ways that testing games trained me for testing DevOps applications.
1. In gaming…