Post-Mortems: Now with More Learning & Less Effort

There’s been a lot of discussion around post-mortems lately, both in the DevOps space and outside of it. Why do them? Should they be blameless? How often? Who runs them? SO MANY QUESTIONS. We’ve been asking some of those same questions for the past two years. From our 2014 and 2015 State of On-call data, here’s what we currently know about the post-mortem, or retrospective, landscape:
- 50% have a defined post-mortem process, while 50% do not (unchanged from 2014).
- 66% only…

How Post-mortems Can Close the Loop on IT Metrics

In modern infrastructure, it’s imperative to have situational awareness of what’s going on: the good, the bad, everything. The DevOps movement has taught us that monitoring is a key part of adopting the practices of highly efficient software delivery teams. Monitoring metrics are useful not only as context during incident management but also for analysis once an outage has been resolved.

Post-Mortem Fail

You may already be doing post-mortems, but are you performing them in a way that takes blame out of the process? And why does that matter anyway? In our guide, we explain the psychological reasons blame doesn’t work and share tips (and tricks) for doing post-mortems the better, blame-free way.

Blameless Post-Mortems

You may already be doing post-mortems, but are you performing them in a way that takes blame out of the process? And why does that matter anyway? In our guide, we explain the psychological reasons blame doesn’t work and share tips (and tricks) for doing post-mortems the better, blame-free way.

Blameless Post-mortems Webinar

Join DevOps Evangelist Jason Hand for a discussion of how to get the most out of your post-mortems, and learn why the best post-mortems are often blameless.

Blameless Post-mortems Webinar

Everyone in the technology industry understands that incidents, outages, and failures are just part of the job. When you are dealing with complex systems, it’s gonna happen. Following up on an incident, an outage, or even a successful deployment with a post-mortem isn’t a new concept, and the benefits of sharing, analyzing, and understanding what went well and what didn’t are easy to see. Yet in many cases, individuals blame others (or, worse, themselves) for actions that may have led to an outage. Is this the most effective response or…

Why Blameless Post-Mortems are Essential

As I mentioned in a previous blog post, one topic that came up in the Outages open space session at DevOpsDays Silicon Valley, and one I heard time and time again, was post-mortems: the report, or templated set of deliverables, a team produces after an outage. Outages are going to happen, and most major tech companies have tools in place to alert the right people, provide the information needed to diagnose the problem quickly, and then help the team collaborate to resolve…

Post-Mortem Reporting

From fan sites to the many blog posts about the subject, post-mortems are all the rage these days, especially if you’re looking to do DevOps right. It’s easy to see why post-mortems are useful: they provide an opportunity for reflection, learning, and blameless discussion about what happened during an outage. Not only are they considered a requirement for shops that want to improve their internal processes, but they have also become mandatory as a way to better communicate the event internally (to other parts of…

U mad bro? Disaster planning for on-call

Disaster. That word gets used a lot in our circles; it’s the trigger for the deepest FUD argument a vendor or colleague can make. A disaster can be defined in any number of ways: the number of customers impacted, revenue lost, or the number of systems impacted. There are many metrics by which a disaster will be judged. For an on-call team, however, the tale of a disaster is told in the minutes and the hours. Much like a security breach, the reality of a systems disaster…