Showing Post-Incident Review Posts

Todd Vernon February 07, 2018

Systems fail, servers go down, shit breaks. With the pressure of continuous uptime and world-class customer experience, we’re being asked to build, deploy, and operate our systems with increasing speed. Accordingly, our approach to incident management leaves little time for trial and error—we need to detect, respond, and remediate problems...

Matthew Boeckman October 20, 2017

The Analysis phase of the Incident Lifecycle occurs after an incident is resolved, and focuses on learning. This phase is often referred to as a postmortem, but increasingly teams prefer the term “Post Incident Review” (PIR), or “Post Incident Analysis” to describe the event and activities surrounding unpacking of the...

Jason Hand September 08, 2017

Over the course of the last three blog posts on my eBook, Post-Incident Reviews, Learning from Failure for Improved Incident Response, I’ve shared recommended methods for incident management and conveyed how the old ways of conducting post-incident Root Cause Analysis are outdated and ineffective. It’s critical for companies to understand...

Jason Hand September 01, 2017

As we dive further into the eBook, Post-Incident Reviews, Learning from Failure for Improved Incident Response, we’ll explore how using analysis as an avenue for learning is the key to developing a successful post-incident plan. If you haven’t read the previous posts in this series, you can find them here:...

Jason Hand August 25, 2017

When incidents occur, the natural response is to investigate and pinpoint the cause before looking for a solution. However, this traditional approach assumes that causality is determinable. The modern IT professional needs to understand problems stem not from one primary cause, but from the complex interplay of our systems and...

Jason Hand August 18, 2017

As technology advances at a breakneck pace, expectations and challenges simultaneously increase. Clients expect flawless service, 24/7 support, and quick, easy-to-implement solutions. As an IT professional and DevOps evangelist, I’ve come to understand that to manage these expectations, new and updated methods for detecting, resolving, and improving systems need to...

Marlo Vernon August 14, 2017

Jason Hand is DevOps Evangelist at VictorOps and author of Post-Incident Reviews, a new ebook about learning from failure to improve incident response. Jason spoke with Marlo Vernon about the writing experience. This interview has been edited and condensed. MV: What prompted you to write this book? JH: Since I’ve...

Davis Godbout August 11, 2017

I’m excited to announce that VictorOps has updated the feature formerly known as the Postmortem Report, which is now called the Post-Incident Review. This updated feature aligns with our colleague Jason Hand’s new O’Reilly book: Post Incident Reviews. Using VictorOps to continuously improve When we first launched the Postmortem Report over three...

Tara Calihman March 25, 2016

There’s been a lot of discussion around post-mortems lately, both in the DevOps space and outside of it. Why do them? Should they be blameless? How often? Who runs them? SO MANY QUESTIONS. We’ve been asking some of those same questions for the past two years now. From our 2014...

Jason Hand July 18, 2014

I mentioned in a previous blog post that one of the topics that came up in the Outages open space talk during DevOpsDays Silicon Valley, and something that I found myself hearing time and time again, was post-mortems, referring to a post-mortem report or a project post-mortem template with deliverables regarding outages. Outages are...
