If the IT industry were a religion, ITIL would be its sacred text – or at least one of them. Like a sacred text, ITIL lays out the concepts and principles that IT teams should follow to achieve success. Also like a sacred text, ITIL is ambiguous in a lot of ways, and subject to interpretation.
That’s certainly true of incident response, which ITIL discusses in some depth but doesn’t clarify entirely. If you’re involved in incident response, you would be wise to look at ITIL guidance, especially in light of the changes in ITIL 4, the recently released overhaul of ITIL. But you should also keep a healthy perspective on ITIL, using it for general guidance without expecting it to offer a complete checklist of everything you need for successful incident management.
ITIL (which is shorthand for IT Infrastructure Library) is essentially a set of recommended best practices to address all the responsibilities faced by IT operations teams.
ITIL was introduced in the late 1980s by a British government agency searching for better results in IT operations. The agency released a large series of books describing in great detail how IT teams should work in order to achieve greater reliability and efficiency.
Although ITIL has always been designed to apply to any type of IT work, rather than addressing only specific technologies, it has evolved to keep pace with changing trends and demands. The most recent version, ITIL 4, was released in February 2019.
For the most part, ITIL 4 builds on the guidance included in its predecessor, ITIL 3 (which debuted in 2007 and saw its last major update in 2011). Thus, if you are familiar with ITIL 3, you already know much of what’s included in ITIL 4, too.
However, major changes in ITIL 4 include discarding the “service lifecycle,” a core ITIL 3 component laying out steps that organizations should follow when developing or launching a new system, and replacing it with guidance such as the “service value system.” Without going into too much detail, suffice it to say that the ITIL 4 guidance in this regard places more emphasis on continual improvement and engagement with end users. In contrast, ITIL 3 took a rather static approach, in which a project was designed and implemented with little up-front user engagement and improved only after it was in production.
Another high-level change in ITIL 4 is a stronger focus on flexibility. In certain ways, ITIL 4 is less about prescribing rigid processes for organizations to follow than about presenting a set of options they can adopt in order to follow the best practices that fit their needs. This is reflected in ITIL 4’s adoption of the term “practice” instead of “process” to describe the approach that IT teams should take to handling various IT responsibilities.
So, what does ITIL have to do with incident response? There are two ways of answering this question.
The first is to refer to ITIL’s guidance on incident management. As we’ve explained before, this guidance boils down to five stages:
The meat of this guidance didn’t change in ITIL 4, although the framing of it did. As noted above, ITIL 3 treated incident management (and most other IT responsibilities) as a specific process to follow. ITIL 4 calls it a “practice” and suggests that incident management works best when organizations take a flexible, holistic approach.
To put this another way, ITIL 4 suggests that incident management is not about following a preset sequence of steps or putting too much faith in tools. Instead, it focuses on building a flexible incident response strategy that pays as much attention to the people involved in incident response as to tooling. Importantly, those people include not just your own IT team but also partners who depend on you to identify, investigate and resolve incidents.
In some ways, this may seem confusing. If the point of ITIL is to tell you which best practices to follow, then why would ITIL 4 emphasize flexibility? Doesn’t its lack of specificity undercut the purpose of using ITIL in the first place?
The answer here is that the lack of rigid rules to follow for incident response in ITIL 4 is central to the value that ITIL 4 aims to provide. While ITIL still prescribes a core set of stages to follow in managing incidents, it doesn’t attempt to tell you which tools to use, or how to manage your team and your partners. It leaves those choices up to you, precisely because there is no one-size-fits-all best practice when it comes to these parts of incident management.
That brings us to our second point about how to think about the relationship between ITIL and incident response, which is that incident management teams would do well to treat ITIL as a great source of guidance, but not gospel truth.
After all, ITIL is not a set of laws; it’s not even a compliance framework. Nothing requires you to follow its precepts.
Many IT teams are likely to find that ITIL’s incident response stages are a good basic framework, but they don’t do a great job of addressing challenges like alert fatigue, or the need to bring stakeholders from outside IT (like your PR or legal team) into the loop when dealing with certain types of incidents. Nor do they place as much emphasis as they might on using post-incident reviews to prevent recurring incidents of the same type.
Finally, they lack specific guidance on handling the unique challenges of incident response in modern environments built upon architectures like microservices, which add significant complexity to the incident response landscape, and often make it harder to identify the root cause of an incident.
Ultimately, then, your team will need to devise an incident response strategy that’s suited to your needs. This is true not only because ITIL 4 is less specific than ITIL 3 about exactly what to do, but also because, even if it were as specific, ITIL offers only one perspective (albeit a well-designed and time-tested one) on how IT teams should operate. The best incident response strategy is one built on creativity and originality, not rote adherence to a script.
Chris Tozzi has worked as a journalist and Linux systems administrator. He has particular interests in open source, agile infrastructure and networking. He is Senior Editor of content and a DevOps Analyst at Fixate IO. His latest book, For Fun and Profit: A History of the Free and Open Source Software Revolution, was published in 2017.