How AIOps Can Reshape ITIL Incident Management

Dan Holloran July 12, 2018

The Information Technology Infrastructure Library (ITIL) model is slowly becoming a thing of the past. A one-size-fits-all approach to IT service management in a world of Agile delivery and interconnected cloud-based systems is not the solution.

With current ML (Machine Learning) and AI (Artificial Intelligence) technology at our fingertips, engineering and IT teams can start leveraging the power of AIOps. AIOps technology will allow IT and DevOps teams to establish customized incident management procedures, improve human efficiency and collaboration, and, ultimately, make on-call suck less.

The Traditional ITIL Framework and ITOps Management

The ITIL framework was built on the idea of a single, always-current source of the information needed for IT service management (ITSM). The Configuration Management Database (CMDB) would hold all of the data and assets the company required, so IT professionals could pull any configuration item or record they needed for ITIL operations from one place. In concept, this sounds great, but in practice, ITIL can’t keep up with the rapidly changing IT landscape.

The CMDB-based ITIL framework has one clear benefit: most people in IT know how it works. But many standard ITIL practices can’t keep up with the pace of continuous feature deployment and continuous system integration. Third-party applications and multi-service infrastructures create a level of complexity that humans simply can’t manage alone. IT operations management needs AI and machine learning to process disparate data and remediate incidents more quickly.

The Future and Potential of AIOps and MLOps

AIOps and MLOps are closely related, yet slightly different. Machine learning algorithms process operational data, and their output feeds AIOps processes and systems. AIOps then uses that data, along with automation, to improve operational efficiency and make workflows easier for people. In the context of incident management, it’s important to note that AIOps and MLOps are mostly not about replacing human involvement but about optimizing and prioritizing it.

According to a recent report from Moogsoft, “40% of all large enterprises will use machine-learning-based systems by 2022 to complement and eventually replace their current IT monitoring systems.” In IT incident management, supervised and unsupervised machine learning techniques could be applied to recognize incident patterns, response techniques, redundant alerts, and much more.

AIOps Applied to Incident Management

Engineering teams can use AI and machine learning to automate incident response and remediation. Chat data, incident metrics, and system monitoring data can be fed into machine learning algorithms to build an AIOps practice for IT and incident management.

For example, incidents could be matched to similar past incidents, and the applicable runbooks could be provided automatically. Alerts could be automatically escalated to someone who has demonstrated knowledge of Spinnaker in chat. And the system could tell an on-call engineer whether an alert is likely to resolve itself over time or needs immediate attention. AIOps can analyze data and provide context more quickly than a human can.
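To make the first of those ideas concrete, here is a minimal sketch of matching a new alert to past incidents and suggesting a runbook. It is an illustration only, not how any particular product works: the `PastIncident` type, the runbook paths, the sample incidents, and the 0.3 threshold are all hypothetical, and simple word overlap (Jaccard similarity) stands in for whatever machine learning model a real AIOps system might use.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class PastIncident:
    summary: str
    runbook: str  # hypothetical runbook identifier, invented for this sketch


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_runbook(
    alert: str, history: List[PastIncident], threshold: float = 0.3
) -> Optional[str]:
    """Return the runbook of the most similar past incident, or None
    when no past incident reaches the (assumed) similarity threshold."""
    alert_words = tokenize(alert)
    best_score, best_runbook = 0.0, None
    for incident in history:
        score = jaccard(alert_words, tokenize(incident.summary))
        if score > best_score:
            best_score, best_runbook = score, incident.runbook
    return best_runbook if best_score >= threshold else None


# Made-up incident history for illustration.
history = [
    PastIncident("checkout api latency high after deploy", "runbooks/rollback-deploy"),
    PastIncident("database disk usage above 90 percent", "runbooks/expand-disk"),
]

print(suggest_runbook("disk usage above 95 percent on database host", history))
# prints runbooks/expand-disk
```

Returning nothing when no past incident is close enough keeps the on-call engineer in charge: the system offers context when it has some and stays quiet when it doesn’t.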

People and AIOps Working Together

If you walk away from this post with one key takeaway, remember that AIOps for incident management should be designed to help your people. Traditional ITIL procedures simply can’t keep up with the speed of DevOps and automation. Artificial intelligence and machine learning technology can be leveraged via AIOps and MLOps to rapidly monitor your system’s inner workings and provide relevant, contextual alerts and communication to your teams.

Start managing your incidents in a centralized timeline. Sign up for a 14-day free trial to start receiving contextual alert data and collaborating in one place to improve visibility and shorten time to incident remediation.

Let us help you make on-call suck less.