When many of us were growing up, artificial intelligence and machine learning seemed like something that could only exist in fantastical science fiction movies. And, while the type of AI and ML that exists in RoboCop might still seem a little out of reach (and unnecessary), we’re slowly realizing real-world benefits and use cases for this technology. Traditional IT Operations Management (ITOM) teams and software developers are using AI and machine learning today in an effort to improve customer experiences around software.
This has led to the coining of a new term: AIOps. AIOps is the application of artificial intelligence and machine learning to software development and IT operations processes. Looking ahead, AIOps opens up the possibility of negative mean time to acknowledge and resolve (MTTA/MTTR) in incident response. Before you even get notified of an outage or error, your service could use AI and automation to detect the problem, identify the root cause and execute a playbook – fixing the issue before it ever reaches a customer.
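As a rough illustration of that detect, diagnose and remediate loop, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the z-score check stands in for real anomaly detection, the rule-based `guess_root_cause` stands in for real root cause analysis, and the `PLAYBOOKS` entries and metric names are hypothetical. It does not describe how any particular product works.

```python
from statistics import mean, stdev

# Hypothetical playbooks keyed by suspected root cause (illustrative names only).
PLAYBOOKS = {
    "disk_full": lambda: "rotated logs and freed disk space",
    "memory_leak": lambda: "restarted the leaking service",
}


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to judge
    sigma = stdev(history)
    if sigma == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / sigma > threshold


def guess_root_cause(metrics):
    """Toy rule-based guess; a real system would correlate many more signals."""
    if metrics.get("disk_used_pct", 0) > 95:
        return "disk_full"
    if metrics.get("memory_used_pct", 0) > 90:
        return "memory_leak"
    return None


def auto_remediate(error_rate_history, latest_error_rate, metrics):
    """Detect a problem, guess its cause, and run a playbook or hand off to a human."""
    if not is_anomalous(error_rate_history, latest_error_rate):
        return "healthy"
    cause = guess_root_cause(metrics)
    if cause is None:
        return "page on-call"  # no known fix, so escalate to a person
    return PLAYBOOKS[cause]()
```

Note the fallback: when the sketch cannot match a cause to a playbook, it still pages a human, which is why automation like this complements on-call workflows rather than replacing them.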
So, let’s look at the new era of ITOM and how today’s AI, machine learning and automation capabilities are feeding into a futuristic vision of AIOps and ITOM.
IT Operations Management, ITOM for short, is the management of configurations, components and requirements for all of a team’s IT infrastructure, applications, networks, databases, etc. ITOM is a comprehensive term for IT practitioners and engineers in charge of building and maintaining resilient systems through intelligent management of their technology. Most types of IT management traditionally fell under ITOM – from configuration management and release management all the way to incident management and remediation.
The rise of DevOps, cloud infrastructure, CI/CD, microservices, containers and more has led to a bit of a breakdown in the traditional ITOM structure. In traditional ITOM, IT operations was siloed from the developers who wrote code, which divided individual responsibilities in a pretty black-and-white way. But those lines are becoming blurred now. Developers are taking more responsibility for certain aspects of ITOM (e.g. on-call, incident management) while IT practitioners are involving themselves more in the development lifecycle (e.g. QA, automated testing).
So, teams are looking to automation and AI to help streamline these development, testing, release, deployment and incident lifecycles – leading to AIOps.
AIOps is pretty self-explanatory: it’s the application of artificial intelligence to operations functions. But how AIOps is applied, and its future potential in ITOM, is the most interesting part. Teams are finding a balance between using the level of artificial intelligence available today and thinking about what’s next. Can you use AI to read through real-time incident response chat history and correlate it with system data to better understand the human operations behind service outages?
AIOps is a true culmination of what’s possible when you bring data to every question, decision and action with the Data-to-Everything Platform, Splunk. Today, teams can already combine the power of Splunk IT Service Intelligence with VictorOps to correlate services and incident history, leading to less reactive incident management and response teams. And, the best part is the way this AIOps solution connects to human on-call workflows in VictorOps. Via AI, humans can be more connected to their tools and systems, helping them take action faster, build out better automation and collaborate more efficiently.
AIOps is connecting developers to production systems and giving IT operations teams insights into development pipelines. Greater exposure across all aspects of the software delivery and deployment process will lead to more transparency, better communication and more reliable applications and services. In order to facilitate AIOps from a people operations standpoint, teams are adopting a more progressive DevOps model focused on automation, collaboration and transparency.
Tighter relationships between developers and IT engineers will allow teams to find more areas where AIOps can be applied. How can AI be applied to the software testing lifecycle? A lot of the work in AIOps continues to be exploratory in nature, feeding into the DevOps mindset of continuous improvement. As machine learning models, both supervised and unsupervised, become more complex and accurate, teams can create AI models to improve their understanding of how production environments function. Someday, as you get into even more complex, automated microservices architectures, you could potentially use AI to help you draw service maps and track application health.
Today’s applications of AIOps in DevOps and ITOM aren’t quite as proactive as we’d like them to be. But AIOps is already showing tangible benefits to software developers and IT practitioners who are interested in deepening observability and controllability for their applications and infrastructure. AIOps and machine learning built into VictorOps are already helping teams improve the way they respond to incidents by suggesting additional responders and surfacing similar incidents.
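To make the idea of suggesting similar incidents a little more concrete, here is a minimal Python sketch that ranks past incidents by word overlap (Jaccard similarity) with a new one. This is an assumption chosen for illustration, not a description of how VictorOps or any other product does it; the function names, the `min_score` cutoff and the sample incident titles are all hypothetical.

```python
def tokenize(text):
    """Split an incident title into a set of lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_similar(new_incident, past_incidents, top_n=3, min_score=0.2):
    """Return up to `top_n` past incidents whose titles overlap most with the new one."""
    new_tokens = tokenize(new_incident)
    scored = [(jaccard(new_tokens, tokenize(past)), past) for past in past_incidents]
    # Drop weak matches so responders are not shown unrelated incidents.
    scored = [item for item in scored if item[0] >= min_score]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [past for _, past in scored[:top_n]]


past = [
    "checkout api high latency",
    "database disk full on db-1",
    "checkout api 500 errors",
]
similar = suggest_similar("checkout api latency spike", past)
```

A production system would likely use richer signals than title words, such as affected services, alert payloads or learned embeddings, but the shape of the problem is the same: score past incidents against the new one and surface the closest matches to the responder.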
Add the robust service mapping, event correlation and visibility into application health offered by tools like Splunk ITSI, and you have a flywheel for proactive incident response. ITOM is as important as it’s ever been, but the evolution of CI/CD and DevOps for people, alongside machine learning and AI for systems, is forever changing what it once was.