The pace of CI/CD has increased significantly since Agile and DevOps became mainstream. Today, development teams thrive on collaboration and conversation tools that allow them to work together and produce better software.
Despite the evolution of collaboration practices and tools, teams still face familiar communication challenges, such as those arising from differences in culture, language, and time zone.
It’s not possible to overstate the need for seamless, highly available, and effective two-way communication in the incident management lifecycle. Organizations often struggle to find and adopt a common collaboration platform that keeps everyone on the same page during incident detection, response, and remediation. Though many tools used by DevOps teams are automated, they still require human intervention.
Consider a service failure. A monitoring tool reports the incident to the operations team, and the team then has to follow up on the alert by hand, working through a series of manual steps.
To automate the above-mentioned workflow, the monitoring tool should detect irregularities (e.g., error rates exceeding a threshold or a critical failure), alert the DevOps team, create a ticket with the relevant information in a tool such as Jira or ServiceNow, and escalate it to the right person on the engineering team.
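The detect, alert, ticket, and escalate steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 5% threshold, the `IncidentPipeline` class, the `on_call` mapping, and the `default-on-call` fallback are all hypothetical, and the alert and ticket lists stand in for real integrations with a chat tool and a ticketing system such as Jira or ServiceNow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed threshold for this sketch: alert when more than 5% of requests fail.
ERROR_RATE_THRESHOLD = 0.05


@dataclass
class Ticket:
    id: int
    service: str
    summary: str
    assignee: str


@dataclass
class IncidentPipeline:
    # Hypothetical mapping of service name to the engineer currently on call.
    on_call: Dict[str, str]
    alerts: List[str] = field(default_factory=list)      # stand-in for chat alerts
    tickets: List[Ticket] = field(default_factory=list)  # stand-in for a ticketing tool

    def handle_metric(self, service: str, errors: int, requests: int) -> Optional[Ticket]:
        """Detect an irregularity, alert, open a ticket, and escalate it."""
        error_rate = errors / requests if requests else 0.0
        if error_rate <= ERROR_RATE_THRESHOLD:
            return None  # within normal bounds, nothing to do

        summary = (
            f"{service}: error rate {error_rate:.1%} "
            f"exceeds {ERROR_RATE_THRESHOLD:.0%}"
        )
        self.alerts.append(summary)  # 1. alert the team

        # 2. escalate to the on-call owner, falling back to a default rotation
        assignee = self.on_call.get(service, "default-on-call")

        # 3. open a ticket that carries the relevant information
        ticket = Ticket(len(self.tickets) + 1, service, summary, assignee)
        self.tickets.append(ticket)
        return ticket
```

Calling `IncidentPipeline(on_call={"checkout": "alice"}).handle_metric("checkout", 12, 100)` would record one alert and return a ticket assigned to `alice`, while a reading of 1 error in 100 requests would return `None`.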
As an incident responder works through an issue, ChatOps can keep everyone informed and automatically update tickets as the team moves through incident workflows. This is easier said than done, however: studies indicate that switching from one application to another forces context switches that reduce a team's efficiency and productivity.
ChatOps is gaining popularity as a means to make incident management more agile and less taxing for the teams involved. In fact, in a recent survey we conducted, incident response emerged as a primary use case for ChatOps, taking precedence over ticket tracking, running commands in-line with chat, and human collaboration.
ChatOps bridges your applications, collaboration tools, people, processes, and automation, bringing them together into a single transparent workflow. It moves both the communication around software development and operational tasks, and their execution, onto a common platform.
With the help of ChatOps, you can bring service owners, SREs, and on-call engineers together in the same conversation while an incident is being worked.
Automated ChatOps tools can accelerate your incident response further. Teams have already started integrating chatbots that can automate conversations, call an API, reset a server, and trigger processes both internally and externally.
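To make the idea concrete, here is a minimal sketch of how a chatbot might map chat messages to operational actions. It is not how any particular bot is implemented: the `!` prefix, the `restart` command, and every function name are assumptions made for illustration, and the handler only reports what it would do instead of calling a real infrastructure API.

```python
from typing import Callable, Dict, List, Optional

# Registry mapping a command word to the function that handles it.
COMMANDS: Dict[str, Callable[[List[str]], str]] = {}


def command(name: str):
    """Decorator that registers a handler under a chat command name."""
    def register(handler: Callable[[List[str]], str]):
        COMMANDS[name] = handler
        return handler
    return register


@command("restart")
def restart_server(args: List[str]) -> str:
    if not args:
        return "usage: !restart <server>"
    # A real bot would call an infrastructure API here; this sketch only reports.
    return f"restart requested for {args[0]}"


def handle_message(message: str) -> Optional[str]:
    """Return the bot's reply, or None when the message is ordinary chat."""
    if not message.startswith("!"):
        return None
    parts = message[1:].split()
    if not parts:
        return None
    name, args = parts[0], parts[1:]
    handler = COMMANDS.get(name)
    if handler is None:
        return f"unknown command: {name}"
    return handler(args)
```

Because every command and its reply pass through the channel, the rest of the team sees what was run and what happened without leaving the conversation.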
One of the most common chatbots used in this area is Hubot, which was originally developed by GitHub. It has one of the most comprehensive sets of scripts for managing interactions with third-party services. Other common examples include Lita, a bot written in Ruby; Errbot, an open-source Python project; Cog, which is extensible in any language; and YetiBot, which is written in Clojure.
At present, all of the above chatbots require a very specific syntax to execute commands, which means there is a learning curve involved. However, some teams inspired by JARVIS are working to integrate NLP (natural language processing) capabilities into these bots. With this capability, you would be able to say, “show me the time graph for XYZ system,” or “let me see the last 10 lines in ABC system log.”
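The gap between strict syntax and free-form requests can be illustrated with a small sketch. Note that this is plain pattern matching, not real NLP, and the `!tail` command format and the `parse_tail_request` function are hypothetical: the point is only to show one request expressed both ways and resolved to the same action.

```python
import re
from typing import Optional, Tuple

# Strict syntax (assumed for this sketch): "!tail <system> <lines>"
STRICT = re.compile(r"^!tail\s+(?P<system>\S+)\s+(?P<lines>\d+)$")

# Looser phrasing, e.g. "let me see the last 10 lines in ABC system log"
LOOSE = re.compile(
    r"last\s+(?P<lines>\d+)\s+lines\s+(?:in|of|from)\s+(?P<system>\S+)",
    re.IGNORECASE,
)


def parse_tail_request(message: str) -> Optional[Tuple[str, int]]:
    """Return (system, line_count) if the message asks for log lines."""
    text = message.strip()
    match = STRICT.match(text) or LOOSE.search(text)
    if match is None:
        return None
    return match.group("system"), int(match.group("lines"))
```

Both `!tail ABC 10` and the sentence from the example above resolve to the system `ABC` and a count of 10 lines. A bot with genuine NLP would need to cope with far more variation than a single regular expression can cover.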
In the age of Alexa and Siri, it’s likely such bots will soon gain prevalence in development and operations. By automating incident management workflows, teams can reduce their MTTA/MTTR and drastically cut the cost of an outage.
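For readers who want to track these numbers, the following sketch shows one way to compute them. It assumes MTTA is the mean time from an incident being triggered to its acknowledgment and MTTR is the mean time from trigger to resolution; definitions vary between teams, and the `Incident` record and function names here are illustrative.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Incident(NamedTuple):
    triggered: datetime
    acknowledged: datetime
    resolved: datetime


def mean_delta(deltas: List[timedelta]) -> timedelta:
    """Average a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mtta(incidents: List[Incident]) -> timedelta:
    """Mean time to acknowledge: trigger to acknowledgment."""
    return mean_delta([i.acknowledged - i.triggered for i in incidents])


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to resolve: trigger to resolution (assumed definition)."""
    return mean_delta([i.resolved - i.triggered for i in incidents])
```

With two incidents acknowledged after 2 and 4 minutes and resolved after 30 and 50 minutes, this gives an MTTA of 3 minutes and an MTTR of 40 minutes.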