
Being On-Call: The Future of Incident Management

Kelsey Loughman April 12, 2019


Software developers and IT teams are deploying code faster than ever. DevOps principles are leading to improved collaboration and transparency across the entire software delivery lifecycle (SDLC) – helping teams maintain agile CI/CD pipelines and driving service reliability. In a globalized world of software, customers are using applications and services 24 hours a day, 7 days a week – with expectations of constant uptime. So, being on-call for incidents in the applications and infrastructure you build and maintain is a necessity.

The future of incident management relies on developers taking ownership of the code they write and sharing accountability for system resilience with IT operations. Being on-call should be more than an anxiety-riddled weekend wondering when your next SMS or email alert will come through. Teams need to share historical knowledge and maintain detailed documentation so on-call responsibilities can be shared across the organization – not simply placed on a small group of teammates.

The goal of on-call is always to maintain service reliability for your customers. A DevOps culture of code ownership and shared accountability across the company will lead to a better understanding of how your systems work in production – leading to faster development and incident remediation. Creating an organizational culture of customer empathy in your development and IT teams will also drive buy-in to on-call operations. Bringing customers closer to your DevOps team will help everyone feel better about being on-call, reduce incident remediation time, drive deeper system reliability and help you bring better software to market.

The future of incident management is DevOps

No two teams are structured exactly the same. Software is never built exactly the same way. So, why would we expect a specific incident management framework to work for every team? On-call responsibilities and incident management in IT operations traditionally followed a specific set of ITIL principles. But, development throwing code to IT operations to deploy is simply slow and ineffectual for complex teams and development lifecycles.

Forcing IT operations to respond to alerts as they come in and escalate issues to software developers is highly reactive. What if you could automate much of this process so the right person received the right alert when they needed it? By putting developers on-call alongside IT operations teams, alerts are routed to the person who can actually fix the problem. This will reduce alert fatigue and remove the need for one person or team that simply spends time rerouting or escalating alerts.
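As a rough illustration of what routing an alert to the person who can actually fix it might look like, here is a minimal Python sketch. The service names, on-call rotation names and the `route_alert` function are all hypothetical, invented for this example. They do not represent any particular product's API.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    service: str   # name of the service that raised the alert
    severity: str  # e.g. "critical" or "warning"
    message: str


# Hypothetical ownership map: which on-call rotation owns which service.
# In practice this might live in a service catalog or alerting tool.
SERVICE_OWNERS = {
    "checkout-api": "dev-payments-oncall",
    "postgres-primary": "ops-database-oncall",
}

# Hypothetical catch-all rotation for services nobody has claimed yet.
FALLBACK_ROTATION = "ops-general-oncall"


def route_alert(alert: Alert) -> str:
    """Return the on-call rotation that should receive this alert."""
    return SERVICE_OWNERS.get(alert.service, FALLBACK_ROTATION)
```

With a map like this, an alert from `checkout-api` goes straight to the developers who own that service, and only alerts from unowned services land with a general rotation for triage.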

Developers can’t force IT operations to work on problems they can’t fix. This isn’t an effective use of anyone’s time. By sharing accountability for the services you create and setting up automation around your alerting and incident response processes, you can reduce MTTA/MTTR (mean time to acknowledge/mean time to resolve) from hours to minutes. The future of on-call incident management includes a DevOps culture dedicated to continuous improvement of collaboration, visibility and automation.
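To make MTTA and MTTR concrete, the sketch below computes both from a list of incident timestamps. The `Incident` structure and its field names are assumptions made for this example. It also assumes that both metrics are measured from the moment the alert triggered, though teams define the starting point differently.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    triggered: datetime     # when the alert fired
    acknowledged: datetime  # when a responder acknowledged it
    resolved: datetime      # when the incident was resolved


def mtta_minutes(incidents: list[Incident]) -> float:
    """Mean time to acknowledge, in minutes, measured from trigger."""
    return mean(
        (i.acknowledged - i.triggered).total_seconds() for i in incidents
    ) / 60


def mttr_minutes(incidents: list[Incident]) -> float:
    """Mean time to resolve, in minutes, measured from trigger."""
    return mean(
        (i.resolved - i.triggered).total_seconds() for i in incidents
    ) / 60
```

Tracking these two numbers over time is one simple way to check whether changes to alert routing and on-call ownership are actually shortening response times.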

When developers take on-call duties, they see reliability concerns firsthand and build deeper customer empathy – helping the whole team build and maintain better applications and infrastructure.


Being on-call with a DevOps team

Today, taking on-call responsibilities is a necessity for development and IT teams looking to maintain an agile CI/CD pipeline and remediate issues quickly. Tightened collaboration between developers and IT professionals, alongside better workflow transparency on both sides of the coin, leads to faster software delivery and better service reliability. DevOps teams will integrate on-call incident management into the delivery pipeline to not only help identify incidents currently in production but to quickly detect incidents as they’re being deployed.

DevOps-oriented teams are excellent at identifying incidents at both the beginning and end of the delivery lifecycle. Then, through deeper visibility into workflows and high levels of collaboration, DevOps teams can quickly respond to an issue when it comes up. Developers and IT operations should be working together to monitor and alert on issues across the entire stack. With improved automation in both software delivery and incident response, DevOps teams can surface information immediately to the right person.

Active on-call builds customer empathy and service reliability

Actively taking an interest in on-call responsibilities makes you a better developer. Developers on-call will build customer empathy and find more areas to improve reliability and functionality. Then, when an incident occurs, the team can fix issues faster. Combining developer expertise with IT operations expertise in on-call incident management leads to the best outcomes. DevOps allows you to take the best from IT and the best from software development to continuously deliver reliable applications and services.

DevOps teams can use VictorOps to maintain on-call schedules, create automatic alert rules and deepen visibility into incident workflows. Download our free eBook, Why DevOps Matters, to learn more about how DevOps-centric teams can improve software delivery speed, shorten incident response time and make on-call suck less.
