Software developers and operations teams are constantly improving the way they move code into production and execute tests to maintain consistent delivery of reliable services. But, how do most organizations track the success of organizational changes? When a company adopts DevOps principles, how do they show the value of these changes to the engineering teams and the overall business?
Different businesses will place different levels of importance on different aspects of the software delivery and incident management process. So, before leveraging DevOps practices to improve the velocity AND reliability of products and services, teams need to ask themselves what they’re focused on improving, why they’re improving it, how they’re going to achieve this and how they’ll measure success.
Tracking DevOps delivery value can only truly be done on a case-by-case basis. But, there are a few crucial KPIs and metrics that nearly any team can look at when trying to figure out the efficiency of their DevOps implementation. So, we’ll cover the main principles of DevOps that should apply to any team, why it’s important to track DevOps success and some useful metrics and KPIs for reporting on the value of DevOps-centric software delivery.
Before you begin tracking the effectiveness of a team’s DevOps organization, you first need to understand the value of DevOps as a whole. Why should you even implement DevOps if you don’t understand the value it’s supposed to provide? While DevOps practices manifest themselves differently from one organization to the next, there are six core tenets that apply (to varying degrees) to any engineering department taking on a DevOps transformation: collaboration, automation, transparency, exposure, accountability and continuous improvement. Effective DevOps teams find ways to bolster and track each of these principles.
Collaboration is always near the top of any list of DevOps principles. The name DevOps itself implies the connection of development and operations. Not only do developers learn to write good, production-ready code that jibes with operations’ expectations, but IT professionals also get more exposure to the development pipeline for more proactive testing and QA. A DevOps strategy tightens communication between IT operations and developers, no matter where a service is in the delivery process. DevOps-centric businesses will always focus on finding ways to improve teammate collaboration across all engineering and business teams.
Automation in DevOps should really be classified more like strategic automation. A lot of IT and DevOps businesses spend time automating processes and tools just to automate them. But, a large number of teams fail to ask why. Why should you automate a certain task? People should really ask how automation will improve the lives of humans on the team while simultaneously increasing efficiency. With that said, automation can and should be implemented in a thoughtful way throughout all stages of the development, deployment and incident management process.
A focus on automation and collaboration should naturally lead to more transparency in development and incident response workflows. DevOps truly is a form of Agile IT – focusing on Agile development principles but allowing for greater testing, chaos engineering and QA during the process. With greater transparency, IT operations are more informed during new releases and better understand the code being pushed by developers. And, on the other side, developers are more aware of how their code performs in production, helping them develop more resilient features and services at a faster clip.
Better collaboration, improved transparency and automation of menial day-to-day tasks will drive greater exposure to technical applications and infrastructure. Developers who better understand the upkeep of production environments will write better code, and IT professionals who better understand the development process will be better equipped to get more out of continuous testing and delivery. Greater exposure in DevOps leads to fewer bottlenecks in the development, release and incident management lifecycles. The only way to learn is to get your hands dirty and start involving yourself in different parts of the system, which is exactly where greater exposure shows its value.
DevOps puts the accountability for uptime and service resilience on everyone in engineering and IT. In DevOps, developers take on-call responsibilities and open themselves up to incident escalations in order to help fix issues when they pop up. IT teams are no longer solely responsible for detecting issues in production and remediating problems on their own. Who better to fix an issue than the person who wrote the code leading to that problem? Developers and operations teams are learning to work together to share the responsibility for positive customer experiences – leading to more reliable services and greater business value.
Above all else, DevOps is about continuously improving the way people, processes and technology interact with one another. As teams detect incidents and resolve them, they need to learn what worked well, what didn’t and what information they’re missing in order to expedite the process. DevOps-centric teams will take time to conduct post-incident reviews and review the historical incident context in order to better understand their workflows. DevOps continuous improvement will always be focused on finding ways to improve collaboration, automation, transparency, exposure and accountability.
Now that you know what DevOps is, you need to know why you should implement it. Just as importantly, you need to know how you’ll track the failure or success of your DevOps implementation. If you simply declare that you’re moving toward DevOps principles without a defined way of tracking success, you may as well be yelling into the ether (as shown in this Michael Scott meme).
While it’s good to move toward a culture of DevOps, you need to know what’s working and what’s not as you take on any kind of organizational change. For instance, maybe you implement a new continuous testing framework that theoretically works better but is actually leading to longer gaps between releases without improving reliability. But, how can you identify these inefficiencies or bottlenecks if you don’t measure the overall performance of your people, processes and technology?
So, we’ll cover some key metrics and KPIs that can help developers and operations teams influence a highly effective DevOps transition and add value to the business through improved service delivery.
The faster you can deploy, the faster you can deliver value to end-users. High performers will make multiple deployments per day, while lower-performing IT and DevOps teams will deploy anywhere between once per month and once every six months. Breaking down deployments into smaller sections and learning to make deployments continuously allows engineering teams to get products and minor enhancements into the hands of customers faster.
One way to measure DevOps delivery value is by monitoring your deployment frequency over time and seeing how it improves. Keep a timeline of major changes made to organizational structure, personnel or process and overlay that with your deployment frequency in the same timeframe. Ideally, you’ll see that changes to your process or philosophy are leading to more frequent deployments.
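As a rough sketch of that overlay, you could bucket deployment dates by week so the trend can be read next to the dates of your process changes. The function name `deployments_per_week` and the sample dates are illustrative assumptions, not output from any particular CI/CD tool.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count deployments per ISO (year, week) bucket, sorted chronologically."""
    counts = Counter()
    for d in deploy_dates:
        iso = d.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


# Hypothetical deployment dates for illustration only.
deploys = [
    date(2024, 1, 1),
    date(2024, 1, 3),
    date(2024, 1, 9),
    date(2024, 1, 10),
    date(2024, 1, 11),
]
print(deployments_per_week(deploys))
```

Weekly buckets are only one choice; a team deploying several times a day would probably prefer daily counts.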
But, speed isn’t the only thing to worry about in DevOps. How often are deployments successful? How exactly are you measuring success? This will look different from team to team but, as transparency is a major part of DevOps, engineering and operations teams should know how often deployments are successful.
Based on your product and the customers you service, determine what success means to you. Is it simply that the deployment goes into production without causing any outage or error? Do you track the changes for a week, a month, to determine the long-term success of the deployment? There’s no right or wrong way to track this metric but it’s important to overall DevOps delivery value to define how you track deployment success and then track the total percentage of successful deployments. Ideally, you’ll see this percentage increase as you adopt more and more DevOps principles.
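Once you’ve settled on a definition of success, the percentage itself is simple arithmetic. Here’s a minimal sketch, assuming each deployment has already been labeled as successful or not under your team’s own definition (the function name is hypothetical).

```python
def deployment_success_rate(outcomes):
    """Return the percentage of successful deployments, or None if there are none.

    `outcomes` is a list of booleans. What counts as True is whatever
    success definition the team has agreed on.
    """
    if not outcomes:
        return None
    successes = sum(1 for ok in outcomes if ok)
    return 100.0 * successes / len(outcomes)


# Hypothetical results for four deployments: three succeeded, one did not.
print(deployment_success_rate([True, True, False, True]))
```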
How long does it take to go from commit to production code? High performers can do this in hours whereas low-performing IT and development teams can take months. Keep track of lead time for changes and ensure the changes you make as you adopt DevOps shorten it over time.
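Lead time can be sketched as the gap between a commit timestamp and the moment that change reached production. The example below assumes you can export those two timestamps per change, and it reports the median so a single slow change doesn’t skew the picture. The function name and the sample data are illustrative.

```python
from datetime import datetime
from statistics import median


def median_lead_time_hours(changes):
    """Median hours from commit to production for (committed_at, deployed_at) pairs."""
    hours = [
        (deployed_at - committed_at).total_seconds() / 3600
        for committed_at, deployed_at in changes
    ]
    return median(hours)


# Hypothetical (committed_at, deployed_at) pairs.
changes = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 13, 0)),   # 4 hours
    (datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 2, 10, 0)),  # 24 hours
    (datetime(2024, 3, 2, 8, 0), datetime(2024, 3, 2, 10, 0)),   # 2 hours
]
print(median_lead_time_hours(changes))
```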
Out of the number of alerts ingested into your incident management and response systems, how many of those are major incidents? How often are DevOps processes leading to incidents in production? As teams start to move faster and deploy more frequently, you’ll likely find the total number of production incidents going up. This is very common at first as you try to iron out continuous testing policies, release management processes and monitoring and alerting enhancements. But, you want to ensure that the total number of production incidents doesn’t continue to go up over time and that the count of incidents doesn’t outweigh the value served to customers.
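Two simple ratios can help put the raw incident count in context: the share of alerts that turn into major incidents, and the number of major incidents per deployment. This is only a sketch; the function name, the dictionary keys and the sample counts are assumptions made for illustration.

```python
def incident_ratios(alerts, major_incidents, deployments):
    """Relate major incidents to alert volume and to deployment volume."""
    return {
        # Percentage of ingested alerts that became major incidents.
        "major_incident_share_pct": (
            100.0 * major_incidents / alerts if alerts else None
        ),
        # Average number of major incidents per deployment.
        "incidents_per_deployment": (
            major_incidents / deployments if deployments else None
        ),
    }


# Hypothetical month: 400 alerts, 8 major incidents, 40 deployments.
print(incident_ratios(alerts=400, major_incidents=8, deployments=40))
```

If deployments double while incidents per deployment stays flat or falls, the rising incident count is less worrying than it first looks.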
Somewhat correlated to the total number of incidents, DevOps delivery value can be tracked by measuring downtime and availability as KPIs. The amount of downtime your service experiences and the level of service availability for end-users directly show the reliability of your applications and infrastructure. As you implement DevOps practices, you should see availability go up, downtime go down and the costs associated with downtime decrease. With less downtime and greater availability, DevOps organizations can likely promise more enticing SLAs to customers, backed by tighter SLOs and the SLIs used to measure them.
Tracking incident management metrics and KPIs such as mean time to acknowledge (MTTA) and mean time to resolve (MTTR) can also show you how well your team collaborates and shares information during real-time firefights. DevOps is about more than streamlining the development and testing phase. It’s about streamlining the way teams work together to fix production incidents. The faster you can triage notifications, acknowledge incidents and remediate major issues, the faster you can get back to developing new features and services for customers. Tracking MTTA and MTTR over time can help you see whether your DevOps processes are actually leading to more collaborative workflows and greater transparency between different teams.
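Availability is usually expressed as the share of a reporting period in which the service was up. A minimal sketch, assuming you already have total downtime in minutes for the period (the function name and figures are illustrative):

```python
def availability_pct(period_minutes, downtime_minutes):
    """Percentage of the reporting period during which the service was available."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


# Hypothetical 30-day month with 43.2 minutes of downtime.
minutes_in_30_days = 30 * 24 * 60  # 43,200 minutes
print(round(availability_pct(minutes_in_30_days, 43.2), 3))
```

Working the formula backwards is also useful: a 99.9% target over a 30-day month leaves room for roughly 43 minutes of downtime.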
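Both averages can be sketched from three timestamps per incident. The field names `triggered`, `acknowledged` and `resolved` are assumptions rather than any specific tool’s schema, and this example measures time to resolve from the moment the incident was triggered, which is only one of several conventions in use.

```python
from datetime import datetime


def mean_minutes(pairs):
    """Average gap in minutes across (start, end) datetime pairs."""
    deltas = [(end - start).total_seconds() / 60 for start, end in pairs]
    return sum(deltas) / len(deltas)


def mtta_mttr(incidents):
    """Return (MTTA, MTTR) in minutes, both measured from the trigger time."""
    mtta = mean_minutes([(i["triggered"], i["acknowledged"]) for i in incidents])
    mttr = mean_minutes([(i["triggered"], i["resolved"]) for i in incidents])
    return mtta, mttr


# Hypothetical incident records.
incidents = [
    {
        "triggered": datetime(2024, 5, 1, 10, 0),
        "acknowledged": datetime(2024, 5, 1, 10, 4),
        "resolved": datetime(2024, 5, 1, 10, 40),
    },
    {
        "triggered": datetime(2024, 5, 2, 14, 0),
        "acknowledged": datetime(2024, 5, 2, 14, 10),
        "resolved": datetime(2024, 5, 2, 15, 20),
    },
]
print(mtta_mttr(incidents))
```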
In addition to MTTA and MTTR, the team needs to track how long their average incident response is. At what stages of the incident lifecycle are you spending the most time? Does it take the most time to simply triage an incident and get an alert to the right person? Or, does it take the most time to get access to the proper tools and implement a remediation strategy? As time goes on, the average incident response time should continue to decrease and show the value being provided by a DevOps mindset.
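To see where the time actually goes, you could record how long each incident spends in each stage and average per stage. The stage names below (`triage`, `tooling_access`, `remediation`) are hypothetical labels based on the questions above, so substitute whatever stages your own incident lifecycle uses.

```python
from collections import defaultdict


def average_stage_minutes(incidents):
    """Average minutes spent per stage across incidents.

    Each incident is a dict mapping a stage name to minutes spent in it.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stages in incidents:
        for stage, minutes in stages.items():
            totals[stage] += minutes
            counts[stage] += 1
    return {stage: totals[stage] / counts[stage] for stage in totals}


def slowest_stage(incidents):
    """Name of the stage with the highest average duration."""
    averages = average_stage_minutes(incidents)
    return max(averages, key=averages.get)


# Hypothetical per-stage timings for two incidents.
stage_timings = [
    {"triage": 5, "tooling_access": 20, "remediation": 15},
    {"triage": 7, "tooling_access": 30, "remediation": 25},
]
print(average_stage_minutes(stage_timings))
print(slowest_stage(stage_timings))
```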
Once you begin adopting DevOps, you also need to define a timeframe for “successful” incident resolution. Then, you can track the percentage of incidents that are resolved in the designated time frame. As the percentage rises, you can see that the effects of a DevOps culture are having a positive impact on your organization’s efficiency.
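As a small sketch of that percentage, assuming resolution times are available in minutes and the target is one your team has chosen (the 60-minute target and the sample durations are made up for illustration):

```python
def pct_resolved_within(resolution_minutes, target_minutes):
    """Percentage of incidents resolved at or under the target time."""
    if not resolution_minutes:
        return None
    on_time = sum(1 for minutes in resolution_minutes if minutes <= target_minutes)
    return 100.0 * on_time / len(resolution_minutes)


# Hypothetical resolution times in minutes, checked against a 60-minute target.
print(pct_resolved_within([12, 45, 75, 30, 120], target_minutes=60))
```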
While opportunity cost can be harder to track, thinking creatively about what you miss out on when time is spent on other work can drastically improve the way your team operates. For instance, you could potentially correlate the reduced amount of time spent responding to incidents with the increase in deployment frequency in a DevOps culture and show the additional value this creates for customers.
Behind all the DevOps metrics, KPIs and principles is a constant desire to test, learn and improve. And, the highest performing DevOps organizations are using continuous improvement as a way to continuously deliver more powerful, resilient applications and services. As you deepen your team’s relationship with DevOps, you should begin to see these metrics and KPIs reflect more favorable measurements. This way, you can track the overall value of your newly-implemented DevOps delivery system.
But, with DevOps, your job is never done. You should get to a point where you feel comfortable with the delivery speed and reliability of the services you build. At this point, you can start to see the real value of DevOps delivery. Once you’ve built a stable DevOps base, you can continue to build upon that to become even more proactive with automation, testing and incident management.
Now that you’re tracking applicable metrics and KPIs to measure DevOps delivery value, you can start to correlate those measurements with their impacts to the business. Can you track the costs of downtime and the financial benefits from reducing it? Maybe you start to track the improvement of win/loss rates in sales over time since the implementation of DevOps? There’s no right answer for how to track overall DevOps value but it’s important to show not only how DevOps improves engineering efficiency but to show how it’s improving the entire business.
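For the downtime question, a back-of-the-envelope estimate is often enough to start the conversation with the business. The sketch below assumes a flat cost per minute of downtime, which is a simplification, and every number in it is a placeholder rather than real data.

```python
def downtime_cost(downtime_minutes, cost_per_minute):
    """Estimated cost of downtime under a flat per-minute cost assumption."""
    return downtime_minutes * cost_per_minute


def downtime_savings(before_minutes, after_minutes, cost_per_minute):
    """Estimated savings from reducing downtime between two periods."""
    return downtime_cost(before_minutes, cost_per_minute) - downtime_cost(
        after_minutes, cost_per_minute
    )


# Placeholder figures: downtime drops from 300 to 120 minutes per quarter,
# at an assumed cost of 50 (in your currency) per minute.
print(downtime_savings(before_minutes=300, after_minutes=120, cost_per_minute=50))
```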
Learn how DevOps drives improvements to on-call management, alert automation and real-time incident response. Check out our recent article, Why DevOps Matters to see how modern IT and development teams are adopting DevOps to resolve incidents faster and build new services faster.