As CI/CD adoption grows, many software development organizations are moving toward shorter development cycles. Shorter cycles and more efficient releases let organizations ship new features and other changes faster than ever before. This approach has many benefits, but it is not without challenges.
Historically, deploying an application meant accepting a potentially significant service interruption for the customer. This downtime could be a planned part of the deployment process itself or the result of the time it took to roll back a faulty release. That may have been acceptable when deployments occurred months apart, but it no longer is for organizations that deliver changes to production far more frequently.
So, how can development teams minimize the downtime experienced by the end user when a release is being deployed? And, what strategies can developers put to use in their CI/CD pipeline to mitigate the risks associated with continuous delivery?
Minimizing disruption to the customer is critical for any organization, yet disruption can be hard to avoid when new code is deployed. The following deployment strategies can help reduce or eliminate the downtime experienced during and after application delivery:
One strategy for deploying with minimal disruption to the customer experience is the Blue/Green deployment. Let’s say that, in production, the Blue version of the application is running and all traffic is routed to it. In this example, the Blue version represents the old version of the application.
At this point, the Green version of the application is deployed in an equivalent production environment, but no traffic is routed to it yet. As you may have guessed, the Green version represents the new version of the application. Both Blue and Green are now running simultaneously, with all traffic still directed to the Blue version. Once the team is satisfied that the Green version has been properly deployed and is ready for prime time, traffic can simply be routed away from the Blue version and toward the Green version.
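The switch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Environment`, `BlueGreenRouter`, and the `is_healthy` probe are hypothetical names, and in practice the switch usually happens in a load balancer or DNS layer rather than in application code.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Environment:
    name: str                       # "blue" or "green"
    version: str                    # application version deployed there
    is_healthy: Callable[[], bool]  # hypothetical health probe


class BlueGreenRouter:
    """Sends all traffic to exactly one environment at a time."""

    def __init__(self, blue: Environment, green: Environment) -> None:
        self.environments: Dict[str, Environment] = {"blue": blue, "green": green}
        self.live = "blue"  # Blue (the old version) starts out live

    def route(self) -> Environment:
        """Return the environment that should serve the next request."""
        return self.environments[self.live]

    def cut_over(self, target: str) -> bool:
        """Point all traffic at `target`, but only if its health check passes."""
        if not self.environments[target].is_healthy():
            return False  # leave traffic where it is
        self.live = target
        return True


# Illustrative setup: both environments report healthy.
blue = Environment("blue", "1.0", is_healthy=lambda: True)
green = Environment("green", "1.1", is_healthy=lambda: True)
router = BlueGreenRouter(blue, green)
```

Calling `router.cut_over("green")` moves all traffic to Green in a single step. Calling `router.cut_over("blue")` later moves it back, which works only as long as Blue is kept running.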
In addition to limiting the downtime normally experienced with traditional application deployment, the Blue/Green deployment strategy offers additional benefits. One benefit is the ease of rolling back deployments and responding to incidents should the Green version prove to be error-laden after traffic has been rerouted.
Consider a scenario in which catastrophic bugs in the newly updated application are discovered after traffic is rerouted to the Green version. If the Blue version is left running for an extended period after the switch, it provides an easy way to roll back to the previous stable release. In effect, you simply route traffic back to Blue and you have stopped the bleeding, so to speak.
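One way to make that rollback decision explicit is to compare the error rate observed on Green against a threshold. The function below is a minimal sketch: the name `choose_live_environment` and the 5% default threshold are assumptions made for illustration, not values from any particular tool.

```python
def choose_live_environment(errors: int, requests: int,
                            max_error_rate: float = 0.05) -> str:
    """Decide which environment should be live after a cutover to Green.

    `errors` and `requests` are counts observed on Green since the switch.
    The default threshold is an arbitrary illustrative value.
    """
    if requests == 0:
        return "green"  # no data yet, so no evidence of a problem
    error_rate = errors / requests
    # Roll back to Blue only when Green exceeds the allowed error rate.
    return "blue" if error_rate > max_error_rate else "green"
```

With these assumptions, 2 errors in 100 requests keeps Green live, while 30 errors in 100 requests sends traffic back to Blue.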
Another deployment strategy that can help mitigate the risk of significant downtime is the canary deployment. The name comes from the historical practice of miners carrying a canary into oft-toxic mines to detect carbon monoxide (the canaries would die before the workers, warning them to exit the mine quickly). In the same way, a canary deployment alerts development teams to an error-prone release before the majority of their user base is exposed to the new changes.
A canary deployment is implemented in the following manner:
At the outset of the deployment, version A of the application (the older, stable version) is the only version of the application currently deployed across the production infrastructure.
When ready to deploy, version B (representing the new release of the application) is deployed to a specific portion of the production infrastructure.
After version B is deployed, version A should be configured to receive a large portion of the traffic to the application. For the sake of this example, let’s say 90%.
The remainder of this traffic (10%) is to be directed at the portion of the infrastructure where version B has been deployed.
With only a small subset of users visiting the new version of the application, the situation is monitored for a period of time. If no red flags are raised after a reasonable amount of time has passed, version A is gradually phased out in favor of version B, with more of the infrastructure and traffic moving to version B at each step. This continues until version B is rolled out across the entirety of the production infrastructure.
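The 90/10 split in the steps above can be sketched as follows. This is a minimal illustration that assumes traffic is split per user by hashing a user ID, which is one common approach but not the only one. The function names `bucket_for` and `pick_version` are hypothetical.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def pick_version(user_id: str, canary_percent: int) -> str:
    """Return "B" for users in the canary slice, otherwise "A"."""
    return "B" if bucket_for(user_id) < canary_percent else "A"


# Illustrative check: with canary_percent=10, roughly one user in ten sees B.
users = [f"user-{i}" for i in range(10_000)]
share_on_b = sum(pick_version(u, 10) == "B" for u in users) / len(users)
```

Hashing keeps each user on the same version between requests. Raising `canary_percent` only ever moves users from A to B, so nobody already on the new version is bounced back to the old one as the rollout widens.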
There are two main benefits to a canary deployment. The first is that it limits the impact a faulty release has on customers. In the example above, only 10% of the traffic to the application encounters the updated version. If the release proves to be unstable and riddled with bugs, the other 90% of the traffic is not affected.
The second major benefit is the control the development team maintains throughout the deployment. Since the impact on the customer base is relatively small, the team can either work through issues in the release to get it up to par or, if serious bugs are discovered, simply abandon the deployment by directing all traffic away from the updated nodes. In other words, much like a Blue/Green deployment, rolling back becomes a relatively straightforward process.
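The promote-or-abandon decision can be sketched as a small loop. This assumes a hypothetical `is_healthy` callback that stands in for whatever monitoring the team uses, and a caller-supplied list of traffic percentages. Neither is prescribed by the strategy itself.

```python
from typing import Callable, List


def run_canary(steps: List[int], is_healthy: Callable[[int], bool]) -> int:
    """Walk version B through increasing traffic shares.

    Returns the final share of traffic on version B: 100 if every step
    stayed healthy, or 0 if the rollout was abandoned.
    """
    for percent in steps:
        # Shift `percent` of traffic to B, then let monitoring judge it.
        if not is_healthy(percent):
            return 0  # abandon: send all traffic back to version A
    return 100  # every step passed, so B takes over completely
```

For example, `run_canary([10, 50, 100], lambda p: True)` ends with version B serving all traffic. If the check fails at any step, the function returns 0, which corresponds to routing everything back to version A.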
CI/CD, by definition, lends itself to faster software delivery, and this increased speed of delivery has made it a challenge to maintain application quality throughout releases. Thus, taking steps to mitigate the risks posed by faster development cycles has become a priority for any good software development organization.
In addition to employing an effective deployment strategy such as Blue/Green or canary, implementing automated testing throughout the CI/CD pipeline can greatly increase the organization’s ability to maintain code quality throughout the development process. And just because a release has been deployed doesn’t mean the effort to maintain application quality is over. To improve an application across releases, strategies for testing in production (e.g., A/B testing) and application monitoring should become a regular part of the post-deployment process.
Deploying with minimal downtime should be a high priority for any quality software development organization working within short delivery cycles. Solid guidelines for any good deployment strategy are to limit the impact a faulty release has on the user base, and to provide the development team with a straightforward mechanism to revert to the most recent and stable version of the application.
By implementing a deployment strategy that follows these guidelines, an organization ensures that users are less likely to meet problems in production, leading to an increase in both customer satisfaction and customer confidence in the product.