The world of software development, delivery, and support has changed radically in the past 15 years.
Though Agile was initially a huge conceptual, procedural, and cultural leap, today it is widely accepted as the standard process for software development. Agile’s rapid build and deployment cycles deliver the most valuable functionality more quickly. However, those cycles place new stresses on the people and systems responsible for quality and uptime.
Enter DevOps. DevOps is a large, loosely defined set of processes that promote the collaboration of development and operations teams to ensure that testing, deployment, configuration management, and support can keep pace with the speed of Agile software development. It is essentially Agile for the rest of the software lifecycle.
DevOps Tools Fall into These Two Major Categories
The major tools and processes that have been introduced by the DevOps approach include:
1. Automation Tools
These tools shorten the time a process takes and make it less error-prone, while improving reproducibility. Examples include:
- Automated Configuration Management (often described as Infrastructure as Code)–using tools like Puppet, Chef, Ansible, and Salt
- Continuous Integration–using build and deployment tools like Jenkins, which allow for automated builds and unit test runs on code check-ins
- Continuous Release / Deployment–also using tools like Jenkins, via automated scripts
2. Monitoring and Metrics Tools
These tools monitor the health of an organization’s systems and processes.
While both types of tools are extremely useful, there will always be service-affecting issues and outages that people need to fix.
Embrace Rapid Build, Break, and Repair Cycles
In fact, a progressive DevOps principle is to iterate quickly and allow things to break, but be able to find and quickly repair what broke. DevOps teams should assume failures will happen and work to “build the muscle” it takes to respond quickly and to continuously learn and improve.
Let’s Call This Process Continuous Support
A good term for this concept is continuous support. Though this part of the DevOps lifecycle doesn’t directly shorten time to build and release, it is a critical factor in reducing the errors and instability that are inherent in rapid deployment.
But when managers introduce DevOps processes, the concept of continuous support is often overlooked. It is easier to focus on implementing automation and monitoring tools than it is to empower people to collaborate and fix critical issues faster.
Take These Five Actions to Drive Continuous Support
1. Share the responsibility of support – Give your developers a dependable on-call scheduling system and put them on call.
As new code is pushed to production more frequently, often the only person who knows how that code works is the developer who wrote it. This is even more relevant with the current interest in microservice architectures. Microservices allow development teams to take ownership of individual components of loosely coupled systems, including the ability to release those components independently. It makes sense that developers should not only be responsible for building and testing their code, but also for supporting it if there are problems.
2. Make sure your notification system gets the right people involved quickly
Specific people often know how to solve specific problems. Set up your notification system to direct incidents to the right people to solve problems as quickly as possible.
3. Keep remediation documents (runbooks) up to date and easily accessible
A well-defined set of steps is often used to diagnose and/or resolve an incident. Make sure these steps are immediately available to the person solving the problem. If these steps are not documented, make sure to get this tribal knowledge recorded in a runbook to save a lot of time and stress.
4. Use ChatOps tools (e.g. Slack, HipChat) and/or conference calls to collaborate
Some incidents require people from different teams to work together to solve a problem. Choose collaboration tools that support teamwork and the ability to quickly get context.
5. Conduct blameless postmortems for larger incidents and weekly on-call reviews
Objectively reviewing an incident’s sequence of events can help pinpoint specific, service-affecting issues to address. Weekly on-call reviews help reduce unnecessary alert noise, identify missing runbooks, and prepare on-call teams for what to expect during their shift.
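To make actions 2, 3, and 5 more concrete, here is a minimal Python sketch of how incident routing, runbook lookup, and a weekly on-call review might fit together. It is an illustration under stated assumptions, not a description of any particular product: the service names, rotation names, runbook paths, and the `Incident`, `route`, and `weekly_review` names are all hypothetical, and a real team would pull this data from its own alerting and scheduling tools.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    service: str             # hypothetical service name, e.g. "payments"
    alert: str               # hypothetical alert name, e.g. "high_latency"
    actionable: bool = True  # marked False in review if nobody needed to act


# Hypothetical ownership table: which on-call rotation owns which service.
ON_CALL = {
    "payments": "payments-dev-oncall",
    "search": "search-dev-oncall",
}
FALLBACK = "ops-oncall"  # assumed catch-all rotation for unowned services

# Hypothetical runbook index keyed by (service, alert).
RUNBOOKS = {
    ("payments", "high_latency"): "runbooks/payments/high_latency.md",
}


def route(incident):
    """Return (rotation to notify, runbook path or None) for an incident."""
    rotation = ON_CALL.get(incident.service, FALLBACK)
    runbook = RUNBOOKS.get((incident.service, incident.alert))
    return rotation, runbook


def weekly_review(incidents):
    """Summarize a week of incidents for the on-call review."""
    # How often each (service, alert) pair fired during the week.
    counts = Counter((i.service, i.alert) for i in incidents)
    # Alerts that fired without needing action are candidates for tuning.
    noisy = sorted({(i.service, i.alert) for i in incidents if not i.actionable})
    # Alerts that fired but have no runbook yet.
    missing = sorted(key for key in counts if key not in RUNBOOKS)
    return {"counts": dict(counts), "noisy": noisy, "missing_runbooks": missing}
```

In this sketch, `route` sends each incident to the rotation that owns the service and attaches the runbook if one exists, while `weekly_review` surfaces the two items the review is meant to catch: alerts that fired without needing action, and alerts that have no runbook.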
Effective DevOps Practices Cover Both Delivery and Support
The most important thing to remember is that DevOps is a culture that promotes sharing the responsibility of both software delivery AND support.
To fully realize the promise of Agile and to continuously release high-quality software, we need both excellent tools and excellent minds. Let’s automate the parts of the process that don’t need human intervention, and arm DevOps professionals with the best systems to support their collaboration and hard work.