The practice of putting developers on-call isn’t new. Many startups and even larger organizations now put developers on-call to meet their production demands. This has helped them achieve higher development speed and service stability, with fewer necessary rollbacks.
Some organizations, however, are still reluctant to hand this responsibility to their developers. The general perception is that developers hate being on-call, but DevOps is increasingly challenging that view. DevOps bridges the gap between operations and development teams and provides a holistic approach to software development and incident response. It’s in this context that we’ll discuss why getting developers on-call is imperative to operational efficiency and business success.
With the constant pursuit of agile development, many organizations today rely on the cloud – which has become the core of their mission-critical applications. Cloud-based applications and infrastructure give businesses unmatched ability to focus on their strategic business goals while outsourcing their infrastructure worries to a third party.
Cloud-based development platforms, automation, microservices architecture, APIs, and containers have become the cornerstone of software development. These new development trends have replaced the monolithic architectures and releases with smaller, iterative release models and made rejigging the on-call team a business imperative.
In the new paradigm, application errors are more common than incidents arising from the underlying infrastructure. The traditional ops teams are not always equipped to handle application errors. They often have limited experience with DevOps and find on-call responsibilities an undesired burden. In a survey conducted by VictorOps, 56% of businesses reported loss of revenue as the largest of their downtime costs due to on-call latencies. You can reduce these costs of downtime with healthy on-call practices and a culture focused on DevOps.
Getting developers on-call is the best way forward. In fact, industry trends show that DevOps continues to gain steam year after year. VictorOps’ 2016/17 State of On-Call report highlights that while operations teams still have a major share of on-call duties, developers and DevOps professionals form the next major group.
In the same survey, Michael D’Auria, Infrastructure Lead at CrowdTap, says his firm has faced fewer production issues ever since it brought developers on-call. Michael said, “…Now that they know they can be woken up at 4 in the morning, they deploy when they know they can be available to fix things. There are fewer production issues now that our devs fully understand what it means to push to production.”
DevOps is a big cultural change, and organizations are often reluctant to disturb their established practices, but making developers a part of on-call schedules is often the most viable change in the long run. Here are some of the practices that can help with a smooth transition:
In geographically distributed development teams, the most annoying thing about being on-call is receiving late-night alerts. This is hard to justify when a team in another time zone could handle the same call during its daytime. The problem can be resolved with proper follow-the-sun rotations and scheduling that takes different time zones into account.
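To make the follow-the-sun idea concrete, here is a minimal sketch of routing an alert to whichever team is closest to the middle of its working day. The team names, the fixed UTC offsets (daylight saving time is ignored), and the 13:00 "midday" reference point are all assumptions made for illustration, not part of any particular scheduling product.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical teams with fixed UTC offsets (assumption: DST is ignored).
TEAM_TIMEZONES = {
    "bangalore": timezone(timedelta(hours=5, minutes=30)),
    "london": timezone(timedelta(hours=0)),
    "san_francisco": timezone(timedelta(hours=-8)),
}

# Assumed middle of a working day, in minutes after local midnight (13:00).
MIDDAY_MINUTES = 13 * 60


def minutes_from_midday(now_utc, tz):
    """Circular distance, in minutes, between local time and local midday."""
    local = now_utc.astimezone(tz)
    minutes = local.hour * 60 + local.minute
    gap = abs(minutes - MIDDAY_MINUTES)
    # Wrap around midnight so 23:00 and 01:00 are two hours apart, not 22.
    return min(gap, 24 * 60 - gap)


def pick_on_call_team(now_utc):
    """Return the team whose local time is closest to its midday."""
    return min(
        TEAM_TIMEZONES,
        key=lambda name: minutes_from_midday(now_utc, TEAM_TIMEZONES[name]),
    )


if __name__ == "__main__":
    alert_time = datetime(2024, 1, 15, 21, 0, tzinfo=timezone.utc)
    print(pick_on_call_team(alert_time))
```

The timestamp passed in must be timezone-aware. A real rotation would also account for holidays, handoffs, and named individuals within each team, but the core decision of who is awake right now looks roughly like this.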
Getting developers on-call means the first responder is often equipped to resolve the issue independently. Only when there are dependencies or the developer needs assistance will they need to engage other on-call team members. This keeps developers in the loop and ensures that no one is overwhelmed or continuously engaged in firefighting.
Most incident management solutions lead to alert fatigue with unacceptable levels of false alarms. Your DevOps team can continuously review your tools and processes to reduce alert fatigue by limiting multiple alerts for the same issue, redefining anomaly thresholds to match your current system, and creating actionable notifications that offer real-time logs, reports, and metrics to expedite issue resolution.
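As an illustration of limiting multiple alerts for the same issue, the sketch below suppresses repeat notifications for the same service and check within a time window. The class name, the five-minute default window, and the `(service, check)` key are assumptions chosen for the example; real alerting tools expose their own deduplication settings.

```python
from dataclasses import dataclass, field


@dataclass
class AlertDeduplicator:
    """Suppress repeat notifications for the same issue within a window."""

    window_seconds: float = 300.0  # assumed default: five minutes
    _last_sent: dict = field(default_factory=dict)

    def should_notify(self, service, check, timestamp):
        """Return True if this alert should page someone.

        `timestamp` is in seconds. The window is measured from the last
        notification that was actually sent, not from suppressed repeats.
        """
        key = (service, check)
        last = self._last_sent.get(key)
        if last is not None and timestamp - last < self.window_seconds:
            return False  # same issue, still inside the window
        self._last_sent[key] = timestamp
        return True


if __name__ == "__main__":
    dedup = AlertDeduplicator(window_seconds=300)
    # Hypothetical service and check names, with timestamps in seconds.
    for ts in (0, 120, 301):
        print(ts, dedup.should_notify("checkout", "http_5xx", ts))
```

This covers only the deduplication part; tuning anomaly thresholds and enriching notifications with logs and metrics depend on the monitoring stack in use.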
On-call developers can significantly reduce your downtime and improve your system reliability. Being responsible for the entire development and deployment cycle, developers and DevOps professionals will have the right context and know-how to fix issues faster.
Further, constant exposure to the live environment and a culture of code ownership will help engineers use the feedback to improve their products over time, reducing the frequency of rollbacks and failures. Here are a few examples of how getting developers on-call has proved advantageous to organizations:
Google brought developers on-call after it found that the ops team was not able to meet the high demand for keeping systems up and running. The company created a new team of Site Reliability Engineers (SRE), with half of the team members being SysAdmins and half being software engineers. This team helped Google achieve a multi-fold increase in its operational efficiency.
Amazon has followed a similar approach and its developers are responsible for end-to-end ownership of their code – which means they have to run, maintain and handle on-call responsibilities for fixing any issues in the wild. This has helped Amazon in developing a more holistic build-to-deploy mindset among its developers, creating a significant drop in their production issues.
Entrusting developers with communication responsibilities will help your DevOps organization remain agile with new releases while reducing MTTA/MTTR. On-call incident response and alerting software like VictorOps can help you redefine your incident management processes. Try the 14-day free trial to streamline incident response and resolution via integrated cross-team collaboration and context-rich notifications to start making on-call suck less.