Building reliable services is expected in today’s software ecosystem. Customers don’t care whether you’re working with highly integrated systems or in a more controlled environment–they simply need your service to work. Site reliability engineers (SREs) help you adhere to SLAs and SLOs, run tests, monitor software and hardware, and iterate on processes to build reliable applications and systems.
Organizations need to support SRE efforts because of the potential costs of downtime, customer expectations of reliability, and the increasing speed of software development. Reliability engineers can create a cohesive culture of reliability by adding visibility into the software development lifecycle (SDLC), creating processes, and collaborating with teammates to make systems more robust.
By having a hand in a little of everything, from monitoring a server’s disk space and setting up failovers to writing code for a web application, SREs can see the bigger picture of system reliability–no matter how complex the infrastructure. Over time, this exposure helps spread system knowledge across all of engineering and helps teams build more robust services. Because that work touches every team, SRE begins with getting company-wide buy-in.
Everyone from the leadership team down to your most junior software engineer needs to understand the benefits of SRE–and help support SRE initiatives. Teams that adopt SRE practices fill gaps between developers and IT operations teams and cultivate a more collaborative DevOps-type culture. So, we thought we’d lay out some of the challenges to getting buy-in for SRE, and a number of ways you can go about getting buy-in from leadership and others in your organization.
SREs can act as a bridge between IT operations and development teams. By understanding how to write code, take ownership of that code, and maintain it, SREs will help bake reliability into everything across the entire system. It’s important to make this clear to leadership, engineering, customer support, sales, marketing–everyone.
However, organizations are often reluctant to change. Change can mean additional costs, lost productivity, and general confusion. To get organizational buy-in for SRE, you need to convince leadership that reliable systems reduce costs, while also convincing your teammates that shared responsibility for development and operations is beneficial. Make sure to address these points when presenting to the leadership team about getting started with SRE.
Every engineering team needs SRE in one capacity or another. Now, let’s look at some tips for making the case to leadership about the importance of SRE within your own organization:
You’ll need to present a compelling argument to leadership if you’d like them to get excited about change. To get started, here are a few points to cover that should help you get organizational support for SRE:
SRE is only as good as the team supporting it. Effective SRE depends on collaboration with other teams and business units. The more visibility an SRE team gets throughout the entire SDLC and incident management lifecycle, the better it can assess ways to improve the system. With reliability engineers focused solely on finding pain points and scalability concerns, the rest of the DevOps team can maintain more consistent continuous integration and deployment.
Giving time and energy to SRE efforts means you’re more prepared for incidents when they occur, and issues are escalated less often. Site reliability engineers can conduct post-incident reviews, create actionable runbooks with context, and help on-call engineers respond to problems, all while ensuring a greater level of reliability in new deployments and current infrastructure.
Implementing a culture of SRE dedicated to reliability and collaboration can take time. Find ways to showcase the value of SRE and present it to the rest of your team. For example, you can learn more about how (and why) we bought into SRE by downloading our free eBook, “Build the Resilient Future Faster: Creating a Culture of Reliability.”