
The Production Environment Review Checklist

Chris Tozzi November 20, 2019


You’ve written your code, tested it and built it. Now your release is ready to deploy into production.

But is your production environment ready for the release? That’s a question that every IT professional and platform engineer should ask before accepting a new release, whether the release is an update to an existing app or a totally new deployment.

Toward that end, here’s an overview of the items you should check off to make sure that your production environment is ready to go. Your mileage will vary, and this list doesn’t address every aspect of every production environment in existence. But you can think of it as a starting point for designing a production environment review that ensures you’re ready to accept a new release and, just as important, that you have the resources you need to support the release once it’s in production.

So, let’s dive into the checklist to understand some things you should always review in production environments to improve release quality and cadence.

The Checklist for Production Environments and Successful Releases

Infrastructure mapping

Before accepting a new release, you should have a clear understanding of the infrastructure that will support it. Whether your infrastructure is on-premises, hybrid, a single cloud or multiple clouds, you want to be able to map the architecture and know which specific servers or services will be hosting the release.

Delivery chain

How, specifically, is your application being released? Which tools are you using to deploy it? When should you expect the next release? You can help answer all of these questions by making sure you understand the delivery chain that is pushing the release out. Although the delivery chain is not part of your production environment per se, it plays a major role in ensuring that you can place applications into production successfully.

Service mapping

If your application consists of multiple services – as many do in today’s cloud-native age – you should have an understanding of how those services interact in order to compose the complete application. Which APIs or other interfaces do they use to communicate? Where is each service hosted?
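One lightweight way to record a service map is as plain data that you can query. The sketch below is only an illustration: the service names and the SERVICE_MAP structure are hypothetical, and it assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical service map: each service lists the services it calls.
SERVICE_MAP = {
    "web-frontend": {"orders-api", "auth-api"},
    "orders-api": {"auth-api", "orders-db"},
    "auth-api": {"users-db"},
    "orders-db": set(),
    "users-db": set(),
}


def startup_order(service_map):
    """Return services ordered so each one comes after its dependencies."""
    return list(TopologicalSorter(service_map).static_order())


def dependents_of(service_map, target):
    """Return the services that call `target` directly, sorted by name."""
    return sorted(name for name, deps in service_map.items() if target in deps)


if __name__ == "__main__":
    print("Startup order:", startup_order(SERVICE_MAP))
    print("Calls auth-api:", dependents_of(SERVICE_MAP, "auth-api"))
```

A map like this also answers the reverse question during a review: if one service changes in the release, which other services depend on it?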

Network mapping

Network mapping allows you to understand how your network or networks are configured. It’s particularly important in today’s world of complex, multi-layered, software-defined networks. Be sure that your network map includes not just public-facing endpoints but also internal devices, and that it distinguishes between external and internal endpoints.
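As a rough illustration of separating internal from external endpoints, the sketch below labels addresses by checking them against the RFC 1918 private IPv4 ranges. The endpoint names and addresses are hypothetical, and a real environment may define "internal" differently (for example, by VPC or subnet).

```python
import ipaddress

# RFC 1918 private IPv4 ranges, treated here as "internal" (an assumption).
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Hypothetical endpoints taken from a network map.
ENDPOINTS = {
    "public-load-balancer": "203.0.113.10",
    "app-server": "10.0.4.17",
    "database": "192.168.20.5",
}


def classify(address):
    """Return "internal" if the address is in a private range, else "external"."""
    ip = ipaddress.ip_address(address)
    if any(ip in network for network in INTERNAL_NETWORKS):
        return "internal"
    return "external"


if __name__ == "__main__":
    for name, address in ENDPOINTS.items():
        print(f"{name} ({address}): {classify(address)}")
```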

Firewall configuration

Although firewalls play a different and less central role for cloud-based workloads than they did in the days when everything ran on-premises, they are still useful tools. Before deploying a new release, be sure that yours is working and configured properly.

SLAs and contracts

Another basic requirement for a successful release is knowing which contractual requirements your production environment needs to be able to support. If applicable, check SLAs or other contracts and be sure that your production environment is ready to support any uptime, data recovery or other requirements specified in them.
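To make an uptime commitment concrete, it can help to convert the percentage into a downtime budget. The sketch below is only illustrative: the function name and the 30-day default period are assumptions, and a real SLA may define its measurement window and exclusions differently.

```python
def allowed_downtime_minutes(uptime_percent, period_days=30):
    """Return the minutes of downtime permitted by an uptime target.

    uptime_percent: the uptime target as a percentage, e.g. 99.9
    period_days: length of the measurement window in days (assumed 30)
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {budget:.1f} minutes per 30 days")
```

For example, a 99.9% target over 30 days leaves roughly 43 minutes of downtime, which is a useful number to compare against how long a rollback or restore actually takes.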

Backups and disaster recovery

What is your plan for backing up data or workloads and recovering them in the event of a failure? Your backup and recovery tools will vary widely depending on factors such as which type of infrastructure you use or what your recovery requirements are, but you should ensure that you have a backup and recovery plan in place.
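One simple pre-release check is whether the most recent backup is recent enough for your recovery requirements. The sketch below assumes you can obtain the timestamp of the last successful backup from your own tooling; the function name and the 24-hour limit are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def backup_is_fresh(last_backup, max_age, now=None):
    """Return True if the last backup is no older than max_age.

    last_backup: timezone-aware datetime of the last successful backup
    max_age: timedelta giving the oldest acceptable backup (an assumed requirement)
    now: optional timezone-aware datetime, defaults to the current UTC time
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_backup <= max_age


if __name__ == "__main__":
    # Hypothetical values: a backup taken 30 hours ago against a 24-hour limit.
    last = datetime.now(timezone.utc) - timedelta(hours=30)
    print("Backup fresh enough:", backup_is_fresh(last, timedelta(hours=24)))
```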


Incident response

You also need a plan that defines who does what in the event that something goes wrong, and tools to help coordinate those activities. Even the best-designed production environments sometimes suffer failures or disruptions, and a solid incident response plan is the difference between a mere hiccup and a total disaster.

Performance monitoring

Deploying applications successfully requires not just getting them into production but ensuring that they perform adequately once they are there. Which APM tools and processes will you use to monitor for performance challenges, identify their source and respond to them? Which team members are responsible for overseeing these tasks?

Security monitoring

How will you monitor for vulnerabilities or breaches in your production environment? Keep in mind that security monitoring is a multi-layered affair that should extend from application code to environment configurations to identity and access management and beyond.

Remote access

Which remote access tools, if any, will you use to support your production workloads? Are they properly secured? You want to be able to answer these questions before deploying a new release.

Physical access

Likewise, if you rely on physical access to manage the environment, is that access available and is it secure?

Cost control

In the era of the cloud, it’s easy for costs to spin out of control quickly due to unnecessary cloud resources left running or poor alignment between the type of cloud service you use and the workloads it hosts. Although you’ll ideally think about cost optimization before you deploy a new release, be sure that you have a plan in place for monitoring costs and identifying cost-optimization opportunities once your workload is running.

Future planning

Your production environment will almost certainly evolve over time. The infrastructure that hosts it, the configurations that govern it and the workloads deployed on it will change. For that reason, you should have a plan and process for accommodating future releases and needs, all while ensuring that your environment is able to adapt successfully as requirements change.

Successful future planning may be as simple as quarterly reviews. But a better approach is to build a continuous feedback channel that allows all stakeholders (developers, IT Ops and everyone in between) to communicate about upcoming changes or new requirements that will be placed on the production environment.



About the author

Chris Tozzi has worked as a journalist and Linux systems administrator. He has particular interests in open source, agile infrastructure and networking. He is the Senior Editor of content and a DevOps Analyst at Fixate IO. His latest book, For Fun and Profit: A History of the Free and Open Source Software Revolution, was published in 2017.
