Container Monitoring and Alerting Best Practices

Chris Riley July 17, 2019

With the agility of modern development practices and infrastructure comes a new set of challenges – namely, that applications consist of more parts, and the relationship between infrastructure and code is much tighter. In this post, I will discuss the monitoring and alerting considerations that IT needs to think about for modern applications running on containers.

How modern infrastructure impacts monitoring

In the world of monoliths and waterfall, what you monitored was fairly static and focused on the infrastructure layer. For monolithic applications, the distinction between infrastructure issues and application issues was cut and dried, and the configuration rarely changed. So, once you set up appropriate monitoring and alerting, there wasn’t much to do other than cross your fingers and hope there were no anomalies.

Modern applications and infrastructure keep IT on its toes. New methods for development and deployment mean IT teams should be aware of several key differences in how monitoring and alerting need to happen.

More complexity with containers:

Modern applications bring added complexity and change velocity. Containers are pieces of infrastructure that need to be monitored in addition to the server those containers run on. The velocity of change that comes with frequent application releases also means there are more events that could trigger issues. The number of elements increases substantially in application architectures that incorporate microservices because the application is made up of many containers, and the relationship between those containers is another critical element to watch.

Visibility and communication changes:

The changes in application infrastructure also impact the teams involved. Previously, the monitoring and alerting of infrastructure was an IT task and application performance monitoring was either jointly or solely owned by developers. Now, communication between IT and development needs to be more fluid. This doesn’t necessarily mean one-to-one communication but it certainly means the development/quality engineering team needs to get real-time alerts and access to monitoring data in order to quickly address issues.

Tighter coupling of infrastructure to application:

Containers are infrastructure, but now they’re also application artifacts. Containers are treated the same as any other application artifact, and the container image itself is a part of the overall app. This means that the status of the application’s infrastructure is now a part of the application and its quality. Tighter coupling of infrastructure also necessitates tighter coupling between the IT and dev teams, meaning development teams need to be more aware of what happens at the infrastructure level. A DevOps culture of transparency and collaboration improves software delivery and incident management.

Best Practices for Monitoring and Alerting for Modern Stacks

Enterprises aren’t strangers to monitoring distributed systems. Monitoring and alerting practices for modern applications are similar to those for distributed systems, with one difference: distributed systems tended to have their own monitoring and alerting silos, whereas every variable in a modern application is critical to the overall understanding of its status. This slightly changes how we implement and think about alerting and monitoring.

Stdout is an important tool:

Stdout is your best friend when it comes to monitoring and alerting on data from containers and the containers’ host. First of all, it’s a standard, and second, developers can easily add context by writing to standard out. Stdout is a generic stream of visibility into the containers and what’s going on inside them, as well as the host on which these containers are running.
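As a rough illustration of adding context through standard out, here is a minimal sketch that writes one JSON object per log line using Python's standard `logging` module. The field names (`service`, `request_id`) and the `build_logger` helper are assumptions made for this example, not part of any particular tool; the idea is only that a log collector reading the container's stdout gets structured, searchable fields instead of free text.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            # Hypothetical context fields, supplied by the caller via `extra=`.
            "service": getattr(record, "service", None),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Return a logger that writes JSON lines to `stream` (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep output on our handler only
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.warning(
        "payment gateway timeout",
        extra={"service": "checkout", "request_id": "abc-123"},
    )
```

Because the container only writes to stdout, the application does not need to know where the logs end up; that decision stays with whoever runs the platform.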

Robust monitoring:

In order to be successful with all the moving parts in modern applications, you want to limit constraints for how data is sent and make data collection as simple as possible. Stdout is part of this, but so is the philosophy that everyone on the team should consider data elements and events that might be relevant to downstream alerting and make it easier to ingest information via logs or APIs. The monitoring tools should have capabilities such as tagging and robust search to help correlate that data into meaningful triggers for an alerting system.
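To make the tagging idea concrete, here is a small sketch of how tags let you narrow a pile of events down to the ones that matter for a trigger. The event shape and the `search` function are hypothetical and stand in for whatever query language your monitoring tool provides.

```python
def search(events, **wanted):
    """Return the events whose tags match every key/value pair in `wanted`."""
    return [
        event
        for event in events
        if all(event["tags"].get(key) == value for key, value in wanted.items())
    ]


# Hypothetical events, as they might look after ingestion from logs or an API.
EVENTS = [
    {"message": "OOM kill", "tags": {"service": "checkout", "env": "prod", "layer": "container"}},
    {"message": "slow query", "tags": {"service": "checkout", "env": "staging", "layer": "app"}},
    {"message": "node not ready", "tags": {"service": "search", "env": "prod", "layer": "cluster"}},
]

if __name__ == "__main__":
    for event in search(EVENTS, service="checkout", env="prod"):
        print(event["message"])
```

The more consistently teams apply the same tag keys, the less custom parsing the alerting side has to do.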

Alert across your entire toolchain:

Because of the velocity of modern applications, it’s critical to monitor and set up alerts for each stage of the delivery chain. System and delivery events are often key to identifying issues with the applications themselves. So, it’s important to use data from all the tools in your DevOps toolchain and make it possible to associate state changes, such as a deployment or a configuration change, with critical issues that occur in the application.
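One simple way to associate state changes with an issue is to look for toolchain events that happened shortly before an alert fired. The sketch below assumes a hypothetical list of change events (deployments, config changes) with a `service` and a `time` field, and an arbitrary 30-minute window; real tools will have their own event formats and correlation logic.

```python
from datetime import datetime, timedelta


def recent_changes(alert_time, service, changes, window=timedelta(minutes=30)):
    """Return change events for `service` that occurred within `window` before the alert."""
    return [
        change
        for change in changes
        if change["service"] == service
        and timedelta(0) <= alert_time - change["time"] <= window
    ]


# Hypothetical events collected from CI/CD and configuration tools.
CHANGES = [
    {"service": "checkout", "time": datetime(2019, 7, 17, 9, 0), "event": "deploy v41"},
    {"service": "checkout", "time": datetime(2019, 7, 17, 11, 45), "event": "deploy v42"},
    {"service": "search", "time": datetime(2019, 7, 17, 11, 50), "event": "config change"},
]

if __name__ == "__main__":
    fired_at = datetime(2019, 7, 17, 12, 0)
    for change in recent_changes(fired_at, "checkout", CHANGES):
        print(change["event"])
```

Attaching the matching events to the alert gives the on-call engineer an immediate list of suspects instead of a bare symptom.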

Monitor the entire stack:

To create full visibility into the application, the monitoring needs to cover the entire stack. This allows for a holistic view of applications and gives alerting the context needed to address and remediate issues. This means you need to set up monitoring on:

  • Containers
  • Clusters running the containers such as Kubernetes or Swarm
  • Communication and telemetry between containers (this can be done via contracts or by collecting logs from tools like Istio)
  • Host OS/machine running the cluster
  • Server running the hosts
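As a sketch of what "the entire stack" can mean in practice, the snippet below walks the layers from the list above, bottom up, and reports the lowest one that is unhealthy or not reporting at all. The layer names and the `status` dictionary are assumptions for illustration; in a real setup each value would come from a monitoring check.

```python
# Layers from the list above, ordered from the bottom of the stack to the top.
LAYERS = ["server", "host_os", "cluster", "containers", "container_traffic"]


def lowest_failing_layer(status):
    """Return the lowest layer that is unhealthy or missing from `status`, else None.

    A layer with no data is treated as failing, since an unmonitored layer
    is a visibility gap in its own right.
    """
    for layer in LAYERS:
        if not status.get(layer, False):
            return layer
    return None


if __name__ == "__main__":
    status = {
        "server": True,
        "host_os": True,
        "cluster": False,      # e.g. a node is not ready
        "containers": False,   # likely a consequence of the cluster problem
        "container_traffic": True,
    }
    print(lowest_failing_layer(status))
```

Starting from the bottom is a design choice: a failure low in the stack often explains the noise above it, so it is usually the better first place to look.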

Build visibility and share:

All consumers of monitoring data and anyone on-call need access to monitoring tools and dashboards. When it comes time to drill down in response to an alert, the last thing an engineer needs is access issues. This is why it’s important to build global dashboards as a window into the application and infrastructure – and give appropriate access to the monitoring tool(s) to those who need it.

Give context:

Since there are a lot more interdependencies in applications with containers, the context matters a lot – especially with microservice-based applications where an alert that surfaces in one container might actually be the result of an interaction with another. Everyone on the team should think, “What information is useful to this component of the application if things go wrong?” This question will help you build monitoring across the entire stack and influence what metadata ends up in the alerts.
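To show what useful context might look like, here is a minimal sketch that enriches an alert with the services it depends on and flags which of those are currently unhealthy. The dependency map, the health values, and the `enrich` function are all hypothetical; the point is that the alert carries enough metadata to hint that the real problem may be in another container.

```python
def enrich(alert, dependencies, health):
    """Return a copy of `alert` with dependency context attached."""
    upstream = dependencies.get(alert["service"], [])
    return {
        **alert,
        "depends_on": upstream,
        # Anything not explicitly reported as "ok" is flagged, including unknowns.
        "unhealthy_dependencies": [name for name in upstream if health.get(name) != "ok"],
    }


# Hypothetical service map and current health, maintained elsewhere.
DEPENDENCIES = {"checkout": ["payments", "inventory"], "payments": []}
HEALTH = {"payments": "degraded", "inventory": "ok"}

if __name__ == "__main__":
    alert = {"service": "checkout", "summary": "error rate above threshold"}
    print(enrich(alert, DEPENDENCIES, HEALTH))
```

An engineer paged for `checkout` in this example would see right away that `payments` is degraded, which is a better starting point than the checkout container's own logs.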

The biggest reality IT needs to face is that with modern application development practices and infrastructure like containers, the server which applications are running on isn’t the primary factor in the app’s stability. Yes, you’ll have to herd more variables. But, the power of modern monitoring and alerting tools is that the configuration of data streams and presentation in alerts is now much easier than it used to be. Because modern applications are intimately tied to the infrastructure they run on, it’s now necessary to monitor and alert on the entire stack as well as engage with the dev team.

A centralized location for monitoring data, on-call scheduling, alerting and collaboration makes on-call suck less for developers and IT. Try a 14-day free trial of VictorOps or request a personalized demo to learn more about using a holistic incident response solution for IT and DevOps.

About the author

Chris Riley (@HoardingInfo) is a technologist who has spent 15 years helping organizations transition from traditional development practices to a modern set of culture, processes and tooling. In addition to being an industry analyst, he is a regular author, speaker, and evangelist in the areas of DevOps, BigData, and IT. Chris believes the biggest challenges faced in the tech market are not tools, but rather people and planning.