In some ways, modern application monitoring has become a paradox. On one hand, today’s applications and the environments that host them spew out more data than ever, which theoretically gives IT teams an unprecedented ability to monitor and observe those applications. On the other hand, there’s often so much data to parse that gaining meaningful insight through monitoring becomes impossible in practice.
In other words, the more information applications produce, the harder it becomes to monitor them.
This is part of the reason why some IT teams have begun turning to the concept of “production monitoring signals.” Below, I explain what production monitoring signals are and offer tips for making the most of them.
Put simply, production monitoring signals are specific types or categories of metrics that IT teams prioritize when parsing through all the monitoring data produced by their applications.
Production monitoring signals could theoretically be any type of metric. However, the two major monitoring signals frameworks created to date center on metrics associated with application load and errors: Google’s Four Golden Signals (latency, traffic, errors and saturation) and Weaveworks’ RED Method (rate, errors and duration).
There are plenty of other types of data generated by applications and the environments that host them, ranging from disk and CPU usage, to network bandwidth and authentications per second, to name just a few. These other metrics might also be useful sources of insight into an application’s availability and health, and by all means, you should collect them.
But the point of production monitoring signals is to help teams decide which types of metrics get the greatest priority when analyzing an application or environment. Monitoring signals don’t mean you get to ignore other types of data. They’re simply a way to achieve focus and consistency in application monitoring.
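To make the idea concrete, here is a minimal Python sketch of how the three RED Method signals (rate, errors and duration) might be computed from a batch of request records. The `Request` type and the `red_signals` function are hypothetical names invented for this illustration, and counting only 5xx responses as errors is an assumption; your own definition of an error may differ.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class Request:
    """One handled request (hypothetical record shape for this sketch)."""
    status: int         # HTTP status code returned to the client
    duration_ms: float  # time spent handling the request, in milliseconds


def red_signals(requests: Sequence[Request], window_seconds: float) -> dict:
    """Summarize a window of requests as rate, error ratio and median duration."""
    total = len(requests)
    if total == 0:
        # No traffic in the window: nothing to divide by, no duration to report.
        return {"rate": 0.0, "error_ratio": 0.0, "median_ms": None}

    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    median_ms: Optional[float] = median(r.duration_ms for r in requests)

    return {
        "rate": total / window_seconds,   # requests per second
        "error_ratio": errors / total,    # share of requests that failed
        "median_ms": median_ms,           # typical request duration
    }
```

Everything else you collect (disk, CPU, bandwidth and so on) can still sit alongside a summary like this. The point of the sketch is only that a small, fixed set of numbers gets looked at first.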
Whether your team chooses to embrace the monitoring signals promulgated by the Four Golden Signals or the RED Method, or to create your very own set of signals to focus on, you can get more out of this strategy by adhering to the following best practices.
To the extent possible, choose signals that can be easily collected from any application, and that are relevant for any application.
This is important for two reasons. The first is that if your signals are tied too closely to individual applications, generating an interface for your monitoring tools to collect those signals will probably require custom configuration or programming within each application. That adds overhead to your monitoring operations. A better approach is to focus on signals that are generated by your application or its hosting environment organically and automatically – like errors or requests that are automatically logged to standard locations.
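As a rough illustration of collecting signals from data an application already emits, the sketch below counts requests and errors from web server access log lines without touching application code. It assumes the logs follow the widely used Common Log Format and that 5xx status codes represent errors. The names `LOG_LINE`, `LogSignals` and `signals_from_access_log` are hypothetical and exist only for this example.

```python
import re
from typing import Iterable, NamedTuple

# Assumption: lines follow the Common Log Format, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+')


class LogSignals(NamedTuple):
    requests: int  # lines that parsed as a request
    errors: int    # parsed requests with a 5xx status
    skipped: int   # lines that did not match the expected format


def signals_from_access_log(lines: Iterable[str]) -> LogSignals:
    """Count requests and server errors in an iterable of access log lines."""
    requests = errors = skipped = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            skipped += 1
            continue
        requests += 1
        if int(match.group("status")) >= 500:
            errors += 1
    return LogSignals(requests, errors, skipped)
```

Because the function only needs an iterable of strings, it could be fed an open log file or any other line source, and the same code would work for every application that sits behind a server writing this format.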
You also don’t want to focus on signals that are only useful for understanding certain applications. For example, making disk usage one of your signals would typically not be a good idea, because not all applications have persistent storage (and not all applications that persist data do it via disks). Instead, stick with types of metrics that give you direct insight into the availability, health and downtime of any type of application.
Along similar lines, it’s generally a best practice to avoid production monitoring signals that can be collected only from applications written in certain languages, or that require special libraries to generate. This setup also limits the universality of your signal strategy and is likely to require custom configuration or coding.
This is why you shouldn’t use stack trace data, which is usually language-specific, as a monitoring signal.
It might seem to go without saying that production monitoring signals should be tailored to your business’s unique needs. But the point is worth emphasizing because IT teams may sometimes be tempted to assume that a monitoring strategy that works for one organization will work for any other. Applications are applications, right? Regardless of the businesses they support?
Well, no. Business priorities vary from organization to organization, and production monitoring signals should reflect your own business’s priorities. Maybe guaranteeing uptime is more important to your business than application speed and performance, in which case application errors are probably more important to track than traffic-related metrics. Or perhaps cost optimization is a critical business goal. In that case, you might choose to include cost-related metrics among your signals.
Last but not least, remember that the purpose of monitoring signals is to give everyone on your team a clear and focused way to interpret the health of your application. For that reason, choose signals that are relevant for all stakeholders – developers, IT engineers, quality assurance folks and anyone else who plays a role in delivering and managing applications.
If you choose signals that matter only for certain groups, not only do you reduce the effectiveness of the signals in delivering total visibility into your applications, you also make it harder to get your entire team to buy into your signal-based monitoring strategy. If only part of your team pays attention to the signals, then the signals fail to achieve their goal of providing a consistent, across-the-board source of visibility for your entire team.
Production monitoring signals are a useful solution for seeing through all of the noise of today’s complex, data-rich application environments. But while frameworks like the RED Method and Four Golden Signals are good starting points, using monitoring signals effectively requires tailoring signals to your organization’s needs while also taking steps to ensure that signals can be used across your entire environment and team.
Once you’re monitoring production signals, how do you take advantage of that information for rapid incident response? See how VictorOps creates a centralized dashboard for on-call management, alert automation and real-time incident response – sign up for a 14-day free trial or request a personalized demo with our sales team today.
Chris Tozzi has worked as a journalist and Linux systems administrator. He has particular interests in open source, agile infrastructure and networking. He is Senior Editor of content and a DevOps Analyst at Fixate IO. His latest book, For Fun and Profit: A History of the Free and Open Source Software Revolution, was published in 2017.