
The Cloud Application Monitoring Guide

Dan Holloran October 16, 2018


DevOps and IT teams have been monitoring and alerting on on-premise servers, networks, and applications for years. With the growing adoption of cloud-based services, it’s equally important to understand how teams monitor cloud-based infrastructure and applications. In fact, it’s estimated that 83% of enterprise workloads will be in the cloud by 2020.

As with most things in the DevOps world, there isn’t a single solution for monitoring your cloud-based applications or services, but there are a number of useful techniques and tools that can help. So, if you’re new to the world of monitoring cloud services, or you’re simply looking to learn a little more, we created The Cloud Application Monitoring Guide for you.

Different Kinds of Cloud Services

As with monitoring on-premise solutions, effective cloud monitoring is all about improving visibility into your infrastructure and surfacing issues in service health. Teams are using more third-party cloud applications to help manage their workloads, and they are maintaining their servers with services such as AWS, GCP, or Azure.

So, IT and DevOps teams need to monitor not only internal applications, networks, and servers, but also the third-party applications team members are using. Let’s take a look at a few of the different types of cloud services that can be monitored.

SaaS (Software as a Service): Web applications that provide some type of service to the end user. For example, Google Drive, Dropbox, Salesforce, etc.

PaaS (Platform as a Service): Managed building blocks such as SQL databases, storage, and caching tools fall under this category.

IaaS (Infrastructure as a Service): IaaS refers to cloud-hosted servers provided through services such as AWS, GCP, or Azure.

FaaS (Functions as a Service): Serverless functions running on platforms such as AWS Lambda, Azure Functions, or Google Cloud Functions.

Application Hosting: Ways to host applications in a cloud environment. Tools such as Heroku, Amazon EC2, Kubernetes, or Google App Engine fall in this category.

Concerns for Cloud Services

  • Cybersecurity:

    Due to the nature of cloud-based services, IT and DevOps teams are concerned about lack of control or visibility when it comes to security breaches. Monitoring, security, and orchestration/automation tools can help detect breaches and vulnerabilities, and quickly address them before issues get out of hand.

  • Compliance:

    Depending on your industry or business, using cloud-based services can carry compliance risks. Review the platforms and services you rely on to confirm they meet your compliance requirements.

  • Highly Integrated Services:

    Cloud services can be highly integrated and rely on other services, cloud or on-premise, to function effectively. The fear is that when incidents occur, they will affect a large number of other services in your stack.

  • People Operations:

    If you’ve frequently worked with cloud services, this one may surprise you, but many managers are concerned they won’t be able to properly staff a team equipped with the knowledge to build and maintain reliable cloud-based services.
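To make the "Highly Integrated Services" concern above more concrete, here is a minimal sketch of estimating which services an incident could reach. It assumes you keep a map of which services depend on which. The service names and the DEPENDENTS map are hypothetical and only for illustration.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency map: each key is a service, and the list holds
# the services that directly depend on it. Not taken from any real stack.
DEPENDENTS: Dict[str, List[str]] = {
    "auth-db": ["auth-api"],
    "auth-api": ["web-app", "billing"],
    "billing": ["web-app"],
    "web-app": [],
}


def affected_services(failed: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Return every service that directly or indirectly depends on `failed`."""
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in seen:
                seen.add(service)
                queue.append(service)
    # Exclude the failed service itself in case the map contains a cycle.
    seen.discard(failed)
    return seen


print(sorted(affected_services("auth-db", DEPENDENTS)))
# prints ['auth-api', 'billing', 'web-app']
```

A breadth-first walk like this is one simple option. In practice the dependency map might come from a service catalog or tracing data rather than a hand-written dictionary.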

Best Practices for Effective Cloud Monitoring

  • Identify Blind Spots:

    Deeply examine your stack to find areas of weakness or pain points lacking visibility. This is where previously conducted post-incident reviews can help you identify any blind spots across your infrastructure.

  • Key Performance Indicators (KPIs):

    Once you know what to monitor, identify which metrics indicate system health. Set up tools that monitor different levels of your service and tune the KPIs so they accurately indicate when an incident occurs. A better understanding of your KPIs will lead to fewer false alarms for your on-call team at 2 AM.

  • Centralized Visibility:

    Centralizing all monitoring data in one place will improve overall incident detection, response, and team collaboration. This way, you get a more comprehensive view of system health, can correlate incidents more easily, and can quickly loop other teammates into problems.

  • Costs:

    This has less to do with your application’s performance or your system’s health, but it’s important to track what you’re spending on cloud services. Many services charge based on usage, so it’s necessary to be cognizant of the value a service is providing to your team.

  • End User Monitoring:

    A better understanding of how users move through your service and exactly what they’re experiencing can help you build a more intuitive product for customers. Collecting data such as page load times or server response times can reveal pain points in your platform and help you build more robust systems.

  • Chaos Testing:

    Build with failure in mind. Plan for issues with failover measures and backup plans. Try testing your tools to see what happens when there’s an outage or error, then iterate on the process to improve it.

  • Optimize Alerting:

    Based on the centralized data and what you learn from the steps above, tweak your alerting thresholds and make sure alerts are actionable and relevant.
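As a rough illustration of the KPI and alert-tuning practices above, the sketch below only fires an alert when a KPI breaches its threshold several samples in a row, which is one common way to cut down on false alarms from brief spikes. The AlertRule class, the KPI name, and the threshold values are hypothetical. Real monitoring tools expose their own configuration for this.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AlertRule:
    """Hypothetical alert rule; the field names are illustrative only."""
    kpi: str            # name of the metric, e.g. "p95_latency_ms"
    threshold: float    # a sample above this value counts as a breach
    consecutive: int    # breaches in a row required before alerting


def should_alert(rule: AlertRule, samples: Iterable[float]) -> bool:
    """Return True if the samples contain enough consecutive breaches."""
    streak = 0
    for value in samples:
        if value > rule.threshold:
            streak += 1
            if streak >= rule.consecutive:
                return True
        else:
            streak = 0  # a healthy sample resets the streak
    return False


# Assumed values: alert only after three high-latency samples in a row.
rule = AlertRule(kpi="p95_latency_ms", threshold=500.0, consecutive=3)

single_spike = [120, 900, 130, 140, 150]
sustained = [120, 620, 700, 810, 150]

print(should_alert(rule, single_spike))  # prints False
print(should_alert(rule, sustained))     # prints True
```

Raising `consecutive` trades slower detection for fewer pages, so the right value depends on how quickly an issue in that KPI hurts users.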


Helpful Tools for Cloud Monitoring

Now that you know what to do when monitoring your cloud services, you also need to know how to do it. So, here are a few common monitoring tools that are frequently used for cloud-based services.

Splunk: Now we may be biased, but Splunk’s cloud monitoring solution gives you visibility into cloud-based infrastructure and offers detailed log analytics and search functionality. By monitoring everything in your stack from application hosting environments to SaaS solutions, you’ll paint a vivid picture of what’s truly happening in your product.

AppDynamics: As a true APM, AppDynamics focuses on optimizing your cloud application’s performance. With a number of products and services, AppDynamics can help you with end user monitoring, infrastructure visibility, business intelligence, and overall service reliability monitoring.

New Relic: In dynamic, continuously integrated cloud environments, New Relic will help you monitor applications and infrastructure. Whether you’re running a simple architecture, or you’re taking advantage of containers, microservices, and serverless functions, New Relic can help with your cloud monitoring needs.

SolarWinds: SolarWinds Cloud gives you a centralized picture of your cloud infrastructure, applications, and overall digital experience. Identify weak points in your system and build a better integrated, more robust cloud-based solution with SolarWinds.

Amazon CloudWatch: When you’re operating on AWS, Amazon CloudWatch is a go-to monitoring solution. It is purpose-built for monitoring cloud-based applications on AWS, so you can cover all your bases for infrastructure, platform, and application monitoring.


Whether you’re completely on-premise, operating a hybrid model, or completely cloud-based, proper monitoring is required. A better understanding of your system’s blind spots and weaknesses helps you know how to best monitor your service’s health and the tools you’ll need. So, conduct detailed post-incident reviews, plan for failure, test your monitoring tools and applications, and continuously improve your process to help add visibility and reliability to the products you build.

VictorOps is purpose-built to centralize monitoring data, on-call scheduling, alert routing, and incident response. Sign up for a 14-day free trial to see how we integrate with some of your favorite monitoring tools to make on-call suck less.
