Typically, alerting tools and incident management practices have been reserved for after code has been pushed to production. But alerting is beneficial anywhere there is automation, and one place where it can help is quality engineering.
Managing the quality of applications before they’re pushed to production encompasses several activities: unit testing, functional testing, and vulnerability scanning. Organizations emphasize different stages of these practices, with unit testing being the most common and nearly ubiquitous.
But, many organizations are also augmenting their manual QA testing with powerful automated functional tests. They’re also realizing the risk of third-party packages and implementing automated vulnerability scanning to make sure all artifacts are current and void of known vulnerabilities.
As long as an organization has committed to automating their application delivery processes, there’s an opportunity to make any challenges that surface in that automation actionable with alerting.
Developers, quality engineers, and DevOps teams can be more responsive to issues by adding alerting to the quality tests they’re running and to the infrastructure that the tests run on.
Particularly in the world of functional testing, if a test crashes due to some technical issue with the infrastructure or due to the test itself, the impact can be tremendous. It could mean that entire test suites need to be re-run, delaying deployment. Getting visibility into that failure directly impacts the team’s ability to address it. Here’s where alerting directly benefits quality processes:
The first and most obvious place where alerting benefits quality is in the infrastructure where tests are run. While there should be parity between this infrastructure and production, the two are distinctly separate. Test infrastructure could be on-premises, even for cloud applications. It needs to be managed just as production is. Issues with this infrastructure directly impact the ability to run tests, the path to deployment and even the test results themselves.
If release automation infrastructure or release management tools fail, no tests are run, and the process grinds to a halt. It’s better to be pushed an alert about these failures than to discover them manually when it’s too late.
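As a minimal sketch of what being pushed an alert could look like, the snippet below builds a JSON alert for a pipeline stage and posts it to a webhook. The URL, the field names (`message_type`, `entity_id`, `state_message`) and the `pipeline_alert` and `send_alert` helpers are all assumptions made for illustration; substitute whatever your alerting tool's integration actually expects.

```python
import json
import urllib.request

# Hypothetical webhook endpoint; replace with your alerting tool's URL.
WEBHOOK_URL = "https://alerts.example.com/hooks/qa-pipeline"


def pipeline_alert(stage, exit_code, detail=""):
    """Build an alert payload for one pipeline stage (field names are assumed)."""
    failed = exit_code != 0
    return {
        "message_type": "CRITICAL" if failed else "RECOVERY",
        "entity_id": f"qa-pipeline/{stage}",
        "state_message": detail or f"{stage} exited with code {exit_code}",
    }


def send_alert(payload, url=WEBHOOK_URL, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A CI step could call `send_alert(pipeline_alert("provision-test-env", 1))` whenever a stage exits with a non-zero code, so the failure reaches a person instead of waiting to be noticed.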
Functional test suites in many organizations are large and slow. When a Selenium or Cypress script crashes, the chances are high that all tests need to be re-run. The longer it takes to become aware of such an issue, the later it will be addressed and the run restarted. The impact is essentially the same as if all the tests had failed: releases are delayed.
Classically, quality engineering teams are not actively involved in monitoring and alerting for test environments. They run the tests in the evening, cross their fingers and hope when they arrive in the morning that they don’t see any exceptions or crashed machines.
While unit tests run faster and are often run on a developer’s machine, organizations also re-run the entire team’s tests on integration environments. This is a larger suite of tests, so a problem in that environment costs correspondingly more time.
There are many opportunities to implement alerting where organizations rely on resources that are external to the actual codebase, such as API testing (contracts), mock environments and service virtualization. Getting alerted on the uptime and infrastructure issues of these resources helps in deciphering whether or not a failed test is related to your codebase.
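The triage this enables can be sketched in a few lines. The example assumes you already record, from your monitoring, whether each external resource is healthy; the function names and the resource names in the usage note are hypothetical.

```python
def triage_failure(test_dependencies, dependency_health):
    """Return the unhealthy dependencies a failed test relies on.

    test_dependencies: names of external resources the test uses
    dependency_health: mapping of resource name -> True if healthy
    A dependency with no recorded health status is treated as unhealthy.
    """
    return sorted(
        dep for dep in test_dependencies
        if not dependency_health.get(dep, False)
    )


def likely_cause(test_dependencies, dependency_health):
    """Suggest where to look first when a test has failed."""
    down = triage_failure(test_dependencies, dependency_health)
    if down:
        return "external dependency: " + ", ".join(down)
    return "codebase"
```

For example, with `health = {"payments-mock": False, "contract-api": True}`, a failed test that uses both resources would be attributed to `payments-mock` first, while a failed test that only uses `contract-api` would point back at the codebase. This is only a first guess: a healthy dependency does not prove the code is at fault.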
When it comes to production, the quality engineering team usually has little input on what alerting is set up. But quality teams can benefit tremendously from insight into production details, issues and context. They can use this data to mitigate known issues by building similar alerts into testing environments, or by identifying snowflake issues in production and adding tests for them to their test suites.
All of the above testing is executed in production for organizations that have embraced continuous testing. For continuous testing to be effective, the quality team needs to collaborate with the DevOps, Ops and Site Reliability Engineering (SRE) teams to make sure the existing alerting system covers the infrastructure and testing as well.
If you’re using automated workflows, alerting is beneficial because it allows teams to take action toward remediating issues in that automation. When the application quality team’s job is to make sure that applications don’t get deployed with issues, the infrastructure and tools they use to run the tests need to be fully functional and stable. Well-built monitoring and alerting help DevOps and IT teams know exactly when and where things go wrong.
Chris Riley (@HoardingInfo) is a technologist who has spent 15 years helping organizations transition from traditional development practices to a modern set of culture, processes and tooling. In addition to being an industry analyst, he is a regular author, speaker and evangelist in the areas of DevOps, big data, and IT. Chris believes the biggest challenges faced in the tech market are not tools but rather people and planning.