Collaborative help desks and service desks are essential to both IT and customer support. Together, they give teams a way to respond to internal and external incidents and work cross-functionally to support reliable services for end-users. Whether incidents are detected via monitoring tools or through technical support help desks, the business needs a cohesive incident management plan to maintain uptime and keep customers happy.
Less downtime plus happier customers equal more revenue. Engineering managers are always hyper-focused on ensuring positive customer experiences and driving business value. In modern software development and IT operations, rapid delivery of resilient services will act as a competitive differentiator between businesses. So, it’s more important than ever to build holistic incident management and response plans – connecting customer support help desks to developer escalation workflows.
This article shows how help desks and service desks can work with the rest of engineering, DevOps and IT to build a cohesive incident management plan.
Many organizations use the terms “help desk” and “service desk” interchangeably, but there are a few small differences between the two. Help desks are typically focused on end-users of your product, whereas service desks are focused on internal business issues. You may still use one software solution as both a help desk and a service desk. In fact, one integrated tool for both help desk issues and service desk issues can help you reroute problems to the proper team or person faster. It is also easier to communicate across departmental boundaries when the whole organization is working in one place.
But what happens when neither customer support agents nor IT service desk professionals can fix the incident at hand? What if the issue is a larger incident with your underlying application or service? How does the help desk or service desk escalate incidents to DevOps, infrastructure or sysadmin teams? Do the developers or IT operations teams have an incentive to quickly respond to the issue, or do they continue focusing on the delivery pipeline?
A cohesive incident management plan doesn’t only concern help desk and service desk employees. Developers and IT teams also need to establish a culture focused on DevOps collaboration and transparency – sharing accountability for the services they build and maintain. Let’s take a deeper dive into the way help desks can (and should) be integrated into the greater incident management lifecycle.
The incident management lifecycle is made up of five stages: detection, response, remediation, analysis and readiness.
First and foremost, DevOps and IT teams need to set up monitoring and alerting processes to detect incidents quickly. Rapid detection of incidents can allow teams to surface alert context faster and identify the best method for incident response. Taking multiple, related alerts and finding ways to quickly roll them up into one incident can also drastically improve the detection of an incident’s root cause.
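To make the idea of rolling related alerts up into one incident concrete, here is a minimal Python sketch. It assumes that alerts are "related" when they come from the same service and arrive within a few minutes of each other. The `Alert` and `Incident` classes, the `service` field and the five-minute window are all hypothetical choices for illustration, not how any particular monitoring or on-call tool groups alerts.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    service: str          # hypothetical field: the service that raised the alert
    message: str
    raised_at: datetime


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)

    @property
    def last_seen(self):
        # Time of the most recent alert attached to this incident.
        return max(a.raised_at for a in self.alerts)


def roll_up(alerts, window=timedelta(minutes=5)):
    """Group alerts for the same service into one incident when each alert
    arrives within `window` of the latest alert already in that incident."""
    incidents = []
    latest_by_service = {}  # service -> most recent incident for that service
    for alert in sorted(alerts, key=lambda a: a.raised_at):
        current = latest_by_service.get(alert.service)
        if current is not None and alert.raised_at - current.last_seen <= window:
            current.alerts.append(alert)
        else:
            current = Incident(service=alert.service, alerts=[alert])
            incidents.append(current)
            latest_by_service[alert.service] = current
    return incidents
```

Real tools usually group on richer signals than the service name alone, but even this simple rule turns a burst of alerts into one incident that a responder can reason about.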
Once the incident has been detected, the teams need to act on it. Engineering, IT and customer support teams swarm the issue and work toward a resolution. Figuring out who should be working on an issue and which tools they need to start fixing the problem should be top of mind when building an incident response plan.
With a detailed incident detection and response plan in place, remediation becomes much simpler. It can be as simple as following instructions from a runbook or rolling back a deployment. When detection and response are optimized for getting the right people involved, the team should know how to fix the issue faster when the alert first comes through. This way, DevOps and IT teams can maintain more available services and allow customer help desks to spend less time easing customer anxiety.
After the incident has been resolved, everyone can come together to analyze the incident and learn from what happened. Is there anything the help desk could have done to get developers involved more quickly? Could technical support agents fix the issue themselves if given the right runbooks or context? The post-incident review process should include anyone involved in the first three stages of the incident lifecycle and allow teammates from support, IT and software development to voice their opinions.
After taking actionable lessons from past incidents, the team can prepare themselves for future problems. Over time, the help desk, IT operations and software development teams learn from how they work and create more cohesive incident management workflows.
So, what are some best practices for escalating issues to developers and more specialized IT operations? Well, automation in on-call alerting and escalation policies can definitely make this easier. While help desks are often the first line of defense when customers experience issues, developers, database administrators and sysadmins need to also take on-call responsibilities. Help desk agents shouldn’t be stranded at 3 AM with an issue they can’t solve and nobody to help them.
By establishing standardized on-call rotations and escalation policies for DevOps and IT teams, help desks can easily reroute alerts and incidents to the people and teams who can actually solve them. As companies scale, standardization in incident management becomes especially important because help desk agents can simply send alerts to an escalation policy and know they will be taken care of. Incident automation, standardized escalations and on-call schedules allow help desks and service desks to get alerts to the right person every time. New teammates don’t need to know the exact person they should send an alert to. They only need to know the escalation policy.
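As a rough illustration of why an agent only needs to know the policy and not the person, the Python sketch below models an on-call rotation and an escalation policy. Every name here (`Rotation`, `EscalationStep`, `EscalationPolicy`, the weekly shift length and the 15-minute escalation delay in the usage example) is a hypothetical assumption for the example and does not describe any specific product's data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Rotation:
    members: list                        # people in hand-off order
    start: datetime                      # when the first member's shift began
    shift: timedelta = timedelta(days=7) # assumed weekly hand-off

    def on_call(self, now):
        """Return whoever holds the shift that contains `now`."""
        shifts_elapsed = (now - self.start) // self.shift
        return self.members[shifts_elapsed % len(self.members)]


@dataclass
class EscalationStep:
    rotation: Rotation
    after: timedelta   # how long the alert may stay unacknowledged first


@dataclass
class EscalationPolicy:
    name: str
    steps: list        # EscalationStep objects, ordered by `after`

    def responders(self, raised_at, now):
        """Everyone who should have been paged by `now` for an alert that
        was raised at `raised_at` and is still unacknowledged."""
        waited = now - raised_at
        return [s.rotation.on_call(now) for s in self.steps if waited >= s.after]
```

A help desk agent would route an alert to a policy such as `EscalationPolicy("database", [EscalationStep(primary, timedelta(0)), EscalationStep(secondary, timedelta(minutes=15))])`. The policy, not the agent, works out who is on call at 3 AM and who gets paged next if nobody acknowledges the alert.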
Developers and IT operations teams who take ownership of their services and assist their help desk agents drive positive customer experiences and a better internal culture of collaboration.
DevOps principles don’t stop with IT operations and software engineering teams. A culture of DevOps collaboration and transparency should persist throughout all of the business teams as well. In DevOps, help desks and service desks don’t get inundated with tickets for issues they can’t solve. The entire organization continuously improves and takes accountability for their products and services. Everyone works together cohesively to build proactive incident management plans and deliver greater customer value faster.
See how VictorOps can help you automate and organize on-call schedules, intelligent alerting and escalation policies in a highly collaborative environment. Try it yourself in a 14-day free trial or request a free personalized demo from our sales team to start making on-call suck less.