You already know why we built our new reporting feature. Now it’s time to explain exactly what you’re seeing in the report and how we got those numbers.
User Metrics reporting is a powerful tool for examining process, dealing with alert fatigue, and optimizing your team’s performance. You asked for it, and here it is. The user report: an aggregation of different metrics for specific users (by username).
Let’s start our technical deep dive with the first column: Incidents. Incidents is the count of unique incidents associated with a user: the incidents that user was paged for, combined with the incidents that user acknowledged, with duplicates counted only once. For those who like mathy-looking explanations:
$$\text{Incidents} = C(P \cup A)$$

Where C() is a function that counts the elements of a set; P = set of incidents the user was paged for; A = set of incidents the user acknowledged.
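As a minimal sketch of that calculation (the function name and incident IDs here are illustrative, not the report’s actual implementation), the union-then-count step looks like this in Python:

```python
def incidents_count(paged, acked):
    """Count unique incidents: C(P ∪ A), the union of incidents the
    user was paged for (P) and incidents the user acknowledged (A)."""
    return len(set(paged) | set(acked))

paged = {"inc-1", "inc-2", "inc-3"}   # P: incidents the user was paged for
acked = {"inc-2", "inc-4"}            # A: incidents the user acknowledged

# inc-2 appears in both sets but is counted only once
print(incidents_count(paged, acked))  # 4
```

The set union is what prevents an incident that a user was both paged for and acked from being double-counted.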
On its own, Incidents is not particularly enlightening, as it is a base count of incidents in a system (per user). Depending on the rotation schedule, some people could have more or fewer than others, making this field most useful when contextualized with knowledge from your specific organization.
Next we move right to the two columns based on acks. Acknowledgements is the number of acks credited to that user (which also happens to be a subset of Incidents), and MTTA is the mean time to acknowledge for that user. The number of acks is simple to calculate: it is the count of incidents the user manually acked. MTTA’s equation is:
$$\text{MTTA} = \frac{\sum_{i=1}^{I}(A_i - S_i)}{T}$$

Where I = final acked incident; T = total number of acked incidents; A_i = acked time of incident i; S_i = start time of incident i.
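A minimal sketch of the MTTA calculation, assuming each acked incident is available as a (start time, acked time) pair — the data shape and function name here are hypothetical, not the report’s actual schema:

```python
from datetime import datetime, timedelta

def mtta(acked_incidents):
    """Mean time to acknowledge: the average of (A_i - S_i)
    over all incidents the user manually acked."""
    deltas = [acked - start for start, acked in acked_incidents]
    return sum(deltas, timedelta()) / len(deltas)

acks = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 5)),    # acked in 5 minutes
    (datetime(2024, 1, 1, 14, 0), datetime(2024, 1, 1, 14, 15)), # acked in 15 minutes
]
print(mtta(acks))  # 0:10:00
```

Note that only manually acked incidents enter the average, so an incident that resolves on its own before anyone acks it does not drag MTTA in either direction.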
These two metrics provide a view of the workload that can be placed into context with the system as a whole to identify where noisy alerts are taking place, or where workload is not balanced on teams.
The final two columns are the user’s credited Recoveries and the mean time to resolve (MTTR). The number of resolves is the number of manual resolves by that user, plus the number of automatic resolves for which that user acked the incident. MTTR is calculated from that list of resolves, taking each as the time from first notification to the time the recovery comes in. The equation for MTTR is:
$$\text{MTTR} = \frac{\sum_{i=1}^{I}(R_i - S_i)}{T}$$

Where I = final resolved incident; T = total number of resolved incidents; R_i = resolved time of incident i; S_i = start time of incident i.
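To make the crediting rule concrete, here is a hedged Python sketch of both steps; the field names (`resolved_by`, `acked_by`, and so on) are assumptions for illustration, not the report’s actual schema:

```python
from datetime import datetime, timedelta

def credited_resolves(incidents, user):
    """Resolves credited to `user`: incidents the user manually resolved,
    plus automatic resolves where the user acked the incident."""
    return [
        inc for inc in incidents
        if inc["resolved_by"] == user
        or (inc["resolved_by"] == "auto" and inc["acked_by"] == user)
    ]

def mttr(resolves):
    """Mean time to resolve: the average of (R_i - S_i) over credited resolves."""
    deltas = [inc["resolved_at"] - inc["started_at"] for inc in resolves]
    return sum(deltas, timedelta()) / len(deltas)

incidents = [
    {"resolved_by": "alice", "acked_by": "alice",
     "started_at": datetime(2024, 1, 1, 9, 0),
     "resolved_at": datetime(2024, 1, 1, 9, 30)},   # manual resolve: credited
    {"resolved_by": "auto", "acked_by": "alice",
     "started_at": datetime(2024, 1, 1, 10, 0),
     "resolved_at": datetime(2024, 1, 1, 10, 10)},  # auto resolve, acked by alice: credited
    {"resolved_by": "auto", "acked_by": "bob",
     "started_at": datetime(2024, 1, 1, 11, 0),
     "resolved_at": datetime(2024, 1, 1, 11, 5)},   # acked by someone else: not credited
]

mine = credited_resolves(incidents, "alice")
print(len(mine))   # 2
print(mttr(mine))  # 0:20:00
```

The second function intentionally averages only over incidents credited to the user, so an automatic recovery nobody acked never shows up in anyone’s MTTR.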
MTTR and Recoveries are useful for spotting noise in a system, as well as gauging your teams’ speed at resolving issues as they arise. They are also very useful for seeing where productivity is being lost. However, Recoveries and MTTR are contextually bound, and therefore interpretation is left to the end user.
What do these metrics mean together? Unfortunately, the best answer is “It Depends”. What is your specific system setup? How do you run your rotations? Are all incidents actionable?
These contextual questions illustrate a small subset of the information that can change the interpretation of these metrics. However, one of the most important things the User Metrics report shows is how busy users are putting out fires instead of doing other work. A user who is paged incessantly isn’t going to be productive on non-firefighting work. If those pages are for incidents that are resolved automatically, that is a place where noise reduction could improve productivity and make for happier (and less stressed) employees. This report can help make those pain points more visible.