Jonathan Schwietert - July 20, 2017
How We Made Logging Great Again Jonathan is a platform engineer at VictorOps, responsible for system scalability and performance. This is the second part in a series on system visibility, the Detection and Analysis part of the Incident Management Lifecycle. If you missed it, read Part 1 first.
Like any good engineering approach, we wanted a simple, thin facade that would codify our use-cases, provide a single extension point, and be low-overhead to maintain. With the exception of appender performance, we needed unification of the interfaces in order to address all the concerns mentioned in Part 1. We did just that.
We created a top level
Logger that would house all supplemental functionality and become our primary interface regardless of whether we’re logging from a “typical” class, an actor, or a controller. There are two implementations:
ActorLogger which behind the scenes make use of SLF4J and Akka logging, respectively. So far, we haven’t had a use case for a separate controller-oriented logger and simply used the
ClassLogger in our controllers.
Next, we created a
Logging trait in order to further standardize our logger names and break the stigma of a separate
Logger for each class in a class hierarchy, in favor of a single
Logger per instance. In other words, we lose the context of which hierarchical class in favor of the gains it brings. This has the benefit of providing better context to any log statements called from a parent class or trait mixin. This also removed further developer decisions by mixing in a logger with the same field name, logger name, and identity* which we’ll get into next, regarding log variables.
*identity–in this case I’m referring to the information you want printed in every log statement from that class. For example, if the class is associated to a specific user, the username may be a part of its identity.
The goal of codifying use-cases was achieved through adding first class support for log variables — which we define as any key-value pair you want easily parsed from your log statement. This meant the developer was now able to simply pass a tuple (or with macros, just the aptly named variable) to the log method and it was formatted by the facade. No more formatting errors, like this:
Notice the missing closing brace and, in this case, the inconsistency of this statement using colon+space instead of equals. That simple mistake would lead to a failure to extract the variable in your log parsing (in Sumo/Elastic/Splunk/etc) making it somewhat hidden and potentially yield the log statement useless.
Once we had basic support for log variables we got some ideas. For example, how about allowing a class to set class-level log variables that are printed in every log statement from a class? We did just that. Then, we added the ability to simply create a formatted string using the same functionality – useful for strings given to other objects, like a failed
Future. Below, see how we were able to expand the use of class-level log variables as well as string creation into failure messages:
Another perspective on log variables is that they allow you to capture the context of a log statement which helps to describe the data being acted upon. Implicit loggers allow utility objects to assume the identity of the caller when logging relevant information about their job. So, instead of telling you about a needle in a haystack:
We now specify who wanted it (since the class represents the caller) as well as any context they set up in their logger (like the org and user in this example):
A feature worth considering is sanitizing PII (personally identifiable information) in log statements. There are some references to functionality like this in the GoDaddy logger and others. So, focusing on maintaining a thin facade, you could completely replace Logback with the GoDaddy Logger and add this functionality without modifying many, if any, log statements directly.
This simple solution totaled 452 lines of code, including tests and nearly 100 lines of comments to aid in migrating files — the developer responses were surprising to say the least.
Remember the last time you got a splinter in your foot? Such a simple task of removing the splinter relieves that constant annoyance that can easily build up if not addressed. Ha ha! Our state of logging was just like that splinter. Relieved comments began to trickle in, one by one, on how devs have been pleasantly surprised with the new experience of logging on our platform.
Most of those comments centered on how the unified logging was implemented at the right level of abstraction (flexibility versus structure) and that it made logging simple and, because of that, enjoyable to use. These days at VictorOps, developers are progressively migrating the files they touch in order to reap the benefits and the simplicity they’ve become accustomed to.
Here’s a before/after displaying the reuse, codified formatting, and maintainability improvements:
With this change, we progressed in two very important ways as a team: We addressed some tech debt that had become a larger nuisance than we had realized; and, we improved our log searchability by providing simplicity and just enough structure to get the job done.
So, with our logging in a much happier place, we were prepared to move to the next step of improving our monitoring and instrumentation. In Part 3, we’ll talk about how metrics have become our next focal point.