Hootsuite sends millions of messages to social networks every day. This is our bread and butter, so any unexpected errors sending these messages requires that we quickly detect, diagnose and fix the problem. For example, was it because of a recently deployed code change, or a change in behaviour on the social network? Logs are an invaluable tool for debugging issues like this in production, but without easily searchable information about the context of the error, we could find ourselves looking in the wrong direction.
If an error occurs when posting a message to Twitter, we would like to know information like:
- The user who scheduled the message
- The twitter account id
- The error code(s)
- The server the request was processed on
While all this information can be propagated directly via method arguments and manually included in each log message, it tends to clutter APIs for the sake of information that is only used for logs.
https://gist.github.com/jriecken/c3f30c1c139b82b11bbd.js?file=LoggingExample.scalaIt also makes the logs more difficult to use in analysis tools like Kibana, which work much better with structured metadata:
Thankfully, libraries like SLF4J and Logback provide a Mapped Diagnostic Context (MDC) that can be used to set contextual information about a single execution trace that is included with every message logged in that trace. Unfortunately, the MDC is implemented using a
ThreadLocal, which does not work in an environment where everything is being processed asynchronously in different
ExecutionContexts. For example, messages sent to Akka actors and execution of
Futures are almost always processed on shared pools of worker threads. All is not lost, however. There is a mechanism that allows us to retain thread-local-like behavior across asynchronous calls.
Logging in a Synchronous Application
In a synchronous application, using the MDC is simple. At any point new contextual information is available, it can be added to the MDC and seen everywhere from that point forward. For example, in a simple Play controller:https://gist.github.com/jriecken/c3f30c1c139b82b11bbd.js?file=SyncController.scala
Example log output (the values in the braces are the MDC values and the log message is shown directly after):
Logging in an Asynchronous Application
If we change the controller to perform operations asynchronously, however, the following happens:https://gist.github.com/jriecken/c3f30c1c139b82b11bbd.js?file=AsyncController.diff
Example log output:
Oops! The MDC only contains values on the thread that the values were set on.
Propagating Thread-Local Values Through Future Execution
The solution is to manually propagate the
ThreadLocal MDC values from the asynchronous call point to the execution point. There is a way to do this by overriding the
prepare method of
ExecutionContext, which is invoked just before something is submitted to be run asynchronously.https://gist.github.com/jriecken/c3f30c1c139b82b11bbd.js?file=PrepareScaladoc.scala
The code is remarkably simple:
https://gist.github.com/jriecken/c3f30c1c139b82b11bbd.js?file=MDCPropagatingExecutionContext.scalaLet’s change our example that did not work, this time wrapping the
ExecutionContext with one that propagates the MDC.
Example log output:
Propagating Thread-Local Values Through Actor Calls
This is not as simple a solution. Akka actors can only be sent messages, and there is not a “side channel” (i.e something akin to HTTP headers) where contextual information can be passed alongside the main message. It is possible, however, to create a base Actor trait that is aware of contextual information and then send that information in the message. It’s not transparent like the solution for
ExecutionContexts and is somewhat hacky. We’re still experimenting with this method and are trying to find the most unobtrusive and elegant way of passing contextual information between actors.https://gist.github.com/jriecken/c3f30c1c139b82b11bbd.js?file=MDCContextAware.scala
Bonus: Generalizing ThreadLocal Propagation
This works great for the MDC, but what if there is other contextual information that we want to pass around (e.g. trace ids for a distributed tracing system)? Re-implementing a new type of
ExecutionContext/Actor for each type would be a pain, and error-prone (what if you forget to wrap a new ExecutionContext with the new
The concept of propagated
ThreadLocals can be generalized. We call them
ContextLocals (Twitter has a similar concept called Local in their util library). In essence,
ContextLocal values are (ideally immutable) values all stored in a single shared
ThreadLocal, and all
ContextLocal values are propagated at once with a single
ExecutionContext wrapper. We’ve also created a mechanism for a
ContextLocal to proxy an existing
ThreadLocal API (like the MDC). Our
ContextLocal API looks something like this:https://gist.github.com/jriecken/c3f30c1c139b82b11bbd.js?file=ContextLocalApi.scala
ContextLocals and ensuring all of our
ExecutionContexts use a wrapper that propagates the values, we can be confident that all the necessary information is available whether for logs, tracing, or any other cross-cutting concerns that come down the line.
Logging with structured, contextual data can make debugging issues in production much easier than just using flat log messages. Concepts like the Mapped Diagnostic Context help with collecting contextual data, but do not work out of the box in an asynchronous environment. However, as we’ve seen, it’s possible to easily (and fairly transparently) propagate the information using an ExecutionContext wrapper.
Note: If you’re using Logback, be aware that it uses an
InheritableThreadLocal for the MDC. This can result in strange behavior if a new thread is created in a thread pool while there is information in the MDC of the current thread. The current values in the MDC will end up being the default values in the new thread, which is unexpected. There is an open issue on Logback’s JIRA that suggests (and we agree) that the MDC should not be an
Thanks to Beier Cai, Johnny Bufu, Kimli Welsh, Luke Kysow, and Mike White for reading and editing early versions of this post.
About the Author
Jim is a Senior Engineer on Hootsuite’s Platform team. When he’s not working on building a rapidly expanding collection of Scala/Akka micro services, he likes cycling and writing/playing music. Follow him on Twitter @jimriecken.