

The paradox of debugging with static logs: Get dynamic or go home

Idan Shatz, Developer Advocate, Ozcode
Rami Honig, Head of Content, Ozcode

Debugging with logs is a common practice. But as applications get more complex and more distributed, they also get more difficult to understand. And when something goes wrong (don't kid yourself; that’s "when," not "if"), they're even more difficult to fix.

So, what do you do? Add logs, logs, and more logs. Logs make us comfortable. Logs give us observability. They're supposed to help us understand the inner workings of those complex systems we all use every day. But they come at a price.

Excessive logging can affect system performance—and add cost to store petabytes of log files. Companies are constantly striking a balance between adding logs and keeping systems running at peak performance.

There are best practices when using static logs for debugging, but there is a better way: dynamic logging.

The traditional approach

Here's how it works with static logs. Dave, the developer, is working on the latest cool feature and thinks to himself, "Hmm, this part's kind of complicated. Let's put out a log entry with some data in case something goes wrong." 

Usually, Dave will use one of the common open-source logging frameworks such as Serilog or Log4Net, and will write something like this to output his log entry:

Logger.LogInformation("About to invoke transaction {Id} on table {Table}", transaction.Id, table.Name);

For each line like this, Dave has a few decisions to make:

  1. What should the severity level be (verbose, debug, information, warning, error, or fatal)?
  2. Where exactly should that line go? At the beginning of the method? At the end? In the "catch" of a "try/catch" block?
  3. Exactly what information should be logged?
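To make those decisions concrete, here's a minimal sketch of how the same operation might be logged at different points and severities. The calls are Microsoft.Extensions.Logging-style; the variable and method names are illustrative:

    // Assumes: using Microsoft.Extensions.Logging; and an ILogger named Logger.
    try
    {
        // Decision 2: a log at the start of the risky section...
        Logger.LogDebug("Starting transaction {Id} on table {Table}", transaction.Id, table.Name);

        ExecuteTransaction(transaction, table);

        // ...and/or one after it succeeds.
        Logger.LogInformation("Transaction {Id} committed to {Table}", transaction.Id, table.Name);
    }
    catch (Exception ex)
    {
        // Decision 1: a failure here probably warrants Error severity.
        // Decision 3: are the two IDs alone enough data at crunch time?
        Logger.LogError(ex, "Transaction {Id} failed on {Table}", transaction.Id, table.Name);
        throw;
    }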

Three best practices for debugging with logs

None of those questions has an easy or immediately obvious answer. The severity level will affect the volume of logs and how easy it is to find the relevant entries when the time comes. And the value of a data item can change over the course of executing a function.

For instance, at what point exactly do you really need to know the value? Hard to say. What data item will you need at crunch time? Also hard to say.

So here are three best practices you can use when faced with these questions.

Include contextual data 

Applications generate millions of log entries, and every entry includes a message and time stamp. But if that is all you have to work with, finding the log entries relevant to a specific incident is extremely difficult.

Adding contextual data such as "service name" or "machine name" to log entries gives developers powerful handles for filtering the log messages and zooming in on the right ones.
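With Serilog, for example, you can attach that context once at configuration time rather than repeating it in every call. A minimal sketch; the property values are placeholders:

    using System;
    using Serilog;

    var log = new LoggerConfiguration()
        .Enrich.WithProperty("ServiceName", "checkout-service") // placeholder value
        .Enrich.WithProperty("MachineName", Environment.MachineName)
        .WriteTo.Console() // requires the Serilog.Sinks.Console package
        .CreateLogger();

    // Every entry now carries ServiceName and MachineName, so a log
    // analyzer can filter on them without parsing the message text.
    log.Information("Order {OrderId} received", 42);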

Structure your log entries

A log message comprises text and value elements concatenated together. To filter on log entries of any given format, you need to parse them and tokenize their elements.

Modern log analyzers associate each log entry with its template and tokens in "raw" format. Using a consistent structure for your log entries allows a quick and easy search for logs that include any given token.
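In practice, that means preferring message templates over string concatenation. A quick Serilog-style sketch:

    // Unstructured: the values are baked into one opaque string that an
    // analyzer can only search with regular expressions.
    log.Information("User " + userId + " bought " + itemCount + " items");

    // Structured: the template and its tokens are stored separately, so an
    // analyzer can query, say, all entries where ItemCount is greater than 10.
    log.Information("User {UserId} bought {ItemCount} items", userId, itemCount);

    // With a JSON sink, the second call is stored roughly as:
    // { "MessageTemplate": "User {UserId} bought {ItemCount} items",
    //   "Properties": { "UserId": 1234, "ItemCount": 3 }, ... }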

Include transaction and correlation IDs 

Debugging with logs is an exercise in tracking the execution flow of your code through the logs it generates. However, when your code runs through multiple services and machines, the log entries can be distributed across different machines and log files, making it hard to track how your code executed step by step.

Attaching transaction and correlation IDs to your log entries allows log analyzers to piece together the log entries from several sources and show you how things unfolded across a specific transaction. Advanced systems also use span IDs to create a tree structure of all sub-operations within a transaction.
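With Serilog, for instance, you can push the IDs into the log context once per request so that every entry in that scope carries them. This sketch assumes .Enrich.FromLogContext() in the logger configuration; the ID variables are illustrative:

    using Serilog.Context;

    using (LogContext.PushProperty("CorrelationId", correlationId))
    using (LogContext.PushProperty("TransactionId", transactionId))
    {
        // Both IDs ride along on every entry written in this scope, so a
        // log analyzer can stitch the flow together across services.
        log.Information("Reserving stock for order {OrderId}", orderId);
        log.Information("Charging payment for order {OrderId}", orderId);
    }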

The goal of these practices is to increase the searchability of your logs so you can track the flow of your code across multiple services, machines, and modules. But while they do make you more productive, they don't really answer the three fundamental questions Dave, the developer, posed earlier.

Static logs: The paradox

To have the most useful logs, you need to answer the questions of where to log, what to log, and what severity to assign. The way you answer those questions each time can have a huge impact on your system and processes. If you don't log enough information, you'll have to add more logs later, when you come to debug a specific error.

Log too much information, and your system performance can take a hit, storage requirements may go up, and finding just the right log can be the proverbial needle in the haystack. The problem is that the answers to those questions change depending on the issue you're trying to solve.

Let's take a simple example. If you're having trouble shipping goods to customers, you may want to log their address, but if customers are having trouble checking out the shopping cart, you may need to investigate their credit rating.

So, log both the address and the credit rating? You can, but applying that kind of logic will quickly result in bloated logs and performance hits, which brings us to the paradox of static logs:

If you know what to log, you've already solved the bug.

In practice, you rarely have the right logs in your code. Consequently, finding the root cause of a problem with logs requires several iterations of adding more logs to gather information and going through your CI/CD cycle each time to test your theories.

Each iteration like this can take anywhere from hours to days. And even if you're willing to suffer through this tiresome process, debugging with static logs has some other built-in problems.

Pitfalls of static logging

Beyond those three vital questions (where, what, and severity), static logging poses additional challenges. These include:

  • Data privacy: The production systems you're trying to debug may contain personally identifiable information (PII) that your developers are not even allowed to see. You may not be able to log what you want to. Protecting privacy requires training and sound methodologies to get it right.
  • Logging data structures: Logging complex data structures such as trees, lists, or enumerators can have a significant memory impact. You need to carefully consider what's happening behind the scenes—think memory footprint and serializing to a file system—when you log one of those structures.
  • Deadlocks and side effects: Adding logs changes your code, sometimes in unexpected ways. For example, evaluating a getter while serializing a field can mutate a class and change the flow of your program, or even cause deadlocks if the class was designed for thread safety. These effects are not common, but they do happen and are hard to uncover (see the sketch after this list).
  • Log freshness: Your code is alive and constantly changing. Log entries created five builds ago may not be relevant anymore or, worse, may be misleading. For example, a log entry connected to the logic of a function may have to change if the logic changes. You can get around this problem by correlating a new log entry with the version (the git commit) in which it was introduced; remember the best practice of using contextual data.
  • Exceptions: Logging exceptions can be tricky. The state of variables you write into your logs may change by the time you catch the exception, and in any case, you can't be sure you’ve caught (and logged) every exception.
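Here's the contrived sketch promised above. The property getter looks like a harmless read, but evaluating it for a log entry mutates state and takes a lock:

    public class Inventory
    {
        private readonly object _lock = new object();
        private int _remaining = 100;

        // A getter that takes a lock and mutates state: exactly the kind
        // of member an innocent-looking log statement can trip over.
        public int Remaining
        {
            get
            {
                lock (_lock)        // can contend with writer threads...
                {
                    _remaining--;   // ...and this "read" changes state
                    return _remaining;
                }
            }
        }
    }

    // Evaluating inventory.Remaining for the entry below decrements the
    // count and acquires the lock as a side effect.
    Logger.LogDebug("Stock remaining: {Remaining}", inventory.Remaining);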

So, what else can you do apart from using static logs?

Dynamic logs: How they work

Dynamic logs are logs you add to your application while it's live in production without having to modify your source code and run through a CI/CD cycle. This capability is provided by an agent that is installed alongside your application. The agent communicates with a server that tells it which logs to add and where to add them.

The agent then adds the required code using byte-code instrumentation. When your application executes that code, the corresponding log entry is sent back to the server, which can either display it in a client application or pipe it to another tool, such as an advanced log analyzer.
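Conceptually, a dynamic log point is just data. The shape below is purely illustrative; each vendor, Ozcode included, defines its own format:

    // A hypothetical description of one dynamic log point.
    public record LogPoint(
        string Method,           // e.g., "Checkout.ProcessOrder"
        int Line,                // where the agent injects the entry
        string MessageTemplate,  // e.g., "Cart total is {total}"
        string? Condition);      // optional, e.g., "cart.Items.Count > 10"

    // The agent receives log points like this from the server, injects them
    // into the running process via byte-code instrumentation, and streams
    // the resulting entries back. No source change, no redeploy.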

Byte-code instrumentation is not new. In fact, it has been in use for years and is the fundamental technique enabling the observability provided by application performance monitoring (APM) tools.

Dynamic logging takes it to the next level, letting you output log entries anywhere they're needed. You can add logs to investigate an issue without touching the source code. As soon as you're done investigating, the instrumentation is removed.

The simplicity of debugging with dynamic logs coupled with its power has a real impact on how you track, monitor, and debug your live systems.

Advantages of using dynamic logs

Here's how dynamic logging avoids the pitfalls of debugging with static logs.

  • Data privacy: The agent can be configured in different ways to redact PII. Any data configured for protection is automatically hidden in dynamic log output. This removes the burden from developers, who no longer have to worry about the privacy of data they commit to logs.
  • Data structures and memory pressure: Dynamic logs don't run the risk of overloading your CPU or memory, because the agent can cap its own impact on performance. The instrumented logging code still consumes CPU and memory to do its job, but the agent monitors how that affects the application and throttles its activity accordingly.
  • Deadlocks and side effects: The agent can avoid deadlocks and side effects by capturing only those fields and auto-properties that don't mutate the object they are in.
  • Log freshness: Dynamic logs are added wherever and whenever needed, on the fly, to the currently running code. They're like picking fruit straight from the tree. You can't get any fresher than that.
  • Conditional logging: Adding complex logic just to decide whether to record a log can be risky and is avoided in static logging. But dynamic logs run no risk of altering the application flow. You can implement any conditional logic to record them and update those conditions at any time.

Developers observe or analyze only about 0.01% of static logs. They use dynamic logs much more intensively because those logs are created at exactly the time and place the developer wants to see them.

Static logs affect performance and memory even when logging is disabled: the log function is still called, arguments are loaded onto the stack, the function checks the logging level and simply returns, and the stack must unwind. Dynamic logs can have a similar effect on CPU and memory when active, but the agent monitors these effects in the background and caps logging activity if the performance impact rises above a configurable threshold. When dynamic logging is disabled, however, the byte-code instrumentation is removed, so it has no impact on performance or memory at all.
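That per-call overhead is why a common mitigation for static logs is an explicit level guard, shown here with Microsoft.Extensions.Logging (BuildExpensiveSnapshot is a stand-in for any costly argument). Dynamic logs don't need one, because a disabled log point simply isn't instrumented:

    // Without a guard, BuildExpensiveSnapshot() runs even when Debug is
    // off; arguments are evaluated before the call can return early.
    Logger.LogDebug("State: {Snapshot}", BuildExpensiveSnapshot());

    // With a guard, a disabled level costs a single boolean check.
    if (Logger.IsEnabled(LogLevel.Debug))
    {
        Logger.LogDebug("State: {Snapshot}", BuildExpensiveSnapshot());
    }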

Try debugging with dynamic logs

Static logs are so ingrained in software that a whole industry has grown around collecting them, parsing the information they contain, and displaying it in different ways.

In the end, it's all about debugging. Dynamic logging is relatively new and avoids most of the hassle of static logs. Add them when and where you need them, and you don't have to worry about which data to capture; you can capture full state.

Debugging becomes much faster, enabling a more effective root-cause analysis and reducing mean time to repair. There's no need to create thousands of log entries to search through, when you can create just the right one when and where you need it.

Dynamic logging is the way to go—you get more than static logs at a fraction of the cost to memory, performance, and storage.
