Passionate Software Engineering: An effective logging model

Logging is a really trivial thing. At least it looks so. However, the trivial thing is just printing out a log line. What to print out and which extra information should that log contain is not that straightforward. We all have seen and used logging model where log statements have been classified into several categories. For example, the Java logging utility defines 7 different categories. Another common classification seems to be:

Fatal: something extremely bad and unrecoverable happened causing the system to stop or exit;
Error: something bad and unrecoverable happened but not so bad to halt the system;
Warning: something bad happened but it has been recovered;
Info: some important event has been received or stage reached;
Debug: used to print out any sort of internal data to help debugging.

Along with this classification model there also exist logging frameworks capable of fancy things like:

Turning on and off each log level dynamically at run-time;
Redirecting log statements to separate media (e.g. files, sockets, system's log, etc.) based on the log level or the class or file the log has been generated from.

All this sounds useful, but too often the outcome is an extremely noisy and untrustworthy log file containing too many stuff and at times too many errors, even if the system was just behaving itself. Furthermore, the application is unusable because 90% of the time it's doing a printer's job instead of its. This is why all that fancy features exist; to try to filter all this noise and restore some performance back.

Logging is a really trivial thing and in my projects I keep it that way by using a really trivial logging model:

Error: an error internal to the program occurred but it shouldn't have; this means that it's unrecoverable. For example we placed a try-catch block at some very high level and it caught some sort of exception. On the opposite, if we were able to write code to recover from an error then it wouldn't be an error, because it's something we can handle and dealing with it is part of the program's behavior.
Warning: an error external to the program occurred. Being outside of our control is something we knew it could occur and so we wrote defensive code. For example, the server received a malformed message from a client.
Info: any other event which is meaningful in the application's domain. For example an embedded device discovered that a new software version is available. On the opposite, printing out a temporary variable's value is probably not meaningful in the application's domain.

There's no need for lots of logging level. For example, there's no need for debug logs which print out variables and other internal data just because "it could help". We should rather write unit tests. If they pass, then we know exactly how the program behave.

There's no need for fancy stuff like turning logging on and off at run time, because logging is less than 10% of the job, not the other way round.

There's no reason to remove logging when releasing the product, for the same reason. The only context where it might still be reasonable to strip logging off is embedded devices with no communications at all, but this sounds an extremely rare case; there's always a way to inspect/extract logs from a device.

In conclusion, logging is a trivial thing. We should keep it that way without adding fancy features of any sort. However, what is worth to be logged and which level it belongs to is more an art than a science.

Passionate Software Engineering

Monday 9 December 2013

An effective logging model

No comments:

Post a Comment