Request For Comment: Asynchronous Logging

Tue Mar 30 18:18:33 UTC 2021

Hi,

I'd like to (re)start a discussion on asynchronous logging [1,2,3,4].
We are successfully using this feature productively at Amazon both in
jdk8 and jdk11 to reduce the tail latency of services which use
logging. We think that async logging is a useful addition to the
current logging framework which might be beneficial to a wider range
of users. The following write-up tries to capture the comments and
suggestions from the previous discussions we are aware of.

Current state:

- HotSpot uses the so called "Unified Logging" (UL) framework which
was introduced by JEP 158 [5] in JDK 9. Most logs have been
retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
[6]).
- The current UL implementation is based on the standard C buffered
stream I/O interface [7]. The LogFileStreamOutput class which writes
logs to abstract FILE streams is the only child of the abstract base
class LogOutput. LogFileStreamOutput has three child classes
LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
stdout, stderr or an arbitrary file respectively. The initial UL JEP
158 [5] envisioned logging to a socket but has not implemented it. At
least one such extension has been proposed since then [8].
- UL synchronizes logs from different threads with the help of the
standard C flockfile()/funlockfile() [9] functions. But even without
this explicit locking, all the "stdio functions are thread-safe. This
is achieved by assigning to each FILE object a lockcount and (if the
lockcount is nonzero) an owning thread.  For each library call, these
functions wait until the FILE object is no longer locked by a
different thread, then lock it, do the requested I/O, and unlock the
object again" [9]. A quick look at the glibc sources reveals that FILE
locking is implemented with the help of futex() [10] which breaks down
to s simple atomic compare and swap (CAS) on the fast path.
- Notice that UL "synchronizes" logs from different threads to avoid
log interleaving. But it does not "serialize" logs according to the
time at which  they occurred. This is because the underlying stdio
functions do not guarantee a specific order for different threads
waiting on a locked FILE stream. E.g. if three log events A, B, C
occur in that order, the first will lock the output stream. If the log
events B and C both arrive while the stream is locked, it is
unspecified which of B and C will be logged first after A releases the
lock.

Problem statement:

- The amount of time a log event will block its FILE stream depends on
the underlying file system. This can range from a few nanoseconds for
in-memory file systems or milliseconds for physical discs under heavy
load up to several seconds in the worst case scenario for e.g. network
file systems. A blocked log output stream will block all concurrent
threads which try to write log messages at the same time. If logging
is done during a safepoint, this can significantly increase the
safepoint time (e.g. several parallel GC threads trying to log at the
same time). We can treat stdout/stderr as special files here without
loss of generality.

Proposed solution:

Extend UL with an asynchronous logging facility. Asynchronous logging
will be optional and disabled by default. It should have the following
properties:
- If disabled (the default) asynchronous logging should have no
observable impact on logging.
- If enabled, log messages will be stored in an intermediate data
structure (e.g. a double ended queue).
- A service thread will concurrently read and remove log messages from
that data structure in a FIFO style and write them to the output
stream
- Storing log messages in the intermediate data structure should take
constant time and not longer than logging a message takes in the
traditional UL system (in general the time should be much shorter
because the actual I/O is deferred).
- Asynchronous logging trades memory overhead (i.e. the size of the
intermediate data structure) for log accuracy. This means that in the
unlikely case where the service thread which does the asynchronous
logging falls behind the log producing threads, some logs might be
lost. However, the probability for this to happen can be minimized by
increasing the configurable size of the intermediate data structure.
- The final output produced by asynchronous logging should be equal to
the output from normal logging if no messages had to be dropped.
Notice that in contrast to the traditional unified logging,
asynchronous logging will give us the possibility to not only
synchronize log events, but to optionally also serialize them based on
their generation time if that's desired. This is because we are in
full control of the synchronization primitives for the intermediate
data structure which stores the logs.
- If log messages had to be dropped, this should be logged in the log
output (e.g. "[..] 42 messages dropped due to async logging")
- Asynchronous logging should ideally be implemented in such a way
that it can be easily adapted by alternative log targets like for
example sockets in the future.

Alternative solutions:
- It has repeatedly been suggested to place the log files into a
memory file system but we don't think this is an optimal solution.
Main memory is often a constrained resource and we don't want log
files to compete with the JVM for it in such cases.
- It has also been argued to place the log files on a fast file system
which is only used for logging but in containerized environments file
system are often virtualized and the properties of the underlying
physical devices are not obvious.
- The load on the file system might be unpredictable due to other
applications on the same host.
- All these workarounds won't work if we eventually implement direct
logging to a network socket as suggested in [8].

Implementation details / POC:

- A recent pull request [2] for JDK-8229517 [3] proposed to use a
simple deque implementation derived from HotSpot's LinkedListImpl
class for the intermediate data structure. It synchronizes access to
the queue with a MutexLocker which is internally implemented with
pthread_lock() and results in an atomic CAS on the fast path. So
performance-wise the locking itself is not different from the
flockfile()/funlockfile() functionality currently used by UL but
adding a log message to the deque should be constant as it basically
only requires a strdup(). And we could even eliminate the strdup() if
we'd pre-allocate a big enough array for holding the log messages as
proposed in the pull request [2].
- The pull pull request [2] for JDK-8229517 [3] proposed to set the
async flag as an attribute of the Xlog option which feels more natural
because UL configuration is traditionally done within the Xlog option.
But we could just as well use a global -XX flag to enable async
logging? What are your preferences here?
- The pull pull request [2] for JDK-8229517 [3] (mis)uses the
WatcherThread as service thread to concurrently process the
intermediate data structure and write the log messages out to the log
stream. That should definitely be changed to an independent service
thread.
- The pull pull request [2] for JDK-8229517 [3] initially proposed
that the "service thread" runs at a fixed interval to dump log
messages to the log streams. But reviewers commented that this should
better happen either continuously or based on the filling level of the
intermediate data structure. What are your preferences here?
- What are your preferences on the configuration of the intermediate
data structure? Should it be configured based on the maximum number of
log messages it can store or rather on the total size of the stored
log messages? I think that for normal runs this distinction probably
won't make a big difference because the size of log messages will
probably be the same on average so "number of log messages" should
always be proportional to "total size of log mesages".

1. Before diving into more implementation details, I'd first like to
reach a general consensus that asynchronous logging is a useful
feature that's worth while adding to HotSpot.

2. Once we agree on that, we should agree on the desired properties of
asynchronous logging. I've tried to collect a basic set in the
"Proposed solution" section.

3. If that's done as well, we can discuss various implementation
details and finally prepare new pull requests.

Thank you and best regards,
Volker

[1] https://bugs.openjdk.java.net/browse/JDK-8229517
[2] https://github.com/openjdk/jdk/pull/3135
[3] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
[4] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
[5] https://openjdk.java.net/jeps/158
[6] https://openjdk.java.net/jeps/271
[7] https://man7.org/linux/man-pages/man3/stdio.3.html
[8] https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
[9] https://man7.org/linux/man-pages/man3/flockfile.3.html
[10] https://man7.org/linux/man-pages/man2/futex.2.html