Request For Comment: Asynchronous Logging

Wed Mar 31 19:10:27 UTC 2021

Hi Volker,

Excellent summary. Thank you for starting the design discussion away from
the PR thread at GH. I think this is a better place for this discussion.

I think UL had been missing a feature like this. I think we should provide
it. Different people have come forward with the same idea in the past, so I
believe there is a real need.

You captured all points succinctly and prepared the discussion well. My
remarks are inline (where I don't write one I agree).

On Tue, Mar 30, 2021 at 8:19 PM Volker Simonis <volker.simonis at gmail.com>
wrote:

> Hi,
>
> I'd like to (re)start a discussion on asynchronous logging [1,2,3,4].
> We are successfully using this feature productively at Amazon both in
> jdk8 and jdk11 to reduce the tail latency of services which use
> logging. We think that async logging is a useful addition to the
> current logging framework which might be beneficial to a wider range
> of users. The following write-up tries to capture the comments and
> suggestions from the previous discussions we are aware of.
>
> Current state:
>
> - HotSpot uses the so called "Unified Logging" (UL) framework which
> was introduced by JEP 158 [5] in JDK 9. Most logs have been
> retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
> [6]).
> - The current UL implementation is based on the standard C buffered
> stream I/O interface [7]. The LogFileStreamOutput class which writes
> logs to abstract FILE streams is the only child of the abstract base
> class LogOutput. LogFileStreamOutput has three child classes
> LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
> stdout, stderr or an arbitrary file respectively. The initial UL JEP
> 158 [5] envisioned logging to a socket but has not implemented it. At
> least one such extension has been proposed since then [8].
> - UL synchronizes logs from different threads with the help of the
> standard C flockfile()/funlockfile() [9] functions. But even without
> this explicit locking, all the "stdio functions are thread-safe. This
> is achieved by assigning to each FILE object a lockcount and (if the
> lockcount is nonzero) an owning thread.  For each library call, these
> functions wait until the FILE object is no longer locked by a
> different thread, then lock it, do the requested I/O, and unlock the
> object again" [9]. A quick look at the glibc sources reveals that FILE
> locking is implemented with the help of futex() [10] which breaks down
> to s simple atomic compare and swap (CAS) on the fast path.
> - Notice that UL "synchronizes" logs from different threads to avoid
> log interleaving. But it does not "serialize" logs according to the
> time at which  they occurred. This is because the underlying stdio
> functions do not guarantee a specific order for different threads
> waiting on a locked FILE stream. E.g. if three log events A, B, C
> occur in that order, the first will lock the output stream. If the log
> events B and C both arrive while the stream is locked, it is
> unspecified which of B and C will be logged first after A releases the
> lock.
>
> Problem statement:
>
> - The amount of time a log event will block its FILE stream depends on
> the underlying file system. This can range from a few nanoseconds for
> in-memory file systems or milliseconds for physical discs under heavy
> load up to several seconds in the worst case scenario for e.g. network
> file systems. A blocked log output stream will block all concurrent
> threads which try to write log messages at the same time. If logging
> is done during a safepoint, this can significantly increase the
> safepoint time (e.g. several parallel GC threads trying to log at the
> same time). We can treat stdout/stderr as special files here without
> loss of generality.
>
> Proposed solution:
>
> Extend UL with an asynchronous logging facility. Asynchronous logging
> will be optional and disabled by default. It should have the following
> properties:
> - If disabled (the default) asynchronous logging should have no
> observable impact on logging.
>

Additionally, if disabled, it should not cost anything. If enabled, it
should cost as little as possible (eg. if logging is enabled but nobody
logs).

> - If enabled, log messages will be stored in an intermediate data
> structure (e.g. a double ended queue).
> - A service thread will concurrently read and remove log messages from
> that data structure in a FIFO style and write them to the output
> stream
> - Storing log messages in the intermediate data structure should take
> constant time and not longer than logging a message takes in the
> traditional UL system (in general the time should be much shorter
> because the actual I/O is deferred).
> - Asynchronous logging trades memory overhead (i.e. the size of the
> intermediate data structure) for log accuracy. This means that in the
> unlikely case where the service thread which does the asynchronous
> logging falls behind the log producing threads, some logs might be
> lost. However, the probability for this to happen can be minimized by
> increasing the configurable size of the intermediate data structure.
> - The final output produced by asynchronous logging should be equal to
> the output from normal logging if no messages had to be dropped.
>

+1. This means decorators have to be resolved at the log site, not in the
flusher.

> Notice that in contrast to the traditional unified logging,
> asynchronous logging will give us the possibility to not only
> synchronize log events, but to optionally also serialize them based on
> their generation time if that's desired. This is because we are in
> full control of the synchronization primitives for the intermediate
> data structure which stores the logs.

- If log messages had to be dropped, this should be logged in the log
> output (e.g. "[..] 42 messages dropped due to async logging")
> - Asynchronous logging should ideally be implemented in such a way
> that it can be easily adapted by alternative log targets like for
> example sockets in the future.
>

Additional requests:
- no log output should be withheld in case of vm exit or crash
- no log output should be unreasonably delayed
- The logger side should use as little VM infrastructure as possible to
prevent circularity.

>
> Alternative solutions:
> - It has repeatedly been suggested to place the log files into a
> memory file system but we don't think this is an optimal solution.
> Main memory is often a constrained resource and we don't want log
> files to compete with the JVM for it in such cases.
> - It has also been argued to place the log files on a fast file system
> which is only used for logging but in containerized environments file
> system are often virtualized and the properties of the underlying
> physical devices are not obvious.
> - The load on the file system might be unpredictable due to other
> applications on the same host.
> - All these workarounds won't work if we eventually implement direct
> logging to a network socket as suggested in [8].
>
> Implementation details / POC:
>
> - A recent pull request [2] for JDK-8229517 [3] proposed to use a
> simple deque implementation derived from HotSpot's LinkedListImpl
> class for the intermediate data structure. It synchronizes access to
> the queue with a MutexLocker which is internally implemented with
> pthread_lock() and results in an atomic CAS on the fast path. So
> performance-wise the locking itself is not different from the
> flockfile()/funlockfile() functionality currently used by UL but
> adding a log message to the deque should be constant as it basically
> only requires a strdup(). And we could even eliminate the strdup() if
> we'd pre-allocate a big enough array for holding the log messages as
> proposed in the pull request [2].
>

IIUC to be equivalent to flockfile the implementation would have to use one
queue and one mutex *per file sink*, not a global queue/mutex as the patch
proposed. Because otherwise you now introduce synchronization between log
sites logging into different files, which before did not affect each other.

> - The pull pull request [2] for JDK-8229517 [3] proposed to set the
> async flag as an attribute of the Xlog option which feels more natural
> because UL configuration is traditionally done within the Xlog option.
> But we could just as well use a global -XX flag to enable async
> logging? What are your preferences here?
>

I prefer to keep "async" a global option. I think we should expose as
little freedom to the user as possible. I do not think there is a sensible
scenario where one would wish to write to one file with async, to another
file without async. Nevertheless, if we make this option target-specific
this has to work, and perform, and be regression-tested in all its
variations. Every option we roll out is a contract we have to fulfill. They
find their way into all kinds of environments and user scripts, and once
out there it is difficult to roll out incompatible changes. For instance,
there is no mechanism to deprecate a part of an option. We have a mechanism
for deprecating normal VM options, but Xlog is not a  normal option.

I am concerned with keeping UL maintainable, and that means keeping the
implementation malleable. The more implementation details we expose in the
form of options and functionality, the more our hands are tied if we want
to change the implementation later. E.g. the implementation of a
target-specific async option has to be aware of the existence of targets,
and would prevent implementation of this feature in a layer which does not
know about targets (e.g. deep down in File IO code).

> - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
> WatcherThread as service thread to concurrently process the
> intermediate data structure and write the log messages out to the log
> stream. That should definitely be changed to an independent service
> thread.
>

Yes.

> - The pull pull request [2] for JDK-8229517 [3] initially proposed
> that the "service thread" runs at a fixed interval to dump log
> messages to the log streams. But reviewers commented that this should
> better happen either continuously or based on the filling level of the
> intermediate data structure. What are your preferences here?
>

I'd say dump continuously. I like the "flush on filling level" idea even
less than the idea of periodic flushes. I do not like trace systems which
arbitrarily keep output from me and need a flush to spit everything out. Or
omit the last n lines if the VM crashes.

- What are your preferences on the configuration of the intermediate
> data structure? Should it be configured based on the maximum number of
> log messages it can store or rather on the total size of the stored
> log messages? I think that for normal runs this distinction probably
> won't make a big difference because the size of log messages will
> probably be the same on average so "number of log messages" should
> always be proportional to "total size of log mesages".
>
>
I prefer the configuration of the intermediate buffer to be as a memory
size, not "number of entries". The latter does not carry any information
(what entries? how large are they?). It also, again, exposes implementation
details - in this case that there is a vector of entries.

A memory size could initially, in a first implementation, translated
roughly to deque size by just assuming an average log line length. Future
implementations then have the freedom to change this, e.g. use a
pre-allocated fixed sized buffer of the given length, or a more involved
scheme.

> 1. Before diving into more implementation details, I'd first like to
> reach a general consensus that asynchronous logging is a useful
> feature that's worth while adding to HotSpot.
>
> 2. Once we agree on that, we should agree on the desired properties of
> asynchronous logging. I've tried to collect a basic set in the
> "Proposed solution" section.
>
> 3. If that's done as well, we can discuss various implementation
> details and finally prepare new pull requests.
>
> Thank you and best regards,
> Volker
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8229517
> [2] https://github.com/openjdk/jdk/pull/3135
> [3]
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
> [4]
> https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
> [5] https://openjdk.java.net/jeps/158
> [6] https://openjdk.java.net/jeps/271
> [7] https://man7.org/linux/man-pages/man3/stdio.3.html
> [8] https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
> [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
> [10] https://man7.org/linux/man-pages/man2/futex.2.html

Thanks, Thomas