Request For Comment: Asynchronous Logging

Tue Mar 30 21:21:15 UTC 2021

Thanks Volker for this. I would like to append some additional materials.  I forgot to mention them when I wrote the Rationale[5] yesterday.

We identified and root-caused the tail-latency on a Linux system with software RAID in 2018.  
We have different implementations on jdk8u and jdk11u. We are seeking to merge this feature to tip.

Nonetheless, it doesn't mean "async-logging facility" only solves Amazon's peculiar problem.  When we studied this, we found many interesting references. 
Eg. LinkedIn reported and analyzed it well[1]. In particular, they mentioned that one reason was Linux cache writeback [2]. IMHO, that could impact
almost all mass-storge Linux filesystems. Twitter also expressed that "I would love to hear if this can happen with OpenJDK!"[3].  This is also reported
by other companies[4]. 

Thanks,
--lx

[1] https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
[2] https://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html
[3] https://www.evanjones.ca/jvm-mmap-pause-finding.html
[4] https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-June/042301.html
[5] https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487

On 3/30/21, 11:20 AM, "Volker Simonis" <volker.simonis at gmail.com> wrote:

    CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

    Hi,

    I'd like to (re)start a discussion on asynchronous logging [1,2,3,4].
    We are successfully using this feature productively at Amazon both in
    jdk8 and jdk11 to reduce the tail latency of services which use
    logging. We think that async logging is a useful addition to the
    current logging framework which might be beneficial to a wider range
    of users. The following write-up tries to capture the comments and
    suggestions from the previous discussions we are aware of.

    Current state:

    - HotSpot uses the so called "Unified Logging" (UL) framework which
    was introduced by JEP 158 [5] in JDK 9. Most logs have been
    retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
    [6]).
    - The current UL implementation is based on the standard C buffered
    stream I/O interface [7]. The LogFileStreamOutput class which writes
    logs to abstract FILE streams is the only child of the abstract base
    class LogOutput. LogFileStreamOutput has three child classes
    LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
    stdout, stderr or an arbitrary file respectively. The initial UL JEP
    158 [5] envisioned logging to a socket but has not implemented it. At
    least one such extension has been proposed since then [8].
    - UL synchronizes logs from different threads with the help of the
    standard C flockfile()/funlockfile() [9] functions. But even without
    this explicit locking, all the "stdio functions are thread-safe. This
    is achieved by assigning to each FILE object a lockcount and (if the
    lockcount is nonzero) an owning thread.  For each library call, these
    functions wait until the FILE object is no longer locked by a
    different thread, then lock it, do the requested I/O, and unlock the
    object again" [9]. A quick look at the glibc sources reveals that FILE
    locking is implemented with the help of futex() [10] which breaks down
    to s simple atomic compare and swap (CAS) on the fast path.
    - Notice that UL "synchronizes" logs from different threads to avoid
    log interleaving. But it does not "serialize" logs according to the
    time at which  they occurred. This is because the underlying stdio
    functions do not guarantee a specific order for different threads
    waiting on a locked FILE stream. E.g. if three log events A, B, C
    occur in that order, the first will lock the output stream. If the log
    events B and C both arrive while the stream is locked, it is
    unspecified which of B and C will be logged first after A releases the
    lock.

    Problem statement:

    - The amount of time a log event will block its FILE stream depends on
    the underlying file system. This can range from a few nanoseconds for
    in-memory file systems or milliseconds for physical discs under heavy
    load up to several seconds in the worst case scenario for e.g. network
    file systems. A blocked log output stream will block all concurrent
    threads which try to write log messages at the same time. If logging
    is done during a safepoint, this can significantly increase the
    safepoint time (e.g. several parallel GC threads trying to log at the
    same time). We can treat stdout/stderr as special files here without
    loss of generality.

    Proposed solution:

    Extend UL with an asynchronous logging facility. Asynchronous logging
    will be optional and disabled by default. It should have the following
    properties:
    - If disabled (the default) asynchronous logging should have no
    observable impact on logging.
    - If enabled, log messages will be stored in an intermediate data
    structure (e.g. a double ended queue).
    - A service thread will concurrently read and remove log messages from
    that data structure in a FIFO style and write them to the output
    stream
    - Storing log messages in the intermediate data structure should take
    constant time and not longer than logging a message takes in the
    traditional UL system (in general the time should be much shorter
    because the actual I/O is deferred).
    - Asynchronous logging trades memory overhead (i.e. the size of the
    intermediate data structure) for log accuracy. This means that in the
    unlikely case where the service thread which does the asynchronous
    logging falls behind the log producing threads, some logs might be
    lost. However, the probability for this to happen can be minimized by
    increasing the configurable size of the intermediate data structure.
    - The final output produced by asynchronous logging should be equal to
    the output from normal logging if no messages had to be dropped.
    Notice that in contrast to the traditional unified logging,
    asynchronous logging will give us the possibility to not only
    synchronize log events, but to optionally also serialize them based on
    their generation time if that's desired. This is because we are in
    full control of the synchronization primitives for the intermediate
    data structure which stores the logs.
    - If log messages had to be dropped, this should be logged in the log
    output (e.g. "[..] 42 messages dropped due to async logging")
    - Asynchronous logging should ideally be implemented in such a way
    that it can be easily adapted by alternative log targets like for
    example sockets in the future.

    Alternative solutions:
    - It has repeatedly been suggested to place the log files into a
    memory file system but we don't think this is an optimal solution.
    Main memory is often a constrained resource and we don't want log
    files to compete with the JVM for it in such cases.
    - It has also been argued to place the log files on a fast file system
    which is only used for logging but in containerized environments file
    system are often virtualized and the properties of the underlying
    physical devices are not obvious.
    - The load on the file system might be unpredictable due to other
    applications on the same host.
    - All these workarounds won't work if we eventually implement direct
    logging to a network socket as suggested in [8].

    Implementation details / POC:

    - A recent pull request [2] for JDK-8229517 [3] proposed to use a
    simple deque implementation derived from HotSpot's LinkedListImpl
    class for the intermediate data structure. It synchronizes access to
    the queue with a MutexLocker which is internally implemented with
    pthread_lock() and results in an atomic CAS on the fast path. So
    performance-wise the locking itself is not different from the
    flockfile()/funlockfile() functionality currently used by UL but
    adding a log message to the deque should be constant as it basically
    only requires a strdup(). And we could even eliminate the strdup() if
    we'd pre-allocate a big enough array for holding the log messages as
    proposed in the pull request [2].
    - The pull pull request [2] for JDK-8229517 [3] proposed to set the
    async flag as an attribute of the Xlog option which feels more natural
    because UL configuration is traditionally done within the Xlog option.
    But we could just as well use a global -XX flag to enable async
    logging? What are your preferences here?
    - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
    WatcherThread as service thread to concurrently process the
    intermediate data structure and write the log messages out to the log
    stream. That should definitely be changed to an independent service
    thread.
    - The pull pull request [2] for JDK-8229517 [3] initially proposed
    that the "service thread" runs at a fixed interval to dump log
    messages to the log streams. But reviewers commented that this should
    better happen either continuously or based on the filling level of the
    intermediate data structure. What are your preferences here?
    - What are your preferences on the configuration of the intermediate
    data structure? Should it be configured based on the maximum number of
    log messages it can store or rather on the total size of the stored
    log messages? I think that for normal runs this distinction probably
    won't make a big difference because the size of log messages will
    probably be the same on average so "number of log messages" should
    always be proportional to "total size of log mesages".

    1. Before diving into more implementation details, I'd first like to
    reach a general consensus that asynchronous logging is a useful
    feature that's worth while adding to HotSpot.

    2. Once we agree on that, we should agree on the desired properties of
    asynchronous logging. I've tried to collect a basic set in the
    "Proposed solution" section.

    3. If that's done as well, we can discuss various implementation
    details and finally prepare new pull requests.

    Thank you and best regards,
    Volker

    [1] https://bugs.openjdk.java.net/browse/JDK-8229517
    [2] https://github.com/openjdk/jdk/pull/3135
    [3] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
    [4] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
    [5] https://openjdk.java.net/jeps/158
    [6] https://openjdk.java.net/jeps/271
    [7] https://man7.org/linux/man-pages/man3/stdio.3.html
    [8] https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
    [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
    [10] https://man7.org/linux/man-pages/man2/futex.2.html