Request For Comment: Asynchronous Logging

Tue May 11 10:08:58 UTC 2021

Hi Xin, see my answer in the CSR.

Cheers, Thomas

On Tue, May 11, 2021 at 9:53 AM Liu, Xin <xxinliu at amazon.com> wrote:

> Hi,
>
> This RFC covers almost all aspects of async logging. Since then, we have
> drawn many consensus and implemented them one by one.
>
> I still have one open question. What's the discard policy?
>
> Problem statement:
> For async-logging, we proposed a lossy non-blocking design. that is to
> say, the buffer is bounded and we have to drop messages for
> non-blocking. The rationale was described: here:
> https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487
>
> The discard policy defines which message is dropped at logsite when the
> buffer is full. There're at least 3 choices.
> 1. drop the oldest message and proceed to enqueue.
> 2. give up enqueue attempt. that is to say, the newest message is
> discarded.
> 3. support them both, let user to choose.
>
> It seems different data structures of buffer bias to different discard
> policy. eg. printk of Linux kernel uses a fixed ringbuffer. It chooses
> 1) because that is naturally done by ringbuffer overwrite.
>
> Folly logging library also supports async writing. They chose 2) because
> it's straight-forward for a bounded array.
>
> https://github.com/facebook/folly/blob/master/folly/logging/AsyncLogWriter.cpp#L91
>
> My current implementation is just in the middle two polar opinions. my
> buffer is implemented using a linkedlist-based deque. It means that I
> can pop elements from either sides. technically speaking, I can easily
> support 2 discard policies.
>
> In CSR, I wrote the following sentence in Solution section.
> "On buffer exhaustion a message is discarded so that under no
> circumstances will logging block threads. Which message is discarded
> depends on implementation. In current implementation, the oldest message
> is discarded to make room for the newest message. "
>
> The reason I would like to "leave to implementation" because I don't
> think it's a good idea to choose a discard policy in CSR. The wiggle
> room can give more freedom for changing different data structures and
> algorithms. David said we can't it undefined. If so, what's the discard
> policy HotSpot prefers?
>
> We have discussed it on CSR.
>
> https://bugs.openjdk.java.net/browse/JDK-8264323?focusedCommentId=14419683&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14419683
>
> There are more audiences. Inputs are welcome.
>
> FYI, current PR#3135 has implemented dropped message statistics. In the
> log files, you would find meta messages to indicate how many messages
> were lost. eg. line @34300:
> 34299-[trace][metaspace              ] Metaspace::allocate: type 7
> return 0x00007fcf284fcb18.
> 34300:[warning][                       ]    529 messages dropped...
> 34301-[trace  ][metaspace              ] Arena @0x00007fcf80252df0
> (non-class sm): after allocation: 1 chunk(s),
> current:@0x00007fcf80243ef0, u, base 0x00007fcf28400000, level lv00
> (524288), used: 132162, committed: 139264, committed-free: 7102
>
> thanks,
> --lx
>
>
>
> On 3/30/21 11:18 AM, Volker Simonis wrote:
> > CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
> >
> >
> >
> > Hi,
> >
> > I'd like to (re)start a discussion on asynchronous logging [1,2,3,4].
> > We are successfully using this feature productively at Amazon both in
> > jdk8 and jdk11 to reduce the tail latency of services which use
> > logging. We think that async logging is a useful addition to the
> > current logging framework which might be beneficial to a wider range
> > of users. The following write-up tries to capture the comments and
> > suggestions from the previous discussions we are aware of.
> >
> > Current state:
> >
> > - HotSpot uses the so called "Unified Logging" (UL) framework which
> > was introduced by JEP 158 [5] in JDK 9. Most logs have been
> > retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
> > [6]).
> > - The current UL implementation is based on the standard C buffered
> > stream I/O interface [7]. The LogFileStreamOutput class which writes
> > logs to abstract FILE streams is the only child of the abstract base
> > class LogOutput. LogFileStreamOutput has three child classes
> > LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
> > stdout, stderr or an arbitrary file respectively. The initial UL JEP
> > 158 [5] envisioned logging to a socket but has not implemented it. At
> > least one such extension has been proposed since then [8].
> > - UL synchronizes logs from different threads with the help of the
> > standard C flockfile()/funlockfile() [9] functions. But even without
> > this explicit locking, all the "stdio functions are thread-safe. This
> > is achieved by assigning to each FILE object a lockcount and (if the
> > lockcount is nonzero) an owning thread.  For each library call, these
> > functions wait until the FILE object is no longer locked by a
> > different thread, then lock it, do the requested I/O, and unlock the
> > object again" [9]. A quick look at the glibc sources reveals that FILE
> > locking is implemented with the help of futex() [10] which breaks down
> > to s simple atomic compare and swap (CAS) on the fast path.
> > - Notice that UL "synchronizes" logs from different threads to avoid
> > log interleaving. But it does not "serialize" logs according to the
> > time at which  they occurred. This is because the underlying stdio
> > functions do not guarantee a specific order for different threads
> > waiting on a locked FILE stream. E.g. if three log events A, B, C
> > occur in that order, the first will lock the output stream. If the log
> > events B and C both arrive while the stream is locked, it is
> > unspecified which of B and C will be logged first after A releases the
> > lock.
> >
> > Problem statement:
> >
> > - The amount of time a log event will block its FILE stream depends on
> > the underlying file system. This can range from a few nanoseconds for
> > in-memory file systems or milliseconds for physical discs under heavy
> > load up to several seconds in the worst case scenario for e.g. network
> > file systems. A blocked log output stream will block all concurrent
> > threads which try to write log messages at the same time. If logging
> > is done during a safepoint, this can significantly increase the
> > safepoint time (e.g. several parallel GC threads trying to log at the
> > same time). We can treat stdout/stderr as special files here without
> > loss of generality.
> >
> > Proposed solution:
> >
> > Extend UL with an asynchronous logging facility. Asynchronous logging
> > will be optional and disabled by default. It should have the following
> > properties:
> > - If disabled (the default) asynchronous logging should have no
> > observable impact on logging.
> > - If enabled, log messages will be stored in an intermediate data
> > structure (e.g. a double ended queue).
> > - A service thread will concurrently read and remove log messages from
> > that data structure in a FIFO style and write them to the output
> > stream
> > - Storing log messages in the intermediate data structure should take
> > constant time and not longer than logging a message takes in the
> > traditional UL system (in general the time should be much shorter
> > because the actual I/O is deferred).
> > - Asynchronous logging trades memory overhead (i.e. the size of the
> > intermediate data structure) for log accuracy. This means that in the
> > unlikely case where the service thread which does the asynchronous
> > logging falls behind the log producing threads, some logs might be
> > lost. However, the probability for this to happen can be minimized by
> > increasing the configurable size of the intermediate data structure.
> > - The final output produced by asynchronous logging should be equal to
> > the output from normal logging if no messages had to be dropped.
> > Notice that in contrast to the traditional unified logging,
> > asynchronous logging will give us the possibility to not only
> > synchronize log events, but to optionally also serialize them based on
> > their generation time if that's desired. This is because we are in
> > full control of the synchronization primitives for the intermediate
> > data structure which stores the logs.
> > - If log messages had to be dropped, this should be logged in the log
> > output (e.g. "[..] 42 messages dropped due to async logging")
> > - Asynchronous logging should ideally be implemented in such a way
> > that it can be easily adapted by alternative log targets like for
> > example sockets in the future.
> >
> > Alternative solutions:
> > - It has repeatedly been suggested to place the log files into a
> > memory file system but we don't think this is an optimal solution.
> > Main memory is often a constrained resource and we don't want log
> > files to compete with the JVM for it in such cases.
> > - It has also been argued to place the log files on a fast file system
> > which is only used for logging but in containerized environments file
> > system are often virtualized and the properties of the underlying
> > physical devices are not obvious.
> > - The load on the file system might be unpredictable due to other
> > applications on the same host.
> > - All these workarounds won't work if we eventually implement direct
> > logging to a network socket as suggested in [8].
> >
> > Implementation details / POC:
> >
> > - A recent pull request [2] for JDK-8229517 [3] proposed to use a
> > simple deque implementation derived from HotSpot's LinkedListImpl
> > class for the intermediate data structure. It synchronizes access to
> > the queue with a MutexLocker which is internally implemented with
> > pthread_lock() and results in an atomic CAS on the fast path. So
> > performance-wise the locking itself is not different from the
> > flockfile()/funlockfile() functionality currently used by UL but
> > adding a log message to the deque should be constant as it basically
> > only requires a strdup(). And we could even eliminate the strdup() if
> > we'd pre-allocate a big enough array for holding the log messages as
> > proposed in the pull request [2].
> > - The pull pull request [2] for JDK-8229517 [3] proposed to set the
> > async flag as an attribute of the Xlog option which feels more natural
> > because UL configuration is traditionally done within the Xlog option.
> > But we could just as well use a global -XX flag to enable async
> > logging? What are your preferences here?
> > - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
> > WatcherThread as service thread to concurrently process the
> > intermediate data structure and write the log messages out to the log
> > stream. That should definitely be changed to an independent service
> > thread.
> > - The pull pull request [2] for JDK-8229517 [3] initially proposed
> > that the "service thread" runs at a fixed interval to dump log
> > messages to the log streams. But reviewers commented that this should
> > better happen either continuously or based on the filling level of the
> > intermediate data structure. What are your preferences here?
> > - What are your preferences on the configuration of the intermediate
> > data structure? Should it be configured based on the maximum number of
> > log messages it can store or rather on the total size of the stored
> > log messages? I think that for normal runs this distinction probably
> > won't make a big difference because the size of log messages will
> > probably be the same on average so "number of log messages" should
> > always be proportional to "total size of log mesages".
> >
> > 1. Before diving into more implementation details, I'd first like to
> > reach a general consensus that asynchronous logging is a useful
> > feature that's worth while adding to HotSpot.
> >
> > 2. Once we agree on that, we should agree on the desired properties of
> > asynchronous logging. I've tried to collect a basic set in the
> > "Proposed solution" section.
> >
> > 3. If that's done as well, we can discuss various implementation
> > details and finally prepare new pull requests.
> >
> > Thank you and best regards,
> > Volker
> >
> > [1] https://bugs.openjdk.java.net/browse/JDK-8229517
> > [2] https://github.com/openjdk/jdk/pull/3135
> > [3]
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
> > [4]
> https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
> > [5] https://openjdk.java.net/jeps/158
> > [6] https://openjdk.java.net/jeps/271
> > [7] https://man7.org/linux/man-pages/man3/stdio.3.html
> > [8] https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
> > [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
> > [10] https://man7.org/linux/man-pages/man2/futex.2.html
> >
>