Request For Comment: Asynchronous Logging

Wed May 12 07:55:45 UTC 2021

hi, Thomas,

Thank you for your reasoning. Your arguments are convincing.

I have updated the CSR description.  No matter what, dropping the
incoming message is trivial. let's keep thing simple.

Can we finalize the CSR?

thanks,
--lx

On 5/11/21 3:08 AM, Thomas Stüfe wrote:
> *CAUTION*: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
> 
> 
> Hi Xin, see my answer in the CSR.
> 
> Cheers, Thomas
> 
> On Tue, May 11, 2021 at 9:53 AM Liu, Xin <xxinliu at amazon.com
> <mailto:xxinliu at amazon.com>> wrote:
> 
>     Hi,
> 
>     This RFC covers almost all aspects of async logging. Since then, we have
>     drawn many consensus and implemented them one by one.
> 
>     I still have one open question. What's the discard policy?
> 
>     Problem statement:
>     For async-logging, we proposed a lossy non-blocking design. that is to
>     say, the buffer is bounded and we have to drop messages for
>     non-blocking. The rationale was described: here:
>     https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487
>     <https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487>
> 
>     The discard policy defines which message is dropped at logsite when the
>     buffer is full. There're at least 3 choices.
>     1. drop the oldest message and proceed to enqueue.
>     2. give up enqueue attempt. that is to say, the newest message is
>     discarded.
>     3. support them both, let user to choose.
> 
>     It seems different data structures of buffer bias to different discard
>     policy. eg. printk of Linux kernel uses a fixed ringbuffer. It chooses
>     1) because that is naturally done by ringbuffer overwrite.
> 
>     Folly logging library also supports async writing. They chose 2) because
>     it's straight-forward for a bounded array.
>     https://github.com/facebook/folly/blob/master/folly/logging/AsyncLogWriter.cpp#L91
>     <https://github.com/facebook/folly/blob/master/folly/logging/AsyncLogWriter.cpp#L91>
> 
>     My current implementation is just in the middle two polar opinions. my
>     buffer is implemented using a linkedlist-based deque. It means that I
>     can pop elements from either sides. technically speaking, I can easily
>     support 2 discard policies.
> 
>     In CSR, I wrote the following sentence in Solution section.
>     "On buffer exhaustion a message is discarded so that under no
>     circumstances will logging block threads. Which message is discarded
>     depends on implementation. In current implementation, the oldest message
>     is discarded to make room for the newest message. "
> 
>     The reason I would like to "leave to implementation" because I don't
>     think it's a good idea to choose a discard policy in CSR. The wiggle
>     room can give more freedom for changing different data structures and
>     algorithms. David said we can't it undefined. If so, what's the discard
>     policy HotSpot prefers?
> 
>     We have discussed it on CSR.
>     https://bugs.openjdk.java.net/browse/JDK-8264323?focusedCommentId=14419683&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14419683
>     <https://bugs.openjdk.java.net/browse/JDK-8264323?focusedCommentId=14419683&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14419683>
> 
>     There are more audiences. Inputs are welcome.
> 
>     FYI, current PR#3135 has implemented dropped message statistics. In the
>     log files, you would find meta messages to indicate how many messages
>     were lost. eg. line @34300:
>     34299-[trace][metaspace              ] Metaspace::allocate: type 7
>     return 0x00007fcf284fcb18.
>     34300:[warning][                       ]    529 messages dropped...
>     34301-[trace  ][metaspace              ] Arena @0x00007fcf80252df0
>     (non-class sm): after allocation: 1 chunk(s),
>     current:@0x00007fcf80243ef0, u, base 0x00007fcf28400000, level lv00
>     (524288), used: 132162, committed: 139264, committed-free: 7102
> 
>     thanks,
>     --lx
> 
> 
> 
>     On 3/30/21 11:18 AM, Volker Simonis wrote:
>     > CAUTION: This email originated from outside of the organization.
>     Do not click links or open attachments unless you can confirm the
>     sender and know the content is safe.
>     >
>     >
>     >
>     > Hi,
>     >
>     > I'd like to (re)start a discussion on asynchronous logging [1,2,3,4].
>     > We are successfully using this feature productively at Amazon both in
>     > jdk8 and jdk11 to reduce the tail latency of services which use
>     > logging. We think that async logging is a useful addition to the
>     > current logging framework which might be beneficial to a wider range
>     > of users. The following write-up tries to capture the comments and
>     > suggestions from the previous discussions we are aware of.
>     >
>     > Current state:
>     >
>     > - HotSpot uses the so called "Unified Logging" (UL) framework which
>     > was introduced by JEP 158 [5] in JDK 9. Most logs have been
>     > retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
>     > [6]).
>     > - The current UL implementation is based on the standard C buffered
>     > stream I/O interface [7]. The LogFileStreamOutput class which writes
>     > logs to abstract FILE streams is the only child of the abstract base
>     > class LogOutput. LogFileStreamOutput has three child classes
>     > LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
>     > stdout, stderr or an arbitrary file respectively. The initial UL JEP
>     > 158 [5] envisioned logging to a socket but has not implemented it. At
>     > least one such extension has been proposed since then [8].
>     > - UL synchronizes logs from different threads with the help of the
>     > standard C flockfile()/funlockfile() [9] functions. But even without
>     > this explicit locking, all the "stdio functions are thread-safe. This
>     > is achieved by assigning to each FILE object a lockcount and (if the
>     > lockcount is nonzero) an owning thread.  For each library call, these
>     > functions wait until the FILE object is no longer locked by a
>     > different thread, then lock it, do the requested I/O, and unlock the
>     > object again" [9]. A quick look at the glibc sources reveals that FILE
>     > locking is implemented with the help of futex() [10] which breaks down
>     > to s simple atomic compare and swap (CAS) on the fast path.
>     > - Notice that UL "synchronizes" logs from different threads to avoid
>     > log interleaving. But it does not "serialize" logs according to the
>     > time at which  they occurred. This is because the underlying stdio
>     > functions do not guarantee a specific order for different threads
>     > waiting on a locked FILE stream. E.g. if three log events A, B, C
>     > occur in that order, the first will lock the output stream. If the log
>     > events B and C both arrive while the stream is locked, it is
>     > unspecified which of B and C will be logged first after A releases the
>     > lock.
>     >
>     > Problem statement:
>     >
>     > - The amount of time a log event will block its FILE stream depends on
>     > the underlying file system. This can range from a few nanoseconds for
>     > in-memory file systems or milliseconds for physical discs under heavy
>     > load up to several seconds in the worst case scenario for e.g. network
>     > file systems. A blocked log output stream will block all concurrent
>     > threads which try to write log messages at the same time. If logging
>     > is done during a safepoint, this can significantly increase the
>     > safepoint time (e.g. several parallel GC threads trying to log at the
>     > same time). We can treat stdout/stderr as special files here without
>     > loss of generality.
>     >
>     > Proposed solution:
>     >
>     > Extend UL with an asynchronous logging facility. Asynchronous logging
>     > will be optional and disabled by default. It should have the following
>     > properties:
>     > - If disabled (the default) asynchronous logging should have no
>     > observable impact on logging.
>     > - If enabled, log messages will be stored in an intermediate data
>     > structure (e.g. a double ended queue).
>     > - A service thread will concurrently read and remove log messages from
>     > that data structure in a FIFO style and write them to the output
>     > stream
>     > - Storing log messages in the intermediate data structure should take
>     > constant time and not longer than logging a message takes in the
>     > traditional UL system (in general the time should be much shorter
>     > because the actual I/O is deferred).
>     > - Asynchronous logging trades memory overhead (i.e. the size of the
>     > intermediate data structure) for log accuracy. This means that in the
>     > unlikely case where the service thread which does the asynchronous
>     > logging falls behind the log producing threads, some logs might be
>     > lost. However, the probability for this to happen can be minimized by
>     > increasing the configurable size of the intermediate data structure.
>     > - The final output produced by asynchronous logging should be equal to
>     > the output from normal logging if no messages had to be dropped.
>     > Notice that in contrast to the traditional unified logging,
>     > asynchronous logging will give us the possibility to not only
>     > synchronize log events, but to optionally also serialize them based on
>     > their generation time if that's desired. This is because we are in
>     > full control of the synchronization primitives for the intermediate
>     > data structure which stores the logs.
>     > - If log messages had to be dropped, this should be logged in the log
>     > output (e.g. "[..] 42 messages dropped due to async logging")
>     > - Asynchronous logging should ideally be implemented in such a way
>     > that it can be easily adapted by alternative log targets like for
>     > example sockets in the future.
>     >
>     > Alternative solutions:
>     > - It has repeatedly been suggested to place the log files into a
>     > memory file system but we don't think this is an optimal solution.
>     > Main memory is often a constrained resource and we don't want log
>     > files to compete with the JVM for it in such cases.
>     > - It has also been argued to place the log files on a fast file system
>     > which is only used for logging but in containerized environments file
>     > system are often virtualized and the properties of the underlying
>     > physical devices are not obvious.
>     > - The load on the file system might be unpredictable due to other
>     > applications on the same host.
>     > - All these workarounds won't work if we eventually implement direct
>     > logging to a network socket as suggested in [8].
>     >
>     > Implementation details / POC:
>     >
>     > - A recent pull request [2] for JDK-8229517 [3] proposed to use a
>     > simple deque implementation derived from HotSpot's LinkedListImpl
>     > class for the intermediate data structure. It synchronizes access to
>     > the queue with a MutexLocker which is internally implemented with
>     > pthread_lock() and results in an atomic CAS on the fast path. So
>     > performance-wise the locking itself is not different from the
>     > flockfile()/funlockfile() functionality currently used by UL but
>     > adding a log message to the deque should be constant as it basically
>     > only requires a strdup(). And we could even eliminate the strdup() if
>     > we'd pre-allocate a big enough array for holding the log messages as
>     > proposed in the pull request [2].
>     > - The pull pull request [2] for JDK-8229517 [3] proposed to set the
>     > async flag as an attribute of the Xlog option which feels more natural
>     > because UL configuration is traditionally done within the Xlog option.
>     > But we could just as well use a global -XX flag to enable async
>     > logging? What are your preferences here?
>     > - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
>     > WatcherThread as service thread to concurrently process the
>     > intermediate data structure and write the log messages out to the log
>     > stream. That should definitely be changed to an independent service
>     > thread.
>     > - The pull pull request [2] for JDK-8229517 [3] initially proposed
>     > that the "service thread" runs at a fixed interval to dump log
>     > messages to the log streams. But reviewers commented that this should
>     > better happen either continuously or based on the filling level of the
>     > intermediate data structure. What are your preferences here?
>     > - What are your preferences on the configuration of the intermediate
>     > data structure? Should it be configured based on the maximum number of
>     > log messages it can store or rather on the total size of the stored
>     > log messages? I think that for normal runs this distinction probably
>     > won't make a big difference because the size of log messages will
>     > probably be the same on average so "number of log messages" should
>     > always be proportional to "total size of log mesages".
>     >
>     > 1. Before diving into more implementation details, I'd first like to
>     > reach a general consensus that asynchronous logging is a useful
>     > feature that's worth while adding to HotSpot.
>     >
>     > 2. Once we agree on that, we should agree on the desired properties of
>     > asynchronous logging. I've tried to collect a basic set in the
>     > "Proposed solution" section.
>     >
>     > 3. If that's done as well, we can discuss various implementation
>     > details and finally prepare new pull requests.
>     >
>     > Thank you and best regards,
>     > Volker
>     >
>     > [1] https://bugs.openjdk.java.net/browse/JDK-8229517
>     <https://bugs.openjdk.java.net/browse/JDK-8229517>
>     > [2] https://github.com/openjdk/jdk/pull/3135
>     <https://github.com/openjdk/jdk/pull/3135>
>     > [3]
>     https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
>     <https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html>
>     > [4]
>     https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
>     <https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html>
>     > [5] https://openjdk.java.net/jeps/158
>     <https://openjdk.java.net/jeps/158>
>     > [6] https://openjdk.java.net/jeps/271
>     <https://openjdk.java.net/jeps/271>
>     > [7] https://man7.org/linux/man-pages/man3/stdio.3.html
>     <https://man7.org/linux/man-pages/man3/stdio.3.html>
>     > [8]
>     https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
>     <https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d>
>     > [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
>     <https://man7.org/linux/man-pages/man3/flockfile.3.html>
>     > [10] https://man7.org/linux/man-pages/man2/futex.2.html
>     <https://man7.org/linux/man-pages/man2/futex.2.html>
>     >
>