Request For Comment: Asynchronous Logging

Tue Apr 13 22:12:12 UTC 2021

Hi, Thomas,

I am still working on this feature. I have tabulated the requests from
reviewers. I draw a line of "consensus" and working on them one by one.

-------------------------------------------------------------
| request                                      | Status     |
-------------------------------------------------------------
| independent NonJavaThread                    | done [1]   |
|-----------------------------------------------------------|
| pass tier1-test with +AsyncLogging           |JDK-8265102 |
|-----------------------------------------------------------<- progress
| inject meta-info to display no. of dropping msg. |WIP     |
|-----------------------------------------------------------|
| preserve accurate decorations                |            |
|-----------------------------------------------------------|
| hoist async_mode to base class for stdout/err| [2]        |
|-----------------------------------------------------------|
| CSR                                          |JDK-8264323 |
|-----------------------------------------------------------<- consensus
|use string table over malloc                  |            |
------------------------------------------------------------|
|use lock-free data structure.                 |            |
-------------------------------------------------------------

I would like to mark items below consensus "improvement" and leave them
in separated JBS. I discussed about lock-free algorithms with Robbin on
github. He agree to defer it[3].  I don't like strdup either, but I
don't think it would hurt performance a lot. David Holmes comments and I
quote "If we had had async logging from day one then the way we
construct log messages would have been done differently". In current
codebase, I have to copy logging messages anyway, the real cost should
come from "copying" instead of allocating. Further, I have moved
"allocate" and "free" out of critical areas. If performance does matter,
we will have a overall deign including both sync/async messages and
decorators.

May I call out JDK-8264323? It's a CSR about JVM option. I have written
down pros and cons. Could you take a look? I can support both global
option and finer-grained output-option.  I won't increase much complexity.

[1]code is here:
https://github.com/openjdk/jdk/pull/3135/commits/9211121c7afeb7a9fe602fd2a4d20bd98ddb1569
[2] here is the request:
https://github.com/openjdk/jdk/pull/3135#issuecomment-809977841
[3] https://github.com/openjdk/jdk/pull/3135#issuecomment-813880728

On 4/12/21 10:30 PM, Thomas Stüfe wrote:
> *CAUTION*: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
> 
> 
> Hi Xin & others,
> 
> Sorry for the late response, I had vacation. 
> 
> As I wrote before, I think this is a valuable improvement. I would be
> unhappy if this were stalled further. 
> 
> In this thread there seems to be a majority opinion now for 
> - having an own thread instead of using the WatcherThread
> - having a global option for async (good point, Ioi)
> 
> AFAICS the only POC left is the question whether to make the sizing
> option a memory size or a "number of entries". I argued for the memory
> size before, but won't insist on it, in order to move this issue forward
> (we could always just keep that option diagnostic).
> 
> Is there anything else left to discuss?
> 
> I mentioned before that I think UL is too complex and could benefit from
> simplification; but this is clearly not Xins fault, and I am sorry that
> I brought this even up here, since it has nothing to do with this RFE.
> 
> Cheers, Thomas
> 
> 
> 
> On Tue, Mar 30, 2021 at 11:21 PM Liu, Xin <xxinliu at amazon.com
> <mailto:xxinliu at amazon.com>> wrote:
> 
>     Thanks Volker for this. I would like to append some additional
>     materials.  I forgot to mention them when I wrote the Rationale[5]
>     yesterday.
> 
>     We identified and root-caused the tail-latency on a Linux system
>     with software RAID in 2018. 
>     We have different implementations on jdk8u and jdk11u. We are
>     seeking to merge this feature to tip.
> 
>     Nonetheless, it doesn't mean "async-logging facility" only solves
>     Amazon's peculiar problem.  When we studied this, we found many
>     interesting references.
>     Eg. LinkedIn reported and analyzed it well[1]. In particular, they
>     mentioned that one reason was Linux cache writeback [2]. IMHO, that
>     could impact
>     almost all mass-storge Linux filesystems. Twitter also expressed
>     that "I would love to hear if this can happen with OpenJDK!"[3]. 
>     This is also reported
>     by other companies[4].
> 
>     Thanks,
>     --lx
> 
> 
>     [1]
>     https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
>     <https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic>
>     [2]
>     https://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html
>     <https://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html>
>     [3] https://www.evanjones.ca/jvm-mmap-pause-finding.html
>     <https://www.evanjones.ca/jvm-mmap-pause-finding.html>
>     [4]
>     https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-June/042301.html
>     <https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-June/042301.html>
>     [5] https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487
>     <https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487>
> 
>     On 3/30/21, 11:20 AM, "Volker Simonis" <volker.simonis at gmail.com
>     <mailto:volker.simonis at gmail.com>> wrote:
> 
>         CAUTION: This email originated from outside of the organization.
>     Do not click links or open attachments unless you can confirm the
>     sender and know the content is safe.
> 
> 
> 
>         Hi,
> 
>         I'd like to (re)start a discussion on asynchronous logging
>     [1,2,3,4].
>         We are successfully using this feature productively at Amazon
>     both in
>         jdk8 and jdk11 to reduce the tail latency of services 

which use
>         logging. We think that async logging is a useful addition to the
>         current logging framework which might be beneficial to a wider range
>         of users. The following write-up tries to capture the 

comments and
>         suggestions from the previous discussions we are aware of.
> 
>         Current state:
> 
>         - HotSpot uses the so called "Unified Logging" (UL) framework which
>         was introduced by JEP 158 [5] in JDK 9. Most logs have been
>         retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
>         [6]).
>         - The current UL implementation is based on the standard C buffered
>         stream I/O interface [7]. The LogFileStreamOutput class which writes
>         logs to abstract FILE streams is the only child of the abstract base
>         class LogOutput. LogFileStreamOutput has three child classes
>         LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
>         stdout, stderr or an arbitrary file respectively. The 

initial UL JEP
>         158 [5] envisioned logging to a socket but has not implemented
>     it. At
>         least one such extension has been proposed since then 

[8].
>         - UL synchronizes logs from different threads with the help of the
>         standard C flockfile()/funlockfile() [9] functions. But even without
>         this explicit locking, all the "stdio functions are thread-safe.
>     This
>         is achieved by assigning to each FILE object a lockcount and (if the
>         lockcount is nonzero) an owning thread.  For each library call,
>     these
>         functions wait until the FILE object is no longer locked by a
>         different thread, then lock it, do the requested I/O, 

and unlock the
>         object again" [9]. A quick look at the glibc sources reveals
>     that FILE
>         locking is implemented with the help of futex() [10] which
>     breaks down
>         to s simple atomic compare and swap (CAS) on the fast 

path.
>         - Notice that UL "synchronizes" logs from different threads to avoid
>         log interleaving. But it does not "serialize" logs according to the
>         time at which  they occurred. This is because the underlying stdio
>         functions do not guarantee a specific order for different threads
>         waiting on a locked FILE stream. E.g. if three log events A, B, C
>         occur in that order, the first will lock the output stream. If
>     the log
>         events B and C both arrive while the stream is locked, it is
>         unspecified which of B and C will be logged first after A
>     releases the
>         lock.
> 
>         Problem statement:
> 
>         - The amount of time a log event will block its FILE stream
>     depends on
>         the underlying file system. This can range from a few
>     nanoseconds for
>         in-memory file systems or milliseconds for physical discs under
>     heavy
>         load up to several seconds in the worst case scenario 

for e.g.
>     network
>         file systems. A blocked log output stream will block all concurrent
>         threads which try to write log messages at the same time. If logging
>         is done during a safepoint, this can significantly increase the
>         safepoint time (e.g. several parallel GC threads trying to log
>     at the
>         same time). We can treat stdout/stderr as special files here without
>         loss of generality.
> 
>         Proposed solution:
> 
>         Extend UL with an asynchronous logging facility. Asynchronous
>     logging
>         will be optional and disabled by default. It should have the
>     following
>         properties:
>         - If disabled (the default) asynchronous logging should have no
>         observable impact on logging.
>         - If enabled, log messages will be stored in an intermediate data
>         structure (e.g. a double ended queue).
>         - A service thread will concurrently read and remove log
>     messages from
>         that data structure in a FIFO style and write them to 

the output
>         stream
>         - Storing log messages in the intermediate data structure should
>     take
>         constant time and not longer than logging a message takes in the
>         traditional UL system (in general the time should be much shorter
>         because the actual I/O is deferred).
>         - Asynchronous logging trades memory overhead (i.e. the size of the
>         intermediate data structure) for log accuracy. This means that
>     in the
>         unlikely case where the service thread which does the 

asynchronous
>         logging falls behind the log producing threads, some logs might be
>         lost. However, the probability for this to happen can 

be
>     minimized by
>         increasing the configurable size of the intermediate data structure.
>         - The final output produced by asynchronous logging should be
>     equal to
>         the output from normal logging if no messages had to be dropped.
>         Notice that in contrast to the traditional unified logging,
>         asynchronous logging will give us the possibility to not only
>         synchronize log events, but to optionally also serialize them
>     based on
>         their generation time if that's desired. This is because we are in
>         full control of the synchronization primitives for the intermediate
>         data structure which stores the logs.
>         - If log messages had to be dropped, this should be logged in
>     the log
>         output (e.g. "[..] 42 messages dropped due to async logging")
>         - Asynchronous logging should ideally be implemented in such a way
>         that it can be easily adapted by alternative log targets like for
>         example sockets in the future.
> 
>         Alternative solutions:
>         - It has repeatedly been suggested to place the log files into a
>         memory file system but we don't think this is an optimal solution.
>         Main memory is often a constrained resource and we don't want log
>         files to compete with the JVM for it in such cases.
>         - It has also been argued to place the log files on a 

fast file
>     system
>         which is only used for logging but in containerized environments
>     file
>         system are often virtualized and the properties of the underlying
>         physical devices are not obvious.
>         - The load on the file system might be unpredictable due to other
>         applications on the same host.
>         - All these workarounds won't work if we eventually implement direct
>         logging to a network socket as suggested in [8].
> 
>         Implementation details / POC:
> 
>         - A recent pull request [2] for JDK-8229517 [3] proposed to use a
>         simple deque implementation derived from HotSpot's LinkedListImpl
>         class for the intermediate data structure. It synchronizes access to
>         the queue with a MutexLocker which is internally implemented with
>         pthread_lock() and results in an atomic CAS on the fast path. So
>         performance-wise the locking itself is not different from the
>         flockfile()/funlockfile() functionality currently used by UL but
>         adding a log message to the deque should be constant as it basically
>         only requires a strdup(). And we could even eliminate 

the
>     strdup() if
>         we'd pre-allocate a big enough array for holding the log messages as
>         proposed in the pull request [2].
>         - The pull pull request [2] for JDK-8229517 [3] proposed to set the
>         async flag as an attribute of the Xlog option which feels more
>     natural
>         because UL configuration is traditionally done within 

the Xlog
>     option.
>         But we could just as well use a global -XX flag to enable async
>         logging? What are your preferences here?
>         - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
>         WatcherThread as service thread to concurrently process the
>         intermediate data structure and write the log messages out to
>     the log
>         stream. That should definitely be changed to an independent service
>         thread.
>         - The pull pull request [2] for JDK-8229517 [3] initially proposed
>         that the "service thread" runs at a fixed interval to 

dump log
>         messages to the log streams. But reviewers commented that this
>     should
>         better happen either continuously or based on the filling level
>     of the
>         intermediate data structure. What are your preferences here?
>         - What are your preferences on the configuration of the intermediate
>         data structure? Should it be configured based on the maximum
>     number of
>         log messages it can store or rather on the total size 

of the stored
>         log messages? I think that for normal runs this distinction probably
>         won't make a big difference because the size of log messages will
>         probably be the same on average so "number of log messages" should
>         always be proportional to "total size of log mesages".
> 
>         1. Before diving into more implementation details, I'd first like to
>         reach a general consensus that asynchronous logging is a useful
>         feature that's worth while adding to HotSpot.
> 
>         2. Once we agree on that, we should agree on the desired
>     properties of
>         asynchronous logging. I've tried to collect a basic set in the
>         "Proposed solution" section.
> 
>         3. If that's done as well, we can discuss various implementation
>         details and finally prepare new pull requests.
> 
>         Thank you and best regards,
>         Volker
> 
>         [1] https://bugs.openjdk.java.net/browse/JDK-8229517
>     <https://bugs.openjdk.java.net/browse/JDK-8229517>
>         [2] https://github.com/openjdk/jdk/pull/3135
>     <https://github.com/openjdk/jdk/pull/3135>
>         [3]
>     https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
>     <https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html>
>         [4]
>     https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
>     <https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html>
>         [5] https://openjdk.java.net/jeps/158
>     <https://openjdk.java.net/jeps/158>
>         [6] https://openjdk.java.net/jeps/271
>     <https://openjdk.java.net/jeps/271>
>         [7] https://man7.org/linux/man-pages/man3/stdio.3.html
>     <https://man7.org/linux/man-pages/man3/stdio.3.html>
>         [8]
>     https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
>     <https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d>
>         [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
>     <https://man7.org/linux/man-pages/man3/flockfile.3.html>
>         [10] https://man7.org/linux/man-pages/man2/futex.2.html
>     <https://man7.org/linux/man-pages/man2/futex.2.html>
>