Request For Comment: Asynchronous Logging

Thomas Stüfe thomas.stuefe at gmail.com
Wed May 12 08:29:43 UTC 2021


>
>
>
I can't. I believe finalization can only be done by Joe Darcy? Not
sure. @David?

..Thomas

On Wed, May 12, 2021 at 9:55 AM Liu, Xin <xxinliu at amazon.com> wrote:

> hi, Thomas,
>
> Thank you for your reasoning. Your arguments are convincing.
>
> I have updated the CSR description.  No matter what, dropping the
> incoming message is trivial. let's keep thing simple.
>
> Can we finalize the CSR?
>
thanks,
> --lx
>
>
>
> On 5/11/21 3:08 AM, Thomas Stüfe wrote:
> > *CAUTION*: This email originated from outside of the organization. Do
> > not click links or open attachments unless you can confirm the sender
> > and know the content is safe.
> >
> >
> > Hi Xin, see my answer in the CSR.
> >
> > Cheers, Thomas
> >
> > On Tue, May 11, 2021 at 9:53 AM Liu, Xin <xxinliu at amazon.com
> > <mailto:xxinliu at amazon.com>> wrote:
> >
> >     Hi,
> >
> >     This RFC covers almost all aspects of async logging. Since then, we
> have
> >     drawn many consensus and implemented them one by one.
> >
> >     I still have one open question. What's the discard policy?
> >
> >     Problem statement:
> >     For async-logging, we proposed a lossy non-blocking design. that is
> to
> >     say, the buffer is bounded and we have to drop messages for
> >     non-blocking. The rationale was described: here:
> >     https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487
> >     <https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487>
> >
> >     The discard policy defines which message is dropped at logsite when
> the
> >     buffer is full. There're at least 3 choices.
> >     1. drop the oldest message and proceed to enqueue.
> >     2. give up enqueue attempt. that is to say, the newest message is
> >     discarded.
> >     3. support them both, let user to choose.
> >
> >     It seems different data structures of buffer bias to different
> discard
> >     policy. eg. printk of Linux kernel uses a fixed ringbuffer. It
> chooses
> >     1) because that is naturally done by ringbuffer overwrite.
> >
> >     Folly logging library also supports async writing. They chose 2)
> because
> >     it's straight-forward for a bounded array.
> >
> https://github.com/facebook/folly/blob/master/folly/logging/AsyncLogWriter.cpp#L91
> >     <
> https://github.com/facebook/folly/blob/master/folly/logging/AsyncLogWriter.cpp#L91
> >
> >
> >     My current implementation is just in the middle two polar opinions.
> my
> >     buffer is implemented using a linkedlist-based deque. It means that I
> >     can pop elements from either sides. technically speaking, I can
> easily
> >     support 2 discard policies.
> >
> >     In CSR, I wrote the following sentence in Solution section.
> >     "On buffer exhaustion a message is discarded so that under no
> >     circumstances will logging block threads. Which message is discarded
> >     depends on implementation. In current implementation, the oldest
> message
> >     is discarded to make room for the newest message. "
> >
> >     The reason I would like to "leave to implementation" because I don't
> >     think it's a good idea to choose a discard policy in CSR. The wiggle
> >     room can give more freedom for changing different data structures and
> >     algorithms. David said we can't it undefined. If so, what's the
> discard
> >     policy HotSpot prefers?
> >
> >     We have discussed it on CSR.
> >
> https://bugs.openjdk.java.net/browse/JDK-8264323?focusedCommentId=14419683&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14419683
> >     <
> https://bugs.openjdk.java.net/browse/JDK-8264323?focusedCommentId=14419683&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14419683
> >
> >
> >     There are more audiences. Inputs are welcome.
> >
> >     FYI, current PR#3135 has implemented dropped message statistics. In
> the
> >     log files, you would find meta messages to indicate how many messages
> >     were lost. eg. line @34300:
> >     34299-[trace][metaspace              ] Metaspace::allocate: type 7
> >     return 0x00007fcf284fcb18.
> >     34300:[warning][                       ]    529 messages dropped...
> >     34301-[trace  ][metaspace              ] Arena @0x00007fcf80252df0
> >     (non-class sm): after allocation: 1 chunk(s),
> >     current:@0x00007fcf80243ef0, u, base 0x00007fcf28400000, level lv00
> >     (524288), used: 132162, committed: 139264, committed-free: 7102
> >
> >     thanks,
> >     --lx
> >
> >
> >
> >     On 3/30/21 11:18 AM, Volker Simonis wrote:
> >     > CAUTION: This email originated from outside of the organization.
> >     Do not click links or open attachments unless you can confirm the
> >     sender and know the content is safe.
> >     >
> >     >
> >     >
> >     > Hi,
> >     >
> >     > I'd like to (re)start a discussion on asynchronous logging
> [1,2,3,4].
> >     > We are successfully using this feature productively at Amazon both
> in
> >     > jdk8 and jdk11 to reduce the tail latency of services which use
> >     > logging. We think that async logging is a useful addition to the
> >     > current logging framework which might be beneficial to a wider
> range
> >     > of users. The following write-up tries to capture the comments and
> >     > suggestions from the previous discussions we are aware of.
> >     >
> >     > Current state:
> >     >
> >     > - HotSpot uses the so called "Unified Logging" (UL) framework which
> >     > was introduced by JEP 158 [5] in JDK 9. Most logs have been
> >     > retrofitted to use UL since then (e.g. "JEP 271: Unified GC
> Logging"
> >     > [6]).
> >     > - The current UL implementation is based on the standard C buffered
> >     > stream I/O interface [7]. The LogFileStreamOutput class which
> writes
> >     > logs to abstract FILE streams is the only child of the abstract
> base
> >     > class LogOutput. LogFileStreamOutput has three child classes
> >     > LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
> >     > stdout, stderr or an arbitrary file respectively. The initial UL
> JEP
> >     > 158 [5] envisioned logging to a socket but has not implemented it.
> At
> >     > least one such extension has been proposed since then [8].
> >     > - UL synchronizes logs from different threads with the help of the
> >     > standard C flockfile()/funlockfile() [9] functions. But even
> without
> >     > this explicit locking, all the "stdio functions are thread-safe.
> This
> >     > is achieved by assigning to each FILE object a lockcount and (if
> the
> >     > lockcount is nonzero) an owning thread.  For each library call,
> these
> >     > functions wait until the FILE object is no longer locked by a
> >     > different thread, then lock it, do the requested I/O, and unlock
> the
> >     > object again" [9]. A quick look at the glibc sources reveals that
> FILE
> >     > locking is implemented with the help of futex() [10] which breaks
> down
> >     > to s simple atomic compare and swap (CAS) on the fast path.
> >     > - Notice that UL "synchronizes" logs from different threads to
> avoid
> >     > log interleaving. But it does not "serialize" logs according to the
> >     > time at which  they occurred. This is because the underlying stdio
> >     > functions do not guarantee a specific order for different threads
> >     > waiting on a locked FILE stream. E.g. if three log events A, B, C
> >     > occur in that order, the first will lock the output stream. If the
> log
> >     > events B and C both arrive while the stream is locked, it is
> >     > unspecified which of B and C will be logged first after A releases
> the
> >     > lock.
> >     >
> >     > Problem statement:
> >     >
> >     > - The amount of time a log event will block its FILE stream
> depends on
> >     > the underlying file system. This can range from a few nanoseconds
> for
> >     > in-memory file systems or milliseconds for physical discs under
> heavy
> >     > load up to several seconds in the worst case scenario for e.g.
> network
> >     > file systems. A blocked log output stream will block all concurrent
> >     > threads which try to write log messages at the same time. If
> logging
> >     > is done during a safepoint, this can significantly increase the
> >     > safepoint time (e.g. several parallel GC threads trying to log at
> the
> >     > same time). We can treat stdout/stderr as special files here
> without
> >     > loss of generality.
> >     >
> >     > Proposed solution:
> >     >
> >     > Extend UL with an asynchronous logging facility. Asynchronous
> logging
> >     > will be optional and disabled by default. It should have the
> following
> >     > properties:
> >     > - If disabled (the default) asynchronous logging should have no
> >     > observable impact on logging.
> >     > - If enabled, log messages will be stored in an intermediate data
> >     > structure (e.g. a double ended queue).
> >     > - A service thread will concurrently read and remove log messages
> from
> >     > that data structure in a FIFO style and write them to the output
> >     > stream
> >     > - Storing log messages in the intermediate data structure should
> take
> >     > constant time and not longer than logging a message takes in the
> >     > traditional UL system (in general the time should be much shorter
> >     > because the actual I/O is deferred).
> >     > - Asynchronous logging trades memory overhead (i.e. the size of the
> >     > intermediate data structure) for log accuracy. This means that in
> the
> >     > unlikely case where the service thread which does the asynchronous
> >     > logging falls behind the log producing threads, some logs might be
> >     > lost. However, the probability for this to happen can be minimized
> by
> >     > increasing the configurable size of the intermediate data
> structure.
> >     > - The final output produced by asynchronous logging should be
> equal to
> >     > the output from normal logging if no messages had to be dropped.
> >     > Notice that in contrast to the traditional unified logging,
> >     > asynchronous logging will give us the possibility to not only
> >     > synchronize log events, but to optionally also serialize them
> based on
> >     > their generation time if that's desired. This is because we are in
> >     > full control of the synchronization primitives for the intermediate
> >     > data structure which stores the logs.
> >     > - If log messages had to be dropped, this should be logged in the
> log
> >     > output (e.g. "[..] 42 messages dropped due to async logging")
> >     > - Asynchronous logging should ideally be implemented in such a way
> >     > that it can be easily adapted by alternative log targets like for
> >     > example sockets in the future.
> >     >
> >     > Alternative solutions:
> >     > - It has repeatedly been suggested to place the log files into a
> >     > memory file system but we don't think this is an optimal solution.
> >     > Main memory is often a constrained resource and we don't want log
> >     > files to compete with the JVM for it in such cases.
> >     > - It has also been argued to place the log files on a fast file
> system
> >     > which is only used for logging but in containerized environments
> file
> >     > system are often virtualized and the properties of the underlying
> >     > physical devices are not obvious.
> >     > - The load on the file system might be unpredictable due to other
> >     > applications on the same host.
> >     > - All these workarounds won't work if we eventually implement
> direct
> >     > logging to a network socket as suggested in [8].
> >     >
> >     > Implementation details / POC:
> >     >
> >     > - A recent pull request [2] for JDK-8229517 [3] proposed to use a
> >     > simple deque implementation derived from HotSpot's LinkedListImpl
> >     > class for the intermediate data structure. It synchronizes access
> to
> >     > the queue with a MutexLocker which is internally implemented with
> >     > pthread_lock() and results in an atomic CAS on the fast path. So
> >     > performance-wise the locking itself is not different from the
> >     > flockfile()/funlockfile() functionality currently used by UL but
> >     > adding a log message to the deque should be constant as it
> basically
> >     > only requires a strdup(). And we could even eliminate the strdup()
> if
> >     > we'd pre-allocate a big enough array for holding the log messages
> as
> >     > proposed in the pull request [2].
> >     > - The pull pull request [2] for JDK-8229517 [3] proposed to set the
> >     > async flag as an attribute of the Xlog option which feels more
> natural
> >     > because UL configuration is traditionally done within the Xlog
> option.
> >     > But we could just as well use a global -XX flag to enable async
> >     > logging? What are your preferences here?
> >     > - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
> >     > WatcherThread as service thread to concurrently process the
> >     > intermediate data structure and write the log messages out to the
> log
> >     > stream. That should definitely be changed to an independent service
> >     > thread.
> >     > - The pull pull request [2] for JDK-8229517 [3] initially proposed
> >     > that the "service thread" runs at a fixed interval to dump log
> >     > messages to the log streams. But reviewers commented that this
> should
> >     > better happen either continuously or based on the filling level of
> the
> >     > intermediate data structure. What are your preferences here?
> >     > - What are your preferences on the configuration of the
> intermediate
> >     > data structure? Should it be configured based on the maximum
> number of
> >     > log messages it can store or rather on the total size of the stored
> >     > log messages? I think that for normal runs this distinction
> probably
> >     > won't make a big difference because the size of log messages will
> >     > probably be the same on average so "number of log messages" should
> >     > always be proportional to "total size of log mesages".
> >     >
> >     > 1. Before diving into more implementation details, I'd first like
> to
> >     > reach a general consensus that asynchronous logging is a useful
> >     > feature that's worth while adding to HotSpot.
> >     >
> >     > 2. Once we agree on that, we should agree on the desired
> properties of
> >     > asynchronous logging. I've tried to collect a basic set in the
> >     > "Proposed solution" section.
> >     >
> >     > 3. If that's done as well, we can discuss various implementation
> >     > details and finally prepare new pull requests.
> >     >
> >     > Thank you and best regards,
> >     > Volker
> >     >
> >     > [1] https://bugs.openjdk.java.net/browse/JDK-8229517
> >     <https://bugs.openjdk.java.net/browse/JDK-8229517>
> >     > [2] https://github.com/openjdk/jdk/pull/3135
> >     <https://github.com/openjdk/jdk/pull/3135>
> >     > [3]
> >
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
> >     <
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
> >
> >     > [4]
> >
> https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
> >     <
> https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
> >
> >     > [5] https://openjdk.java.net/jeps/158
> >     <https://openjdk.java.net/jeps/158>
> >     > [6] https://openjdk.java.net/jeps/271
> >     <https://openjdk.java.net/jeps/271>
> >     > [7] https://man7.org/linux/man-pages/man3/stdio.3.html
> >     <https://man7.org/linux/man-pages/man3/stdio.3.html>
> >     > [8]
> >     https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
> >     <https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d>
> >     > [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
> >     <https://man7.org/linux/man-pages/man3/flockfile.3.html>
> >     > [10] https://man7.org/linux/man-pages/man2/futex.2.html
> >     <https://man7.org/linux/man-pages/man2/futex.2.html>
> >     >
> >
>


More information about the hotspot-dev mailing list