Request For Comment: Asynchronous Logging
Liu, Xin
xxinliu at amazon.com
Tue May 11 07:53:08 UTC 2021
Hi,
This RFC covers almost all aspects of async logging. Since then, we have
drawn many consensus and implemented them one by one.
I still have one open question. What's the discard policy?
Problem statement:
For async-logging, we proposed a lossy non-blocking design. that is to
say, the buffer is bounded and we have to drop messages for
non-blocking. The rationale was described: here:
https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487
The discard policy defines which message is dropped at logsite when the
buffer is full. There're at least 3 choices.
1. drop the oldest message and proceed to enqueue.
2. give up enqueue attempt. that is to say, the newest message is
discarded.
3. support them both, let user to choose.
It seems different data structures of buffer bias to different discard
policy. eg. printk of Linux kernel uses a fixed ringbuffer. It chooses
1) because that is naturally done by ringbuffer overwrite.
Folly logging library also supports async writing. They chose 2) because
it's straight-forward for a bounded array.
https://github.com/facebook/folly/blob/master/folly/logging/AsyncLogWriter.cpp#L91
My current implementation is just in the middle two polar opinions. my
buffer is implemented using a linkedlist-based deque. It means that I
can pop elements from either sides. technically speaking, I can easily
support 2 discard policies.
In CSR, I wrote the following sentence in Solution section.
"On buffer exhaustion a message is discarded so that under no
circumstances will logging block threads. Which message is discarded
depends on implementation. In current implementation, the oldest message
is discarded to make room for the newest message. "
The reason I would like to "leave to implementation" because I don't
think it's a good idea to choose a discard policy in CSR. The wiggle
room can give more freedom for changing different data structures and
algorithms. David said we can't it undefined. If so, what's the discard
policy HotSpot prefers?
We have discussed it on CSR.
https://bugs.openjdk.java.net/browse/JDK-8264323?focusedCommentId=14419683&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14419683
There are more audiences. Inputs are welcome.
FYI, current PR#3135 has implemented dropped message statistics. In the
log files, you would find meta messages to indicate how many messages
were lost. eg. line @34300:
34299-[trace][metaspace ] Metaspace::allocate: type 7
return 0x00007fcf284fcb18.
34300:[warning][ ] 529 messages dropped...
34301-[trace ][metaspace ] Arena @0x00007fcf80252df0
(non-class sm): after allocation: 1 chunk(s),
current:@0x00007fcf80243ef0, u, base 0x00007fcf28400000, level lv00
(524288), used: 132162, committed: 139264, committed-free: 7102
thanks,
--lx
On 3/30/21 11:18 AM, Volker Simonis wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> Hi,
>
> I'd like to (re)start a discussion on asynchronous logging [1,2,3,4].
> We are successfully using this feature productively at Amazon both in
> jdk8 and jdk11 to reduce the tail latency of services which use
> logging. We think that async logging is a useful addition to the
> current logging framework which might be beneficial to a wider range
> of users. The following write-up tries to capture the comments and
> suggestions from the previous discussions we are aware of.
>
> Current state:
>
> - HotSpot uses the so called "Unified Logging" (UL) framework which
> was introduced by JEP 158 [5] in JDK 9. Most logs have been
> retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
> [6]).
> - The current UL implementation is based on the standard C buffered
> stream I/O interface [7]. The LogFileStreamOutput class which writes
> logs to abstract FILE streams is the only child of the abstract base
> class LogOutput. LogFileStreamOutput has three child classes
> LogStdoutOutput, LogStderrOutput and LogFileOutput which write to
> stdout, stderr or an arbitrary file respectively. The initial UL JEP
> 158 [5] envisioned logging to a socket but has not implemented it. At
> least one such extension has been proposed since then [8].
> - UL synchronizes logs from different threads with the help of the
> standard C flockfile()/funlockfile() [9] functions. But even without
> this explicit locking, all the "stdio functions are thread-safe. This
> is achieved by assigning to each FILE object a lockcount and (if the
> lockcount is nonzero) an owning thread. For each library call, these
> functions wait until the FILE object is no longer locked by a
> different thread, then lock it, do the requested I/O, and unlock the
> object again" [9]. A quick look at the glibc sources reveals that FILE
> locking is implemented with the help of futex() [10] which breaks down
> to s simple atomic compare and swap (CAS) on the fast path.
> - Notice that UL "synchronizes" logs from different threads to avoid
> log interleaving. But it does not "serialize" logs according to the
> time at which they occurred. This is because the underlying stdio
> functions do not guarantee a specific order for different threads
> waiting on a locked FILE stream. E.g. if three log events A, B, C
> occur in that order, the first will lock the output stream. If the log
> events B and C both arrive while the stream is locked, it is
> unspecified which of B and C will be logged first after A releases the
> lock.
>
> Problem statement:
>
> - The amount of time a log event will block its FILE stream depends on
> the underlying file system. This can range from a few nanoseconds for
> in-memory file systems or milliseconds for physical discs under heavy
> load up to several seconds in the worst case scenario for e.g. network
> file systems. A blocked log output stream will block all concurrent
> threads which try to write log messages at the same time. If logging
> is done during a safepoint, this can significantly increase the
> safepoint time (e.g. several parallel GC threads trying to log at the
> same time). We can treat stdout/stderr as special files here without
> loss of generality.
>
> Proposed solution:
>
> Extend UL with an asynchronous logging facility. Asynchronous logging
> will be optional and disabled by default. It should have the following
> properties:
> - If disabled (the default) asynchronous logging should have no
> observable impact on logging.
> - If enabled, log messages will be stored in an intermediate data
> structure (e.g. a double ended queue).
> - A service thread will concurrently read and remove log messages from
> that data structure in a FIFO style and write them to the output
> stream
> - Storing log messages in the intermediate data structure should take
> constant time and not longer than logging a message takes in the
> traditional UL system (in general the time should be much shorter
> because the actual I/O is deferred).
> - Asynchronous logging trades memory overhead (i.e. the size of the
> intermediate data structure) for log accuracy. This means that in the
> unlikely case where the service thread which does the asynchronous
> logging falls behind the log producing threads, some logs might be
> lost. However, the probability for this to happen can be minimized by
> increasing the configurable size of the intermediate data structure.
> - The final output produced by asynchronous logging should be equal to
> the output from normal logging if no messages had to be dropped.
> Notice that in contrast to the traditional unified logging,
> asynchronous logging will give us the possibility to not only
> synchronize log events, but to optionally also serialize them based on
> their generation time if that's desired. This is because we are in
> full control of the synchronization primitives for the intermediate
> data structure which stores the logs.
> - If log messages had to be dropped, this should be logged in the log
> output (e.g. "[..] 42 messages dropped due to async logging")
> - Asynchronous logging should ideally be implemented in such a way
> that it can be easily adapted by alternative log targets like for
> example sockets in the future.
>
> Alternative solutions:
> - It has repeatedly been suggested to place the log files into a
> memory file system but we don't think this is an optimal solution.
> Main memory is often a constrained resource and we don't want log
> files to compete with the JVM for it in such cases.
> - It has also been argued to place the log files on a fast file system
> which is only used for logging but in containerized environments file
> system are often virtualized and the properties of the underlying
> physical devices are not obvious.
> - The load on the file system might be unpredictable due to other
> applications on the same host.
> - All these workarounds won't work if we eventually implement direct
> logging to a network socket as suggested in [8].
>
> Implementation details / POC:
>
> - A recent pull request [2] for JDK-8229517 [3] proposed to use a
> simple deque implementation derived from HotSpot's LinkedListImpl
> class for the intermediate data structure. It synchronizes access to
> the queue with a MutexLocker which is internally implemented with
> pthread_lock() and results in an atomic CAS on the fast path. So
> performance-wise the locking itself is not different from the
> flockfile()/funlockfile() functionality currently used by UL but
> adding a log message to the deque should be constant as it basically
> only requires a strdup(). And we could even eliminate the strdup() if
> we'd pre-allocate a big enough array for holding the log messages as
> proposed in the pull request [2].
> - The pull pull request [2] for JDK-8229517 [3] proposed to set the
> async flag as an attribute of the Xlog option which feels more natural
> because UL configuration is traditionally done within the Xlog option.
> But we could just as well use a global -XX flag to enable async
> logging? What are your preferences here?
> - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
> WatcherThread as service thread to concurrently process the
> intermediate data structure and write the log messages out to the log
> stream. That should definitely be changed to an independent service
> thread.
> - The pull pull request [2] for JDK-8229517 [3] initially proposed
> that the "service thread" runs at a fixed interval to dump log
> messages to the log streams. But reviewers commented that this should
> better happen either continuously or based on the filling level of the
> intermediate data structure. What are your preferences here?
> - What are your preferences on the configuration of the intermediate
> data structure? Should it be configured based on the maximum number of
> log messages it can store or rather on the total size of the stored
> log messages? I think that for normal runs this distinction probably
> won't make a big difference because the size of log messages will
> probably be the same on average so "number of log messages" should
> always be proportional to "total size of log mesages".
>
> 1. Before diving into more implementation details, I'd first like to
> reach a general consensus that asynchronous logging is a useful
> feature that's worth while adding to HotSpot.
>
> 2. Once we agree on that, we should agree on the desired properties of
> asynchronous logging. I've tried to collect a basic set in the
> "Proposed solution" section.
>
> 3. If that's done as well, we can discuss various implementation
> details and finally prepare new pull requests.
>
> Thank you and best regards,
> Volker
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8229517
> [2] https://github.com/openjdk/jdk/pull/3135
> [3] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
> [4] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
> [5] https://openjdk.java.net/jeps/158
> [6] https://openjdk.java.net/jeps/271
> [7] https://man7.org/linux/man-pages/man3/stdio.3.html
> [8] https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
> [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
> [10] https://man7.org/linux/man-pages/man2/futex.2.html
>
More information about the hotspot-dev
mailing list