Request For Comment: Asynchronous Logging
Liu, Xin
xxinliu at amazon.com
Tue Apr 13 22:12:12 UTC 2021
Hi, Thomas,
I am still working on this feature. I have tabulated the requests from
reviewers. I draw a line of "consensus" and working on them one by one.
-------------------------------------------------------------
| request | Status |
-------------------------------------------------------------
| independent NonJavaThread | done [1] |
|-----------------------------------------------------------|
| pass tier1-test with +AsyncLogging |JDK-8265102 |
|-----------------------------------------------------------<- progress
| inject meta-info to display no. of dropping msg. |WIP |
|-----------------------------------------------------------|
| preserve accurate decorations | |
|-----------------------------------------------------------|
| hoist async_mode to base class for stdout/err| [2] |
|-----------------------------------------------------------|
| CSR |JDK-8264323 |
|-----------------------------------------------------------<- consensus
|use string table over malloc | |
------------------------------------------------------------|
|use lock-free data structure. | |
-------------------------------------------------------------
I would like to mark items below consensus "improvement" and leave them
in separated JBS. I discussed about lock-free algorithms with Robbin on
github. He agree to defer it[3]. I don't like strdup either, but I
don't think it would hurt performance a lot. David Holmes comments and I
quote "If we had had async logging from day one then the way we
construct log messages would have been done differently". In current
codebase, I have to copy logging messages anyway, the real cost should
come from "copying" instead of allocating. Further, I have moved
"allocate" and "free" out of critical areas. If performance does matter,
we will have a overall deign including both sync/async messages and
decorators.
May I call out JDK-8264323? It's a CSR about JVM option. I have written
down pros and cons. Could you take a look? I can support both global
option and finer-grained output-option. I won't increase much complexity.
[1]code is here:
https://github.com/openjdk/jdk/pull/3135/commits/9211121c7afeb7a9fe602fd2a4d20bd98ddb1569
[2] here is the request:
https://github.com/openjdk/jdk/pull/3135#issuecomment-809977841
[3] https://github.com/openjdk/jdk/pull/3135#issuecomment-813880728
On 4/12/21 10:30 PM, Thomas Stüfe wrote:
> *CAUTION*: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
>
>
> Hi Xin & others,
>
> Sorry for the late response, I had vacation.
>
> As I wrote before, I think this is a valuable improvement. I would be
> unhappy if this were stalled further.
>
> In this thread there seems to be a majority opinion now for
> - having an own thread instead of using the WatcherThread
> - having a global option for async (good point, Ioi)
>
> AFAICS the only POC left is the question whether to make the sizing
> option a memory size or a "number of entries". I argued for the memory
> size before, but won't insist on it, in order to move this issue forward
> (we could always just keep that option diagnostic).
>
> Is there anything else left to discuss?
>
> I mentioned before that I think UL is too complex and could benefit from
> simplification; but this is clearly not Xins fault, and I am sorry that
> I brought this even up here, since it has nothing to do with this RFE.
>
> Cheers, Thomas
>
>
>
> On Tue, Mar 30, 2021 at 11:21 PM Liu, Xin <xxinliu at amazon.com
> <mailto:xxinliu at amazon.com>> wrote:
>
> Thanks Volker for this. I would like to append some additional
> materials. I forgot to mention them when I wrote the Rationale[5]
> yesterday.
>
> We identified and root-caused the tail-latency on a Linux system
> with software RAID in 2018.
> We have different implementations on jdk8u and jdk11u. We are
> seeking to merge this feature to tip.
>
> Nonetheless, it doesn't mean "async-logging facility" only solves
> Amazon's peculiar problem. When we studied this, we found many
> interesting references.
> Eg. LinkedIn reported and analyzed it well[1]. In particular, they
> mentioned that one reason was Linux cache writeback [2]. IMHO, that
> could impact
> almost all mass-storge Linux filesystems. Twitter also expressed
> that "I would love to hear if this can happen with OpenJDK!"[3].
> This is also reported
> by other companies[4].
>
> Thanks,
> --lx
>
>
> [1]
> https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
> <https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic>
> [2]
> https://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html
> <https://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html>
> [3] https://www.evanjones.ca/jvm-mmap-pause-finding.html
> <https://www.evanjones.ca/jvm-mmap-pause-finding.html>
> [4]
> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-June/042301.html
> <https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-June/042301.html>
> [5] https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487
> <https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487>
>
> On 3/30/21, 11:20 AM, "Volker Simonis" <volker.simonis at gmail.com
> <mailto:volker.simonis at gmail.com>> wrote:
>
> CAUTION: This email originated from outside of the organization.
> Do not click links or open attachments unless you can confirm the
> sender and know the content is safe.
>
>
>
> Hi,
>
> I'd like to (re)start a discussion on asynchronous logging
> [1,2,3,4].
> We are successfully using this feature productively at Amazon
> both in
> jdk8 and jdk11 to reduce the tail latency of services
which use
> logging. We think that async logging is a useful addition to the
> current logging framework which might be beneficial to a wider range
> of users. The following write-up tries to capture the
comments and
> suggestions from the previous discussions we are aware of.
>
> Current state:
>
> - HotSpot uses the so called "Unified Logging" (UL) framework which
> was introduced by JEP 158 [5] in JDK 9. Most logs have been
> retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
> [6]).
> - The current UL implementation is based on the standard C buffered
> stream I/O interface [7]. The LogFileStreamOutput class which writes
> logs to abstract FILE streams is the only child of the abstract base
> class LogOutput. LogFileStreamOutput has three child classes
> LogStdoutOutput, LogStderrOutput and LogFileOutput which write to
> stdout, stderr or an arbitrary file respectively. The
initial UL JEP
> 158 [5] envisioned logging to a socket but has not implemented
> it. At
> least one such extension has been proposed since then
[8].
> - UL synchronizes logs from different threads with the help of the
> standard C flockfile()/funlockfile() [9] functions. But even without
> this explicit locking, all the "stdio functions are thread-safe.
> This
> is achieved by assigning to each FILE object a lockcount and (if the
> lockcount is nonzero) an owning thread. For each library call,
> these
> functions wait until the FILE object is no longer locked by a
> different thread, then lock it, do the requested I/O,
and unlock the
> object again" [9]. A quick look at the glibc sources reveals
> that FILE
> locking is implemented with the help of futex() [10] which
> breaks down
> to s simple atomic compare and swap (CAS) on the fast
path.
> - Notice that UL "synchronizes" logs from different threads to avoid
> log interleaving. But it does not "serialize" logs according to the
> time at which they occurred. This is because the underlying stdio
> functions do not guarantee a specific order for different threads
> waiting on a locked FILE stream. E.g. if three log events A, B, C
> occur in that order, the first will lock the output stream. If
> the log
> events B and C both arrive while the stream is locked, it is
> unspecified which of B and C will be logged first after A
> releases the
> lock.
>
> Problem statement:
>
> - The amount of time a log event will block its FILE stream
> depends on
> the underlying file system. This can range from a few
> nanoseconds for
> in-memory file systems or milliseconds for physical discs under
> heavy
> load up to several seconds in the worst case scenario
for e.g.
> network
> file systems. A blocked log output stream will block all concurrent
> threads which try to write log messages at the same time. If logging
> is done during a safepoint, this can significantly increase the
> safepoint time (e.g. several parallel GC threads trying to log
> at the
> same time). We can treat stdout/stderr as special files here without
> loss of generality.
>
> Proposed solution:
>
> Extend UL with an asynchronous logging facility. Asynchronous
> logging
> will be optional and disabled by default. It should have the
> following
> properties:
> - If disabled (the default) asynchronous logging should have no
> observable impact on logging.
> - If enabled, log messages will be stored in an intermediate data
> structure (e.g. a double ended queue).
> - A service thread will concurrently read and remove log
> messages from
> that data structure in a FIFO style and write them to
the output
> stream
> - Storing log messages in the intermediate data structure should
> take
> constant time and not longer than logging a message takes in the
> traditional UL system (in general the time should be much shorter
> because the actual I/O is deferred).
> - Asynchronous logging trades memory overhead (i.e. the size of the
> intermediate data structure) for log accuracy. This means that
> in the
> unlikely case where the service thread which does the
asynchronous
> logging falls behind the log producing threads, some logs might be
> lost. However, the probability for this to happen can
be
> minimized by
> increasing the configurable size of the intermediate data structure.
> - The final output produced by asynchronous logging should be
> equal to
> the output from normal logging if no messages had to be dropped.
> Notice that in contrast to the traditional unified logging,
> asynchronous logging will give us the possibility to not only
> synchronize log events, but to optionally also serialize them
> based on
> their generation time if that's desired. This is because we are in
> full control of the synchronization primitives for the intermediate
> data structure which stores the logs.
> - If log messages had to be dropped, this should be logged in
> the log
> output (e.g. "[..] 42 messages dropped due to async logging")
> - Asynchronous logging should ideally be implemented in such a way
> that it can be easily adapted by alternative log targets like for
> example sockets in the future.
>
> Alternative solutions:
> - It has repeatedly been suggested to place the log files into a
> memory file system but we don't think this is an optimal solution.
> Main memory is often a constrained resource and we don't want log
> files to compete with the JVM for it in such cases.
> - It has also been argued to place the log files on a
fast file
> system
> which is only used for logging but in containerized environments
> file
> system are often virtualized and the properties of the underlying
> physical devices are not obvious.
> - The load on the file system might be unpredictable due to other
> applications on the same host.
> - All these workarounds won't work if we eventually implement direct
> logging to a network socket as suggested in [8].
>
> Implementation details / POC:
>
> - A recent pull request [2] for JDK-8229517 [3] proposed to use a
> simple deque implementation derived from HotSpot's LinkedListImpl
> class for the intermediate data structure. It synchronizes access to
> the queue with a MutexLocker which is internally implemented with
> pthread_lock() and results in an atomic CAS on the fast path. So
> performance-wise the locking itself is not different from the
> flockfile()/funlockfile() functionality currently used by UL but
> adding a log message to the deque should be constant as it basically
> only requires a strdup(). And we could even eliminate
the
> strdup() if
> we'd pre-allocate a big enough array for holding the log messages as
> proposed in the pull request [2].
> - The pull pull request [2] for JDK-8229517 [3] proposed to set the
> async flag as an attribute of the Xlog option which feels more
> natural
> because UL configuration is traditionally done within
the Xlog
> option.
> But we could just as well use a global -XX flag to enable async
> logging? What are your preferences here?
> - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
> WatcherThread as service thread to concurrently process the
> intermediate data structure and write the log messages out to
> the log
> stream. That should definitely be changed to an independent service
> thread.
> - The pull pull request [2] for JDK-8229517 [3] initially proposed
> that the "service thread" runs at a fixed interval to
dump log
> messages to the log streams. But reviewers commented that this
> should
> better happen either continuously or based on the filling level
> of the
> intermediate data structure. What are your preferences here?
> - What are your preferences on the configuration of the intermediate
> data structure? Should it be configured based on the maximum
> number of
> log messages it can store or rather on the total size
of the stored
> log messages? I think that for normal runs this distinction probably
> won't make a big difference because the size of log messages will
> probably be the same on average so "number of log messages" should
> always be proportional to "total size of log mesages".
>
> 1. Before diving into more implementation details, I'd first like to
> reach a general consensus that asynchronous logging is a useful
> feature that's worth while adding to HotSpot.
>
> 2. Once we agree on that, we should agree on the desired
> properties of
> asynchronous logging. I've tried to collect a basic set in the
> "Proposed solution" section.
>
> 3. If that's done as well, we can discuss various implementation
> details and finally prepare new pull requests.
>
> Thank you and best regards,
> Volker
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8229517
> <https://bugs.openjdk.java.net/browse/JDK-8229517>
> [2] https://github.com/openjdk/jdk/pull/3135
> <https://github.com/openjdk/jdk/pull/3135>
> [3]
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
> <https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html>
> [4]
> https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
> <https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html>
> [5] https://openjdk.java.net/jeps/158
> <https://openjdk.java.net/jeps/158>
> [6] https://openjdk.java.net/jeps/271
> <https://openjdk.java.net/jeps/271>
> [7] https://man7.org/linux/man-pages/man3/stdio.3.html
> <https://man7.org/linux/man-pages/man3/stdio.3.html>
> [8]
> https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
> <https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d>
> [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
> <https://man7.org/linux/man-pages/man3/flockfile.3.html>
> [10] https://man7.org/linux/man-pages/man2/futex.2.html
> <https://man7.org/linux/man-pages/man2/futex.2.html>
>
More information about the hotspot-dev
mailing list