[EXTERNAL] Request For Comment: Asynchronous Logging
Ioi Lam
ioi.lam at oracle.com
Mon Apr 19 17:26:59 UTC 2021
On 4/13/21 3:12 PM, Liu, Xin wrote:
> Hi, Thomas,
>
> I am still working on this feature. I have tabulated the requests from
> reviewers. I draw a line of "consensus" and working on them one by one.
>
> -------------------------------------------------------------
> | request | Status |
> -------------------------------------------------------------
> | independent NonJavaThread | done [1] |
> |-----------------------------------------------------------|
> | pass tier1-test with +AsyncLogging |JDK-8265102 |
> |-----------------------------------------------------------<- progress
> | inject meta-info to display no. of dropping msg. |WIP |
> |-----------------------------------------------------------|
> | preserve accurate decorations | |
> |-----------------------------------------------------------|
> | hoist async_mode to base class for stdout/err| [2] |
> |-----------------------------------------------------------|
> | CSR |JDK-8264323 |
> |-----------------------------------------------------------<- consensus
> |use string table over malloc | |
> ------------------------------------------------------------|
> |use lock-free data structure. | |
> -------------------------------------------------------------
>
>
> I would like to mark items below consensus "improvement" and leave them
> in separated JBS. I discussed about lock-free algorithms with Robbin on
> github. He agree to defer it[3]. I don't like strdup either, but I
> don't think it would hurt performance a lot. David Holmes comments and I
> quote "If we had had async logging from day one then the way we
> construct log messages would have been done differently". In current
> codebase, I have to copy logging messages anyway, the real cost should
> come from "copying" instead of allocating. Further, I have moved
> "allocate" and "free" out of critical areas. If performance does matter,
> we will have a overall deign including both sync/async messages and
> decorators.
>
> May I call out JDK-8264323? It's a CSR about JVM option. I have written
> down pros and cons. Could you take a look? I can support both global
> option and finer-grained output-option. I won't increase much complexity.
If we decide that global option is the way to go, I think we should not
add fine-grained option at the same time. Even if the implementation may
be easy for now, it makes the system more complex and harder to evolve
in the future.
Thanks
- Ioi
> [1]code is here:
> https://github.com/openjdk/jdk/pull/3135/commits/9211121c7afeb7a9fe602fd2a4d20bd98ddb1569
> [2] here is the request:
> https://github.com/openjdk/jdk/pull/3135#issuecomment-809977841
> [3] https://github.com/openjdk/jdk/pull/3135#issuecomment-813880728
>
>
>
>
> On 4/12/21 10:30 PM, Thomas Stüfe wrote:
>> *CAUTION*: This email originated from outside of the organization. Do
>> not click links or open attachments unless you can confirm the sender
>> and know the content is safe.
>>
>>
>> Hi Xin & others,
>>
>> Sorry for the late response, I had vacation.
>>
>> As I wrote before, I think this is a valuable improvement. I would be
>> unhappy if this were stalled further.
>>
>> In this thread there seems to be a majority opinion now for
>> - having an own thread instead of using the WatcherThread
>> - having a global option for async (good point, Ioi)
>>
>> AFAICS the only POC left is the question whether to make the sizing
>> option a memory size or a "number of entries". I argued for the memory
>> size before, but won't insist on it, in order to move this issue forward
>> (we could always just keep that option diagnostic).
>>
>> Is there anything else left to discuss?
>>
>> I mentioned before that I think UL is too complex and could benefit from
>> simplification; but this is clearly not Xins fault, and I am sorry that
>> I brought this even up here, since it has nothing to do with this RFE.
>>
>> Cheers, Thomas
>>
>>
>>
>> On Tue, Mar 30, 2021 at 11:21 PM Liu, Xin <xxinliu at amazon.com
>> <mailto:xxinliu at amazon.com>> wrote:
>>
>> Thanks Volker for this. I would like to append some additional
>> materials. I forgot to mention them when I wrote the Rationale[5]
>> yesterday.
>>
>> We identified and root-caused the tail-latency on a Linux system
>> with software RAID in 2018.
>> We have different implementations on jdk8u and jdk11u. We are
>> seeking to merge this feature to tip.
>>
>> Nonetheless, it doesn't mean "async-logging facility" only solves
>> Amazon's peculiar problem. When we studied this, we found many
>> interesting references.
>> Eg. LinkedIn reported and analyzed it well[1]. In particular, they
>> mentioned that one reason was Linux cache writeback [2]. IMHO, that
>> could impact
>> almost all mass-storge Linux filesystems. Twitter also expressed
>> that "I would love to hear if this can happen with OpenJDK!"[3].
>> This is also reported
>> by other companies[4].
>>
>> Thanks,
>> --lx
>>
>>
>> [1]
>> https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
>> <https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic>
>> [2]
>> https://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html
>> <https://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html>
>> [3] https://www.evanjones.ca/jvm-mmap-pause-finding.html
>> <https://www.evanjones.ca/jvm-mmap-pause-finding.html>
>> [4]
>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-June/042301.html
>> <https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-June/042301.html>
>> [5] https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487
>> <https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487>
>>
>> On 3/30/21, 11:20 AM, "Volker Simonis" <volker.simonis at gmail.com
>> <mailto:volker.simonis at gmail.com>> wrote:
>>
>> CAUTION: This email originated from outside of the organization.
>> Do not click links or open attachments unless you can confirm the
>> sender and know the content is safe.
>>
>>
>>
>> Hi,
>>
>> I'd like to (re)start a discussion on asynchronous logging
>> [1,2,3,4].
>> We are successfully using this feature productively at Amazon
>> both in
>> jdk8 and jdk11 to reduce the tail latency of services
> which use
>> logging. We think that async logging is a useful addition to the
>> current logging framework which might be beneficial to a wider range
>> of users. The following write-up tries to capture the
> comments and
>> suggestions from the previous discussions we are aware of.
>>
>> Current state:
>>
>> - HotSpot uses the so called "Unified Logging" (UL) framework which
>> was introduced by JEP 158 [5] in JDK 9. Most logs have been
>> retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
>> [6]).
>> - The current UL implementation is based on the standard C buffered
>> stream I/O interface [7]. The LogFileStreamOutput class which writes
>> logs to abstract FILE streams is the only child of the abstract base
>> class LogOutput. LogFileStreamOutput has three child classes
>> LogStdoutOutput, LogStderrOutput and LogFileOutput which write to
>> stdout, stderr or an arbitrary file respectively. The
> initial UL JEP
>> 158 [5] envisioned logging to a socket but has not implemented
>> it. At
>> least one such extension has been proposed since then
> [8].
>> - UL synchronizes logs from different threads with the help of the
>> standard C flockfile()/funlockfile() [9] functions. But even without
>> this explicit locking, all the "stdio functions are thread-safe.
>> This
>> is achieved by assigning to each FILE object a lockcount and (if the
>> lockcount is nonzero) an owning thread. For each library call,
>> these
>> functions wait until the FILE object is no longer locked by a
>> different thread, then lock it, do the requested I/O,
> and unlock the
>> object again" [9]. A quick look at the glibc sources reveals
>> that FILE
>> locking is implemented with the help of futex() [10] which
>> breaks down
>> to s simple atomic compare and swap (CAS) on the fast
> path.
>> - Notice that UL "synchronizes" logs from different threads to avoid
>> log interleaving. But it does not "serialize" logs according to the
>> time at which they occurred. This is because the underlying stdio
>> functions do not guarantee a specific order for different threads
>> waiting on a locked FILE stream. E.g. if three log events A, B, C
>> occur in that order, the first will lock the output stream. If
>> the log
>> events B and C both arrive while the stream is locked, it is
>> unspecified which of B and C will be logged first after A
>> releases the
>> lock.
>>
>> Problem statement:
>>
>> - The amount of time a log event will block its FILE stream
>> depends on
>> the underlying file system. This can range from a few
>> nanoseconds for
>> in-memory file systems or milliseconds for physical discs under
>> heavy
>> load up to several seconds in the worst case scenario
> for e.g.
>> network
>> file systems. A blocked log output stream will block all concurrent
>> threads which try to write log messages at the same time. If logging
>> is done during a safepoint, this can significantly increase the
>> safepoint time (e.g. several parallel GC threads trying to log
>> at the
>> same time). We can treat stdout/stderr as special files here without
>> loss of generality.
>>
>> Proposed solution:
>>
>> Extend UL with an asynchronous logging facility. Asynchronous
>> logging
>> will be optional and disabled by default. It should have the
>> following
>> properties:
>> - If disabled (the default) asynchronous logging should have no
>> observable impact on logging.
>> - If enabled, log messages will be stored in an intermediate data
>> structure (e.g. a double ended queue).
>> - A service thread will concurrently read and remove log
>> messages from
>> that data structure in a FIFO style and write them to
> the output
>> stream
>> - Storing log messages in the intermediate data structure should
>> take
>> constant time and not longer than logging a message takes in the
>> traditional UL system (in general the time should be much shorter
>> because the actual I/O is deferred).
>> - Asynchronous logging trades memory overhead (i.e. the size of the
>> intermediate data structure) for log accuracy. This means that
>> in the
>> unlikely case where the service thread which does the
> asynchronous
>> logging falls behind the log producing threads, some logs might be
>> lost. However, the probability for this to happen can
> be
>> minimized by
>> increasing the configurable size of the intermediate data structure.
>> - The final output produced by asynchronous logging should be
>> equal to
>> the output from normal logging if no messages had to be dropped.
>> Notice that in contrast to the traditional unified logging,
>> asynchronous logging will give us the possibility to not only
>> synchronize log events, but to optionally also serialize them
>> based on
>> their generation time if that's desired. This is because we are in
>> full control of the synchronization primitives for the intermediate
>> data structure which stores the logs.
>> - If log messages had to be dropped, this should be logged in
>> the log
>> output (e.g. "[..] 42 messages dropped due to async logging")
>> - Asynchronous logging should ideally be implemented in such a way
>> that it can be easily adapted by alternative log targets like for
>> example sockets in the future.
>>
>> Alternative solutions:
>> - It has repeatedly been suggested to place the log files into a
>> memory file system but we don't think this is an optimal solution.
>> Main memory is often a constrained resource and we don't want log
>> files to compete with the JVM for it in such cases.
>> - It has also been argued to place the log files on a
> fast file
>> system
>> which is only used for logging but in containerized environments
>> file
>> system are often virtualized and the properties of the underlying
>> physical devices are not obvious.
>> - The load on the file system might be unpredictable due to other
>> applications on the same host.
>> - All these workarounds won't work if we eventually implement direct
>> logging to a network socket as suggested in [8].
>>
>> Implementation details / POC:
>>
>> - A recent pull request [2] for JDK-8229517 [3] proposed to use a
>> simple deque implementation derived from HotSpot's LinkedListImpl
>> class for the intermediate data structure. It synchronizes access to
>> the queue with a MutexLocker which is internally implemented with
>> pthread_lock() and results in an atomic CAS on the fast path. So
>> performance-wise the locking itself is not different from the
>> flockfile()/funlockfile() functionality currently used by UL but
>> adding a log message to the deque should be constant as it basically
>> only requires a strdup(). And we could even eliminate
> the
>> strdup() if
>> we'd pre-allocate a big enough array for holding the log messages as
>> proposed in the pull request [2].
>> - The pull pull request [2] for JDK-8229517 [3] proposed to set the
>> async flag as an attribute of the Xlog option which feels more
>> natural
>> because UL configuration is traditionally done within
> the Xlog
>> option.
>> But we could just as well use a global -XX flag to enable async
>> logging? What are your preferences here?
>> - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
>> WatcherThread as service thread to concurrently process the
>> intermediate data structure and write the log messages out to
>> the log
>> stream. That should definitely be changed to an independent service
>> thread.
>> - The pull pull request [2] for JDK-8229517 [3] initially proposed
>> that the "service thread" runs at a fixed interval to
> dump log
>> messages to the log streams. But reviewers commented that this
>> should
>> better happen either continuously or based on the filling level
>> of the
>> intermediate data structure. What are your preferences here?
>> - What are your preferences on the configuration of the intermediate
>> data structure? Should it be configured based on the maximum
>> number of
>> log messages it can store or rather on the total size
> of the stored
>> log messages? I think that for normal runs this distinction probably
>> won't make a big difference because the size of log messages will
>> probably be the same on average so "number of log messages" should
>> always be proportional to "total size of log mesages".
>>
>> 1. Before diving into more implementation details, I'd first like to
>> reach a general consensus that asynchronous logging is a useful
>> feature that's worth while adding to HotSpot.
>>
>> 2. Once we agree on that, we should agree on the desired
>> properties of
>> asynchronous logging. I've tried to collect a basic set in the
>> "Proposed solution" section.
>>
>> 3. If that's done as well, we can discuss various implementation
>> details and finally prepare new pull requests.
>>
>> Thank you and best regards,
>> Volker
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8229517
>> <https://bugs.openjdk.java.net/browse/JDK-8229517>
>> [2] https://github.com/openjdk/jdk/pull/3135
>> <https://github.com/openjdk/jdk/pull/3135>
>> [3]
>> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
>> <https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html>
>> [4]
>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
>> <https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html>
>> [5] https://openjdk.java.net/jeps/158
>> <https://openjdk.java.net/jeps/158>
>> [6] https://openjdk.java.net/jeps/271
>> <https://openjdk.java.net/jeps/271>
>> [7] https://man7.org/linux/man-pages/man3/stdio.3.html
>> <https://man7.org/linux/man-pages/man3/stdio.3.html>
>> [8]
>> https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
>> <https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d>
>> [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
>> <https://man7.org/linux/man-pages/man3/flockfile.3.html>
>> [10] https://man7.org/linux/man-pages/man2/futex.2.html
>> <https://man7.org/linux/man-pages/man2/futex.2.html>
>>
>
More information about the hotspot-dev
mailing list