[EXTERNAL] Request For Comment: Asynchronous Logging

Mon Apr 19 17:26:59 UTC 2021

On 4/13/21 3:12 PM, Liu, Xin wrote:
> Hi, Thomas,
>
> I am still working on this feature. I have tabulated the requests from
> reviewers. I draw a line of "consensus" and working on them one by one.
>
> -------------------------------------------------------------
> | request                                      | Status     |
> -------------------------------------------------------------
> | independent NonJavaThread                    | done [1]   |
> |-----------------------------------------------------------|
> | pass tier1-test with +AsyncLogging           |JDK-8265102 |
> |-----------------------------------------------------------<- progress
> | inject meta-info to display no. of dropping msg. |WIP     |
> |-----------------------------------------------------------|
> | preserve accurate decorations                |            |
> |-----------------------------------------------------------|
> | hoist async_mode to base class for stdout/err| [2]        |
> |-----------------------------------------------------------|
> | CSR                                          |JDK-8264323 |
> |-----------------------------------------------------------<- consensus
> |use string table over malloc                  |            |
> ------------------------------------------------------------|
> |use lock-free data structure.                 |            |
> -------------------------------------------------------------
>
>
> I would like to mark items below consensus "improvement" and leave them
> in separated JBS. I discussed about lock-free algorithms with Robbin on
> github. He agree to defer it[3].  I don't like strdup either, but I
> don't think it would hurt performance a lot. David Holmes comments and I
> quote "If we had had async logging from day one then the way we
> construct log messages would have been done differently". In current
> codebase, I have to copy logging messages anyway, the real cost should
> come from "copying" instead of allocating. Further, I have moved
> "allocate" and "free" out of critical areas. If performance does matter,
> we will have a overall deign including both sync/async messages and
> decorators.
>
> May I call out JDK-8264323? It's a CSR about JVM option. I have written
> down pros and cons. Could you take a look? I can support both global
> option and finer-grained output-option.  I won't increase much complexity.

If we decide that global option is the way to go, I think we should not 
add fine-grained option at the same time. Even if the implementation may 
be easy for now, it makes the system more complex and harder to evolve 
in the future.

Thanks
- Ioi

> [1]code is here:
> https://github.com/openjdk/jdk/pull/3135/commits/9211121c7afeb7a9fe602fd2a4d20bd98ddb1569
> [2] here is the request:
> https://github.com/openjdk/jdk/pull/3135#issuecomment-809977841
> [3] https://github.com/openjdk/jdk/pull/3135#issuecomment-813880728
>
>
>
>
> On 4/12/21 10:30 PM, Thomas Stüfe wrote:
>> *CAUTION*: This email originated from outside of the organization. Do
>> not click links or open attachments unless you can confirm the sender
>> and know the content is safe.
>>
>>
>> Hi Xin & others,
>>
>> Sorry for the late response, I had vacation.
>>
>> As I wrote before, I think this is a valuable improvement. I would be
>> unhappy if this were stalled further.
>>
>> In this thread there seems to be a majority opinion now for
>> - having an own thread instead of using the WatcherThread
>> - having a global option for async (good point, Ioi)
>>
>> AFAICS the only POC left is the question whether to make the sizing
>> option a memory size or a "number of entries". I argued for the memory
>> size before, but won't insist on it, in order to move this issue forward
>> (we could always just keep that option diagnostic).
>>
>> Is there anything else left to discuss?
>>
>> I mentioned before that I think UL is too complex and could benefit from
>> simplification; but this is clearly not Xins fault, and I am sorry that
>> I brought this even up here, since it has nothing to do with this RFE.
>>
>> Cheers, Thomas
>>
>>
>>
>> On Tue, Mar 30, 2021 at 11:21 PM Liu, Xin <xxinliu at amazon.com
>> <mailto:xxinliu at amazon.com>> wrote:
>>
>>      Thanks Volker for this. I would like to append some additional
>>      materials.  I forgot to mention them when I wrote the Rationale[5]
>>      yesterday.
>>
>>      We identified and root-caused the tail-latency on a Linux system
>>      with software RAID in 2018.
>>      We have different implementations on jdk8u and jdk11u. We are
>>      seeking to merge this feature to tip.
>>
>>      Nonetheless, it doesn't mean "async-logging facility" only solves
>>      Amazon's peculiar problem.  When we studied this, we found many
>>      interesting references.
>>      Eg. LinkedIn reported and analyzed it well[1]. In particular, they
>>      mentioned that one reason was Linux cache writeback [2]. IMHO, that
>>      could impact
>>      almost all mass-storge Linux filesystems. Twitter also expressed
>>      that "I would love to hear if this can happen with OpenJDK!"[3].
>>      This is also reported
>>      by other companies[4].
>>
>>      Thanks,
>>      --lx
>>
>>
>>      [1]
>>      https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
>>      <https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic>
>>      [2]
>>      https://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html
>>      <https://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html>
>>      [3] https://www.evanjones.ca/jvm-mmap-pause-finding.html
>>      <https://www.evanjones.ca/jvm-mmap-pause-finding.html>
>>      [4]
>>      https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-June/042301.html
>>      <https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-June/042301.html>
>>      [5] https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487
>>      <https://github.com/openjdk/jdk/pull/3135#issuecomment-809942487>
>>
>>      On 3/30/21, 11:20 AM, "Volker Simonis" <volker.simonis at gmail.com
>>      <mailto:volker.simonis at gmail.com>> wrote:
>>
>>          CAUTION: This email originated from outside of the organization.
>>      Do not click links or open attachments unless you can confirm the
>>      sender and know the content is safe.
>>
>>
>>
>>          Hi,
>>
>>          I'd like to (re)start a discussion on asynchronous logging
>>      [1,2,3,4].
>>          We are successfully using this feature productively at Amazon
>>      both in
>>          jdk8 and jdk11 to reduce the tail latency of services
> which use
>>          logging. We think that async logging is a useful addition to the
>>          current logging framework which might be beneficial to a wider range
>>          of users. The following write-up tries to capture the
> comments and
>>          suggestions from the previous discussions we are aware of.
>>
>>          Current state:
>>
>>          - HotSpot uses the so called "Unified Logging" (UL) framework which
>>          was introduced by JEP 158 [5] in JDK 9. Most logs have been
>>          retrofitted to use UL since then (e.g. "JEP 271: Unified GC Logging"
>>          [6]).
>>          - The current UL implementation is based on the standard C buffered
>>          stream I/O interface [7]. The LogFileStreamOutput class which writes
>>          logs to abstract FILE streams is the only child of the abstract base
>>          class LogOutput. LogFileStreamOutput has three child classes
>>          LogStdoutOutput,  LogStderrOutput and LogFileOutput which write to
>>          stdout, stderr or an arbitrary file respectively. The
> initial UL JEP
>>          158 [5] envisioned logging to a socket but has not implemented
>>      it. At
>>          least one such extension has been proposed since then
> [8].
>>          - UL synchronizes logs from different threads with the help of the
>>          standard C flockfile()/funlockfile() [9] functions. But even without
>>          this explicit locking, all the "stdio functions are thread-safe.
>>      This
>>          is achieved by assigning to each FILE object a lockcount and (if the
>>          lockcount is nonzero) an owning thread.  For each library call,
>>      these
>>          functions wait until the FILE object is no longer locked by a
>>          different thread, then lock it, do the requested I/O,
> and unlock the
>>          object again" [9]. A quick look at the glibc sources reveals
>>      that FILE
>>          locking is implemented with the help of futex() [10] which
>>      breaks down
>>          to s simple atomic compare and swap (CAS) on the fast
> path.
>>          - Notice that UL "synchronizes" logs from different threads to avoid
>>          log interleaving. But it does not "serialize" logs according to the
>>          time at which  they occurred. This is because the underlying stdio
>>          functions do not guarantee a specific order for different threads
>>          waiting on a locked FILE stream. E.g. if three log events A, B, C
>>          occur in that order, the first will lock the output stream. If
>>      the log
>>          events B and C both arrive while the stream is locked, it is
>>          unspecified which of B and C will be logged first after A
>>      releases the
>>          lock.
>>
>>          Problem statement:
>>
>>          - The amount of time a log event will block its FILE stream
>>      depends on
>>          the underlying file system. This can range from a few
>>      nanoseconds for
>>          in-memory file systems or milliseconds for physical discs under
>>      heavy
>>          load up to several seconds in the worst case scenario
> for e.g.
>>      network
>>          file systems. A blocked log output stream will block all concurrent
>>          threads which try to write log messages at the same time. If logging
>>          is done during a safepoint, this can significantly increase the
>>          safepoint time (e.g. several parallel GC threads trying to log
>>      at the
>>          same time). We can treat stdout/stderr as special files here without
>>          loss of generality.
>>
>>          Proposed solution:
>>
>>          Extend UL with an asynchronous logging facility. Asynchronous
>>      logging
>>          will be optional and disabled by default. It should have the
>>      following
>>          properties:
>>          - If disabled (the default) asynchronous logging should have no
>>          observable impact on logging.
>>          - If enabled, log messages will be stored in an intermediate data
>>          structure (e.g. a double ended queue).
>>          - A service thread will concurrently read and remove log
>>      messages from
>>          that data structure in a FIFO style and write them to
> the output
>>          stream
>>          - Storing log messages in the intermediate data structure should
>>      take
>>          constant time and not longer than logging a message takes in the
>>          traditional UL system (in general the time should be much shorter
>>          because the actual I/O is deferred).
>>          - Asynchronous logging trades memory overhead (i.e. the size of the
>>          intermediate data structure) for log accuracy. This means that
>>      in the
>>          unlikely case where the service thread which does the
> asynchronous
>>          logging falls behind the log producing threads, some logs might be
>>          lost. However, the probability for this to happen can
> be
>>      minimized by
>>          increasing the configurable size of the intermediate data structure.
>>          - The final output produced by asynchronous logging should be
>>      equal to
>>          the output from normal logging if no messages had to be dropped.
>>          Notice that in contrast to the traditional unified logging,
>>          asynchronous logging will give us the possibility to not only
>>          synchronize log events, but to optionally also serialize them
>>      based on
>>          their generation time if that's desired. This is because we are in
>>          full control of the synchronization primitives for the intermediate
>>          data structure which stores the logs.
>>          - If log messages had to be dropped, this should be logged in
>>      the log
>>          output (e.g. "[..] 42 messages dropped due to async logging")
>>          - Asynchronous logging should ideally be implemented in such a way
>>          that it can be easily adapted by alternative log targets like for
>>          example sockets in the future.
>>
>>          Alternative solutions:
>>          - It has repeatedly been suggested to place the log files into a
>>          memory file system but we don't think this is an optimal solution.
>>          Main memory is often a constrained resource and we don't want log
>>          files to compete with the JVM for it in such cases.
>>          - It has also been argued to place the log files on a
> fast file
>>      system
>>          which is only used for logging but in containerized environments
>>      file
>>          system are often virtualized and the properties of the underlying
>>          physical devices are not obvious.
>>          - The load on the file system might be unpredictable due to other
>>          applications on the same host.
>>          - All these workarounds won't work if we eventually implement direct
>>          logging to a network socket as suggested in [8].
>>
>>          Implementation details / POC:
>>
>>          - A recent pull request [2] for JDK-8229517 [3] proposed to use a
>>          simple deque implementation derived from HotSpot's LinkedListImpl
>>          class for the intermediate data structure. It synchronizes access to
>>          the queue with a MutexLocker which is internally implemented with
>>          pthread_lock() and results in an atomic CAS on the fast path. So
>>          performance-wise the locking itself is not different from the
>>          flockfile()/funlockfile() functionality currently used by UL but
>>          adding a log message to the deque should be constant as it basically
>>          only requires a strdup(). And we could even eliminate
> the
>>      strdup() if
>>          we'd pre-allocate a big enough array for holding the log messages as
>>          proposed in the pull request [2].
>>          - The pull pull request [2] for JDK-8229517 [3] proposed to set the
>>          async flag as an attribute of the Xlog option which feels more
>>      natural
>>          because UL configuration is traditionally done within
> the Xlog
>>      option.
>>          But we could just as well use a global -XX flag to enable async
>>          logging? What are your preferences here?
>>          - The pull pull request [2] for JDK-8229517 [3] (mis)uses the
>>          WatcherThread as service thread to concurrently process the
>>          intermediate data structure and write the log messages out to
>>      the log
>>          stream. That should definitely be changed to an independent service
>>          thread.
>>          - The pull pull request [2] for JDK-8229517 [3] initially proposed
>>          that the "service thread" runs at a fixed interval to
> dump log
>>          messages to the log streams. But reviewers commented that this
>>      should
>>          better happen either continuously or based on the filling level
>>      of the
>>          intermediate data structure. What are your preferences here?
>>          - What are your preferences on the configuration of the intermediate
>>          data structure? Should it be configured based on the maximum
>>      number of
>>          log messages it can store or rather on the total size
> of the stored
>>          log messages? I think that for normal runs this distinction probably
>>          won't make a big difference because the size of log messages will
>>          probably be the same on average so "number of log messages" should
>>          always be proportional to "total size of log mesages".
>>
>>          1. Before diving into more implementation details, I'd first like to
>>          reach a general consensus that asynchronous logging is a useful
>>          feature that's worth while adding to HotSpot.
>>
>>          2. Once we agree on that, we should agree on the desired
>>      properties of
>>          asynchronous logging. I've tried to collect a basic set in the
>>          "Proposed solution" section.
>>
>>          3. If that's done as well, we can discuss various implementation
>>          details and finally prepare new pull requests.
>>
>>          Thank you and best regards,
>>          Volker
>>
>>          [1] https://bugs.openjdk.java.net/browse/JDK-8229517
>>      <https://bugs.openjdk.java.net/browse/JDK-8229517>
>>          [2] https://github.com/openjdk/jdk/pull/3135
>>      <https://github.com/openjdk/jdk/pull/3135>
>>          [3]
>>      https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html
>>      <https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043427.html>
>>          [4]
>>      https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html
>>      <https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-August/039130.html>
>>          [5] https://openjdk.java.net/jeps/158
>>      <https://openjdk.java.net/jeps/158>
>>          [6] https://openjdk.java.net/jeps/271
>>      <https://openjdk.java.net/jeps/271>
>>          [7] https://man7.org/linux/man-pages/man3/stdio.3.html
>>      <https://man7.org/linux/man-pages/man3/stdio.3.html>
>>          [8]
>>      https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d
>>      <https://gist.github.com/YaSuenag/dacb6d94d8684915422232c7a08d5b5d>
>>          [9] https://man7.org/linux/man-pages/man3/flockfile.3.html
>>      <https://man7.org/linux/man-pages/man3/flockfile.3.html>
>>          [10] https://man7.org/linux/man-pages/man2/futex.2.html
>>      <https://man7.org/linux/man-pages/man2/futex.2.html>
>>
>