JDK-8229517: Buffered Logging?

Liu, Xin xxinliu at amazon.com
Mon Feb 24 07:33:50 UTC 2020


Ramki & David, 

I worked the idea on the tip of jdk. I hope we can solicit more feedback.  Vote and code review are both welcome. 
  
On our side, we’d like to contribute this feature. It's because we observe painful long-tail GC pause-time spikes in our important services if they enable verbose GC logs. Some teams reported us when they migrate from CMS to G1. My observation is that G1GC emits logs at safepoint. If filesystem 'write' blocks and that event happen to intersect the safepoint, collection time of young-gen of G1 may take about 200ms~800ms. I believe there're other similar cases.  It's like 'printk' in the interrupt handlers of Linux kernel.
 
I understand that different companies have different patch policies. It may take Ramki more time to share their internal patches, so I decide to rework on the tip of jdk.   
The basic idea still derives from my previous design. A watcher thread which flushes logs in the background.  A mutex-protected FIFO stores all asynchronous log messages. 

In this implementation, I decide to use dynamic memory. ie. It doesn't drop any message.  It's because I observe that asynchronous task is so fast that it doesn't need much memory to buffer all messages in the interval.  I extent arena support for LogMessageBuffer. It's should be cheaper than c-heap because we drop the arena of the buffered messages right after they flush to underlying logger.

The 2nd major decision is be compatible with existing unified logging interface.  I provide a new header file: <asynclog.hpp> on a par with <log.hpp>.
It can drop-in substitute common uses of logging with asynchronous logging.  In my JDK8u implementation, I redirect all gclog to the asynchronous buffer. With <asynclog.hpp>, we can control unified logs in fine-grained way and can be used in other logging places.  Eg. I successfully substitute all logs of G1GC at safepoint and in concurrent GCThreads because those operations are very time-sensitive. 
Another bold attempt is changing all logs to the asynchronous buffer. It's not a good idea but it shows that the compatibility (https://cr.openjdk.java.net/~xliu/8229517/change_to_asynclog_all/webrev/)
 
Third change: More unittests. Some are in JDK-8239066.

Could reviewers take a look at the new buffered logging implementation?
Bug: https://bugs.openjdk.java.net/browse/JDK-8229517
Webrev: https://cr.openjdk.java.net/~xliu/8229517/01/webrev/
Test:   Passed test-tier1 and gtest:all.

Thanks, 
--lx

  

On 1/30/20, 2:29 PM, "David Holmes" <david.holmes at oracle.com> wrote:

    Hi,
    
    On 26/01/2020 9:37 am, Liu, Xin wrote:
    > Dear hotspot developers,
    >      
    >      I think runtime might be a better forum to continue this discussion.
    >      The JBS: https://bugs.openjdk.java.net/browse/JDK-8229517
    >      
    >      Now our latest patch is here: https://cr.openjdk.java.net/~xliu/8229517/webrev02/webrev/
    >      I know it’s based on jdk8u. I really appreciate if someone reviews it and gives me advice.  If the community is still interested, we can keep developing using better approach or on the tip of jdk.
    
    When this was previously raised by Ramki it was stated that the 8u 
    solution had already been ported to Unified Logging in JDK 11. That is 
    the proposal that we would need to see to consider inclusion of this 
    feature in mainline JDK.
    
    This proposal seems to be trying to solve a very specific problem in a 
    very specific context. It is hard for me to see how this may work as a 
    general facility or all the potential bad interactions that may occur. 
    It would need to be well isolated so that if you don't ask for it it has 
    zero affect. It would also have to be disabled by default.
    
    I cannot really speak to the merits of the proposal in general as I 
    don't work in such environments. Ramki originally solicited feedback to 
    see if there "was an appetite for this in openjdk". I'm unclear if that 
    feedback was obtained elsewhere or whether this is rebooting the 
    feedback process?
    
    Thanks,
    David
    -----
    
    >      I’d like to present our experiments here. I try to post data on the JDK-8229517, but It doesn't look good in comment area.
    >      
    >      Experiments
    >      We can’t reproduce the problem in [1] because harddisks are not accessible for us. To simulate IO blocking in JVM, we introduce a bogus delay. It spins DelayInGCLogging milliseconds for every fwrite.
    >        notproduct(intx, DelayInGCLogging, 0,
    >        "Delay milliseconds to simulate IO blocking for GC log testing")
    >        
    >      It spins in gcLogFileStream::write_blocking. https://cr.openjdk.java.net/~xliu/8229517/webrev02/webrev/src/share/vm/utilities/ostream.cpp.cdiff.html
    >      
    >      We observe GC pause time using -XX:+PrintGCApplicationStoppedTime for JavaWorkload with 2g heap. The unit is second.  The program is on github. https://github.com/zhenyun/JavaGCworkload.
    >      $java -Xmx2g -Xms2g -Xloggc:gc.log -XX:+UseG1GC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+UnlockExperimentalVMOptions Test
    >      sync means jdk8u-dev. async means jdk8u-dev with nonblocking gclog change. delayXXX denotes an artifactual delay given by -XX:DelayInGCLogging=XXX.
    >      
    >                                   median     p95         p99           p100        losses
    >      -------------------------------------------------------------------
    >      async                   0.01357   0.36187  0.48691   0.61792
    >      sync                     0.01339   0.3618    0.48646   0.63539
    >      -------------------------------------------------------------------
    >      delay5ms-async 0.01516  0.40564  0.486        0.61053  148156
    >      delay5ms-sync   0.20444  0.4428    0.56733    0.69522
    >      -------------------------------------------------------------------
    >      delay10ms-async0.0123   0.47451  0.48623    0.61083  195460
    >      delay10ms-sync  0.38732 0.52424  0.64783    0.78087
    >      -------------------------------------------------------------------
    >      
    >      async and sync perform equally in happy case. JavaWorkload didn’t lose any GC log even though the GC activities are intensive and the outputs are verbose. If we introduce 5 ms delay for each file-writing, delay5ms-sync starts to show longer stop time for p99 and p100. eg. p99 increases from 0.486s to 0.567s, which is 16,7% longer. delay5ms-async remains the same p99 and p100 at expense of losing 148k logging events. Because the outputs GC logs are contiguous, they are still readable and helpful for problem analysis. After we increase delay to 10ms, delay10ms-async still have the same p99 and p100 stop time while delay10ms-sync has 22.9% longer stop time.
    >      
    >      [1]Eliminating Large JVM GC Pauses Caused by Background IO Traffic
    >      https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
    >      
    >      Thanks,
    >      --lx
    > 
    >     
    >      
    >      
    >      
    >      From: Ramki Ramakrishna <rramakrishna at twitter.com>
    >      Date: Tuesday, August 13, 2019 at 4:16 PM
    >      To: "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
    >      Cc: "Liu, Xin" <xxinliu at amazon.com>, "Mathiske, Bernd" <mathiske at amazon.com>, "Hohensee, Paul" <hohensee at amazon.com>, John Coomes <jcoomes at twitter.com>, Tony Printezis <tprintezis at twitter.com>
    >      Subject: Re: Buffered Logging?
    >      
    >      
    >      I filed https://bugs.openjdk.java.net/browse/JDK-8229517.  Please correct/complete if I mangled or neglected to fill any specific required fields.
    >      
    >      Patches forthcoming in the next few days.
    >      -- ramki
    >      
    >      On Tue, Aug 13, 2019 at 2:53 PM Ramki Ramakrishna <mailto:rramakrishna at twitter.com> wrote:
    >      
    >      I'll go ahead and open a JBS ticket to keep the ball rolling on this one, and so we have a place to hang some patches and discussions off of. Scream if I shouldn't.
    >      
    >      -- ramki
    >      
    >      On Mon, Aug 12, 2019 at 6:19 PM Ramki Ramakrishna <mailto:rramakrishna at twitter.com> wrote:
    >      
    >      (Resent... Apologies to anyone who might see duplicates. There was an issue with my membership in the list which, I am guessing, has now been resolved.)
    >      
    >      ---------- Forwarded message ----------
    >      From: Ramki Ramakrishna <mailto:rramakrishna at twitter.com>
    >      To: mailto:hotspot-dev at openjdk.java.net
    >      Cc: "Liu, Xin" <mailto:xxinliu at amazon.com>, "Mathiske, Bernd" <mailto:mathiske at amazon.com>, "Hohensee, Paul" <mailto:hohensee at amazon.com>, John Coomes <mailto:jcoomes at twitter.com>, Tony Printezis <mailto:tprintezis at twitter.com>
    >      Bcc:
    >      Date: Mon, 12 Aug 2019 15:48:17 -0700
    >      Subject: Buffered Logging?
    >      
    >      In the JDK 7 time-frame, some folks noticed much longer than expected pauses that seemed to coincide with GC logging in the midst of a GC safepoint. In that setup, the GC logs were going to a disk file (these were often useful for post-mortem analyses) rather than to a RAM-based tmpfs which had been the original design center assumption. The vicissitudes of the dirty page flushing policy in Linux when
    >      IO load on the machine (not necessarily the JVM process doing the logging)
    >      could affect the length and duration of these inline logging stalls.
    >      
    >      A buffered logging scheme was then implemented by us (and independently by
    >      others) which we have used successfully to date to avoid these pauses in high i/o
    >      multi-tenant environments.
    >      
    >      We have recently ported the OpenJDK 8u based scheme past the Unified Logging related changes and have been using it on OpenJDK 11u.
    >      
    >      In various fora, some of us have heard people express an interest in such an implementation.
    >      
    >      Would there be appetite for this in openjdk (there are some differences between our scheme in OpenJDK 8u and their subsequent post-Unified-Logging port to OpenJDK 11)?
    >      
    >      If so, we'd like to open a JBS ticket, attach some patches rebased to current OpenJDK tip, and start preparing a webrev for discussion with a view to possible upstreaming if it converges past review/discussion/debate.
    >      
    >      PS: In recent discussion at the JVMLS, mention was made of something that might be similar in support of JFR streaming, so our rebase to tip and familiarization with any JFR-associated buffered logging might well make this easier or, possibly, moot.
    >      
    >      Thoughts, questions, comments, feedback?
    >      -- ramki
    >      
    >      --
    >      JVM Team, Platform Engineering, Twitter (San Francisco)
    >      
    >      
    >      
    >      --
    >      JVM Team, Infrastructure Engineering, Twitter (San Francisco)
    >      
    >      
    >      
    >      --
    >      JVM Team, Infrastructure Engineering, Twitter (San Francisco)
    >      
    >      
    > 
    



More information about the hotspot-runtime-dev mailing list