JDK-8229517: Buffered Logging?
Ramki Ramakrishna
rramakrishna at twitter.com
Thu Jan 30 22:13:53 UTC 2020
On Thu, Jan 30, 2020 at 2:12 PM Ramki Ramakrishna <rramakrishna at twitter.com>
wrote:

> As to what buffer size is large enough, you are correct that in the
> absence of assumptions about upper bounds on write rates and disk I/O
> stall durations, there is no a priori bound on buffer size that would
> prevent discards. Still, in practice one can choose reasonable bounds,
> based on expected values for those quantities, that minimize payload loss
> due to discarding on a full buffer. For example, at Twitter, we have
> found a single 1 MB buffer (not two, as in Xin's implementation) to be
> adequate to avoid losing any data on our systems (some handwaving here).
>
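> (An illustrative back-of-the-envelope bound, with hypothetical numbers
> not taken from this thread: if the peak log write rate is R bytes/s and
> the worst expected I/O stall is T seconds, a buffer of roughly R x T
> bytes rides out the stall; e.g., 100 KB/s x 10 s = 1 MB.)
>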
> We do not yet have enough production experience with UL in JDK 11 to know
> for sure, but my gut feeling is that the lower maximum write rate UL
> permits, because of the mutual exclusion already built into its
> implementation, makes it even less likely that this will be inadequate
> with UL.
>
>
> later,
> -- ramki
>
> On Thu, Jan 30, 2020 at 2:04 PM Ramki Ramakrishna <
> rramakrishna at twitter.com> wrote:
>
>>
>> Thank you, Thomas!
>>
>> On Wed, Jan 29, 2020 at 3:12 AM Thomas Stüfe <thomas.stuefe at gmail.com>
>> wrote:
>>
>>> - How do you determine which buffer size is large enough? Let's say UL
>>> gets really chatty; how do you deal with buffer overflow? Do you stop
>>> the writers until the buffer is drained, or do you discard log output?
>>> Both options seem similarly bad.
>>>
>>
>> One discards log output. Blocking isn't good, as it defeats the original
>> goal and puts you back where you began. Solutions that do neither could
>> be implemented, but they would be more complicated, and I think they are
>> definitely unnecessary for GC logging in JDK 8, although they might be
>> necessary for UL in JDK 11 (more on that below).
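>>
>> A minimal sketch of the discard-on-full idea, in portable C++ rather
>> than the actual HotSpot patch (the type, names, and the 1 MB capacity
>> are illustrative assumptions, not Xin's or Twitter's code):
>>
>>   #include <cstddef>
>>   #include <cstring>
>>
>>   // One fixed-capacity log buffer; writers never block on I/O.
>>   struct LogBuffer {
>>     static const size_t CAP = 1 << 20;  // e.g. a single 1 MB buffer
>>     char   data[CAP];
>>     size_t used;       // bytes waiting to be flushed
>>     size_t discarded;  // log lines dropped because the buffer was full
>>     LogBuffer() : used(0), discarded(0) {}
>>
>>     // Caller holds the buffer lock. On overflow, count the loss and
>>     // drop the line instead of waiting for the flusher, so a disk
>>     // stall can never lengthen a safepoint.
>>     bool try_append(const char* msg, size_t len) {
>>       if (CAP - used < len) {
>>         discarded++;
>>         return false;
>>       }
>>       memcpy(data + used, msg, len);
>>       used += len;
>>       return true;
>>     }
>>   };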
>>
>>
>>> - We now need synchronization between the reader and the writers. Slow
>>> I/O can stall the reader, which may block the writers (if you stop
>>> writers on a full buffer). I am not sure, but this may be worse than
>>> before, where the writer threads effectively wrote in parallel to the
>>> same fd.
>>>
>>
>> I haven't looked at Xin's solution closely, but Twitter's solution (in
>> JDK 8) doesn't restrict effective parallelism between the reader and the
>> writers, or among the writers. In JDK 11's UL that parallelism is
>> already lacking; the solution I will share on the JBS ticket doesn't
>> restrict it any further, but it doesn't remove the existing restriction
>> either.
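>>
>> To make the decoupling concrete, here is a hypothetical flusher loop in
>> portable C++ (not the actual patch), building on the LogBuffer sketch
>> earlier in this thread: the slow fwrite() happens with the lock
>> released, so writers keep appending in parallel with the disk I/O.
>>
>>   #include <chrono>
>>   #include <cstddef>
>>   #include <cstdio>
>>   #include <cstring>
>>   #include <mutex>
>>   #include <thread>
>>
>>   void flusher_loop(LogBuffer& lb, std::mutex& lock, FILE* out) {
>>     static char local[LogBuffer::CAP];  // staging copy for the flush
>>     for (;;) {
>>       size_t n;
>>       {
>>         std::lock_guard<std::mutex> g(lock);  // short critical section
>>         n = lb.used;
>>         memcpy(local, lb.data, n);
>>         lb.used = 0;
>>       }
>>       if (n > 0) {
>>         fwrite(local, 1, n, out);  // a stall here blocks only this thread
>>         fflush(out);
>>       }
>>       std::this_thread::sleep_for(std::chrono::milliseconds(1));
>>     }
>>   }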
>>
>> Patches to be attached to the ticket should make the above more
>> concrete.
>>
>> cheers,
>> -- ramki
>>
>>
>>> Thanks & Cheers, Thomas
>>>
>>>
>>> On Sun, Jan 26, 2020 at 12:39 AM Liu, Xin <xxinliu at amazon.com> wrote:
>>>
>>>> Dear hotspot developers,
>>>>
>>>> I think runtime might be a better forum to continue this
>>>> discussion.
>>>> The JBS: https://bugs.openjdk.java.net/browse/JDK-8229517
>>>>
>>>> Now our latest patch is here:
>>>> https://cr.openjdk.java.net/~xliu/8229517/webrev02/webrev/
>>>> I know it’s based on jdk8u. I would really appreciate it if someone
>>>> reviewed it and gave me advice. If the community is still interested,
>>>> we can keep developing, either with a better approach or on the tip of
>>>> the JDK.
>>>>
>>>> I’d like to present our experiments here. I tried to post the data on
>>>> JDK-8229517, but it doesn't look good in the comment area.
>>>>
>>>> Experiments
>>>> We can’t reproduce the problem in [1] because hard disks are not
>>>> accessible to us. To simulate I/O blocking in the JVM, we introduce a
>>>> bogus delay: the VM spins for DelayInGCLogging milliseconds on every
>>>> fwrite.
>>>>
>>>>   notproduct(intx, DelayInGCLogging, 0,
>>>>     "Delay milliseconds to simulate IO blocking for GC log testing")
>>>>
>>>> It spins in gcLogFileStream::write_blocking.
>>>> https://cr.openjdk.java.net/~xliu/8229517/webrev02/webrev/src/share/vm/utilities/ostream.cpp.cdiff.html
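>>>>
>>>> For concreteness, a hypothetical sketch of such an injected delay in
>>>> portable C++ (the actual change is in the webrev above):
>>>>
>>>>   #include <chrono>
>>>>
>>>>   // Busy-wait for 'ms' milliseconds before each fwrite(), simulating
>>>>   // a writer stuck behind slow disk I/O.
>>>>   static void simulate_io_stall(long ms) {
>>>>     const std::chrono::steady_clock::time_point deadline =
>>>>         std::chrono::steady_clock::now() +
>>>>         std::chrono::milliseconds(ms);
>>>>     while (std::chrono::steady_clock::now() < deadline) {
>>>>       // spin
>>>>     }
>>>>   }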
>>>>
>>>> We observe GC pause times using -XX:+PrintGCApplicationStoppedTime
>>>> for JavaWorkload with a 2 GB heap. The unit is seconds. The program is
>>>> on GitHub: https://github.com/zhenyun/JavaGCworkload.
>>>>
>>>>   $java -Xmx2g -Xms2g -Xloggc:gc.log -XX:+UseG1GC
>>>>     -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails
>>>>     -XX:+UnlockExperimentalVMOptions Test
>>>>
>>>> "sync" means jdk8u-dev; "async" means jdk8u-dev with the nonblocking
>>>> GC log change; "delayXXX" denotes an artificial delay given by
>>>> -XX:DelayInGCLogging=XXX.
>>>>
>>>>                    median     p95       p99       p100      losses
>>>> -------------------------------------------------------------------
>>>> async              0.01357    0.36187   0.48691   0.61792
>>>> sync               0.01339    0.3618    0.48646   0.63539
>>>> -------------------------------------------------------------------
>>>> delay5ms-async     0.01516    0.40564   0.486     0.61053   148156
>>>> delay5ms-sync      0.20444    0.4428    0.56733   0.69522
>>>> -------------------------------------------------------------------
>>>> delay10ms-async    0.0123     0.47451   0.48623   0.61083   195460
>>>> delay10ms-sync     0.38732    0.52424   0.64783   0.78087
>>>> -------------------------------------------------------------------
>>>>
>>>> async and sync perform equally in the happy case. JavaWorkload didn’t
>>>> lose any GC log output even though the GC activity is intensive and
>>>> the output is verbose. If we introduce a 5 ms delay for each file
>>>> write, delay5ms-sync starts to show longer stop times at p99 and p100;
>>>> e.g., p99 increases from 0.486 s to 0.567 s, which is 16.7% longer.
>>>> delay5ms-async keeps the same p99 and p100 at the expense of losing
>>>> 148k logging events. Because the GC logs that are written out remain
>>>> contiguous, they are still readable and helpful for problem analysis.
>>>> After we increase the delay to 10 ms, delay10ms-async still has the
>>>> same p99 and p100 stop times, while delay10ms-sync has a 22.9% longer
>>>> p100 stop time.
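>>>>
>>>> (Checking the arithmetic against the table: 0.56733 / 0.486 ≈ 1.167
>>>> for the 5 ms p99, i.e. 16.7% longer; and 0.78087 / 0.63539 ≈ 1.229,
>>>> i.e. 22.9% longer, which corresponds to the p100 column for
>>>> delay10ms-sync versus the sync baseline.)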
>>>>
>>>> [1]Eliminating Large JVM GC Pauses Caused by Background IO Traffic
>>>>
>>>> https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
>>>>
>>>> Thanks,
>>>> --lx
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From: Ramki Ramakrishna <rramakrishna at twitter.com>
>>>> Date: Tuesday, August 13, 2019 at 4:16 PM
>>>> To: "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
>>>> Cc: "Liu, Xin" <xxinliu at amazon.com>, "Mathiske, Bernd" <
>>>> mathiske at amazon.com>, "Hohensee, Paul" <hohensee at amazon.com>, John
>>>> Coomes <jcoomes at twitter.com>, Tony Printezis <tprintezis at twitter.com>
>>>> Subject: Re: Buffered Logging?
>>>>
>>>>
>>>> I filed https://bugs.openjdk.java.net/browse/JDK-8229517. Please
>>>> correct/complete if I mangled or neglected to fill any specific required
>>>> fields.
>>>>
>>>> Patches forthcoming in the next few days.
>>>> -- ramki
>>>>
>>>> On Tue, Aug 13, 2019 at 2:53 PM Ramki Ramakrishna <
>>>> rramakrishna at twitter.com> wrote:
>>>>
>>>> I'll go ahead and open a JBS ticket to keep the ball rolling on
>>>> this one, and so we have a place to hang some patches and discussions off
>>>> of. Scream if I shouldn't.
>>>>
>>>> -- ramki
>>>>
>>>> On Mon, Aug 12, 2019 at 6:19 PM Ramki Ramakrishna <
>>>> rramakrishna at twitter.com> wrote:
>>>>
>>>> (Resent... Apologies to anyone who might see duplicates. There was
>>>> an issue with my membership in the list which, I am guessing, has now been
>>>> resolved.)
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Ramki Ramakrishna <rramakrishna at twitter.com>
>>>> To: hotspot-dev at openjdk.java.net
>>>> Cc: "Liu, Xin" <xxinliu at amazon.com>, "Mathiske, Bernd"
>>>> <mathiske at amazon.com>, "Hohensee, Paul" <hohensee at amazon.com>,
>>>> John Coomes <jcoomes at twitter.com>, Tony Printezis
>>>> <tprintezis at twitter.com>
>>>> Date: Mon, 12 Aug 2019 15:48:17 -0700
>>>> Subject: Buffered Logging?
>>>>
>>>> In the JDK 7 time-frame, some folks noticed much-longer-than-expected
>>>> pauses that seemed to coincide with GC logging in the midst of a GC
>>>> safepoint. In that setup, the GC logs were going to a disk file (these
>>>> were often useful for post-mortem analyses) rather than to a RAM-based
>>>> tmpfs, which had been the original design-center assumption. The
>>>> vicissitudes of the dirty-page flushing policy in Linux, when there is
>>>> I/O load on the machine (not necessarily from the JVM process doing
>>>> the logging), could affect the frequency and duration of these inline
>>>> logging stalls.
>>>>
>>>> A buffered logging scheme was then implemented by us (and
>>>> independently by others), which we have used successfully to date to
>>>> avoid these pauses in high-I/O, multi-tenant environments.
>>>>
>>>> We have recently ported the OpenJDK 8u-based scheme past the Unified
>>>> Logging changes and have been using it on OpenJDK 11u.
>>>>
>>>> In various fora, some of us have heard people express an interest
>>>> in such an implementation.
>>>>
>>>> Would there be appetite for this in OpenJDK? (There are some
>>>> differences between our scheme in OpenJDK 8u and its subsequent
>>>> post-Unified-Logging port to OpenJDK 11.)
>>>>
>>>> If so, we'd like to open a JBS ticket, attach some patches rebased
>>>> to current OpenJDK tip, and start preparing a webrev for discussion with a
>>>> view to possible upstreaming if it converges past review/discussion/debate.
>>>>
>>>> PS: In recent discussion at the JVMLS, mention was made of
>>>> something that might be similar in support of JFR streaming, so our rebase
>>>> to tip and familiarization with any JFR-associated buffered logging might
>>>> well make this easier or, possibly, moot.
>>>>
>>>> Thoughts, questions, comments, feedback?
>>>> -- ramki
>>>>
>>>> --
>>>> JVM Team, Platform Engineering, Twitter (San Francisco)
>>>>
>>>>
>>>>
>>>> --
>>>> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>>>>
>>>>
>>>>
>>>> --
>>>> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>>>>
>>>>
>>>>
>>>>
>>
>> --
>> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>>
>
>
> --
> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>
--
JVM Team, Infrastructure Engineering, Twitter (San Francisco)