JDK-8229517: Buffered Logging?
Ramki Ramakrishna
rramakrishna at twitter.com
Thu Jan 30 22:13:53 UTC 2020
On Thu, Jan 30, 2020 at 2:12 PM Ramki Ramakrishna <rramakrishna at twitter.com>
wrote:

> As to what buffer size is large enough, you are correct that in the
> absence of assumptions about upper bounds on write rates and disk I/O
> stall durations, there is no a priori bound on buffer size that would
> prevent discards. Still, in practice one can choose reasonable bounds,
> based on expected values for those quantities, that minimize payload loss
> due to discarding on a full buffer. For example, at Twitter, we have
> found a single 1 MB buffer (not two, as in Xin's implementation) to be
> adequate to avoid losing any data on our systems (some handwaving here).
>
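> (An illustrative back-of-the-envelope bound, with hypothetical numbers
> not taken from this thread: if the peak log write rate is R bytes/s and
> the worst expected I/O stall is T seconds, a buffer of roughly R x T
> bytes rides out the stall; e.g., 100 KB/s x 10 s = 1 MB.)
>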
> We do not yet have enough production experience with UL in JDK 11 to know
> for sure, but my gut feeling is that the lower maximum write rate UL
> permits, because of the mutual exclusion already built into its
> implementation, makes it even less likely that this will be inadequate
> with UL.
>
>
> later,
> -- ramki
>
> On Thu, Jan 30, 2020 at 2:04 PM Ramki Ramakrishna <
> rramakrishna at twitter.com> wrote:
>
>>
>> Thank you, Thomas!
>>
>> On Wed, Jan 29, 2020 at 3:12 AM Thomas Stüfe <thomas.stuefe at gmail.com>
>> wrote:
>>
>>> - How do you determine which buffer size is large enough? Let's say UL
>>> gets really chatty; how do you deal with buffer overflow? Do you stop
>>> the writers until the buffer is drained, or do you discard log output?
>>> Both options seem similarly bad.
>>>
>>
>> One discards log output. Blocking isn't good, as it defeats the original
>> goal and puts you back where you began. Solutions that do neither could
>> be implemented, but they would be more complicated, and I think they are
>> definitely unnecessary for GC logging in JDK 8, although they might be
>> necessary for UL in JDK 11 (more on that below).
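>>
>> A minimal sketch of the discard-on-full idea, in portable C++ rather
>> than the actual HotSpot patch (the type, names, and the 1 MB capacity
>> are illustrative assumptions, not Xin's or Twitter's code):
>>
>>   #include <cstddef>
>>   #include <cstring>
>>
>>   // One fixed-capacity log buffer; writers never block on I/O.
>>   struct LogBuffer {
>>     static const size_t CAP = 1 << 20;  // e.g. a single 1 MB buffer
>>     char   data[CAP];
>>     size_t used;       // bytes waiting to be flushed
>>     size_t discarded;  // log lines dropped because the buffer was full
>>     LogBuffer() : used(0), discarded(0) {}
>>
>>     // Caller holds the buffer lock. On overflow, count the loss and
>>     // drop the line instead of waiting for the flusher, so a disk
>>     // stall can never lengthen a safepoint.
>>     bool try_append(const char* msg, size_t len) {
>>       if (CAP - used < len) {
>>         discarded++;
>>         return false;
>>       }
>>       memcpy(data + used, msg, len);
>>       used += len;
>>       return true;
>>     }
>>   };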
>>
>>
>>> - We now need synchronization between the reader and the writers. Slow
>>> I/O can stall the reader, which may block the writers (if you stop
>>> writers on a full buffer). I am not sure, but this may be worse than
>>> before, where the writer threads effectively wrote in parallel to the
>>> same fd.
>>>
>>
>> I haven't looked at Xin's solution closely, but Twitter's solution (in
>> JDK 8) doesn't restrict effective parallelism between the reader and the
>> writers, or among the writers. In JDK 11's UL that parallelism is
>> already lacking; the solution I will share on the JBS ticket doesn't
>> restrict it any further, but it doesn't remove the existing restriction
>> either.
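>>
>> To make the decoupling concrete, here is a hypothetical flusher loop in
>> portable C++ (not the actual patch), building on the LogBuffer sketch
>> earlier in this thread: the slow fwrite() happens with the lock
>> released, so writers keep appending in parallel with the disk I/O.
>>
>>   #include <chrono>
>>   #include <cstddef>
>>   #include <cstdio>
>>   #include <cstring>
>>   #include <mutex>
>>   #include <thread>
>>
>>   void flusher_loop(LogBuffer& lb, std::mutex& lock, FILE* out) {
>>     static char local[LogBuffer::CAP];  // staging copy for the flush
>>     for (;;) {
>>       size_t n;
>>       {
>>         std::lock_guard<std::mutex> g(lock);  // short critical section
>>         n = lb.used;
>>         memcpy(local, lb.data, n);
>>         lb.used = 0;
>>       }
>>       if (n > 0) {
>>         fwrite(local, 1, n, out);  // a stall here blocks only this thread
>>         fflush(out);
>>       }
>>       std::this_thread::sleep_for(std::chrono::milliseconds(1));
>>     }
>>   }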
>>
>> Patches to be attached to the ticket should make the above more
>> concrete.
>>
>> cheers,
>> -- ramki
>>
>>
>>> Thanks & Cheers, Thomas
>>>
>>>
>>> On Sun, Jan 26, 2020 at 12:39 AM Liu, Xin <xxinliu at amazon.com> wrote:
>>>
>>>> Dear hotspot developers,
>>>>
>>>> I think runtime might be a better forum to continue this
>>>> discussion.
>>>> The JBS: https://bugs.openjdk.java.net/browse/JDK-8229517
>>>>
>>>> Now our latest patch is here:
>>>> https://cr.openjdk.java.net/~xliu/8229517/webrev02/webrev/
>>>> I know it’s based on jdk8u. I would really appreciate it if someone
>>>> reviewed it and gave me advice. If the community is still interested,
>>>> we can keep developing, either with a better approach or on the tip of
>>>> the JDK.
>>>>
>>>> I’d like to present our experiments here. I tried to post the data on
>>>> JDK-8229517, but it doesn't look good in the comment area.
>>>>
>>>> Experiments
>>>> We can’t reproduce the problem in [1] because hard disks are not
>>>> accessible to us. To simulate I/O blocking in the JVM, we introduce a
>>>> bogus delay: the VM spins for DelayInGCLogging milliseconds on every
>>>> fwrite.
>>>>
>>>>   notproduct(intx, DelayInGCLogging, 0,
>>>>     "Delay milliseconds to simulate IO blocking for GC log testing")
>>>>
>>>> It spins in gcLogFileStream::write_blocking.
>>>> https://cr.openjdk.java.net/~xliu/8229517/webrev02/webrev/src/share/vm/utilities/ostream.cpp.cdiff.html
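>>>>
>>>> For concreteness, a hypothetical sketch of such an injected delay in
>>>> portable C++ (the actual change is in the webrev above):
>>>>
>>>>   #include <chrono>
>>>>
>>>>   // Busy-wait for 'ms' milliseconds before each fwrite(), simulating
>>>>   // a writer stuck behind slow disk I/O.
>>>>   static void simulate_io_stall(long ms) {
>>>>     const std::chrono::steady_clock::time_point deadline =
>>>>         std::chrono::steady_clock::now() +
>>>>         std::chrono::milliseconds(ms);
>>>>     while (std::chrono::steady_clock::now() < deadline) {
>>>>       // spin
>>>>     }
>>>>   }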
>>>>
>>>> We observe GC pause times using -XX:+PrintGCApplicationStoppedTime
>>>> for JavaWorkload with a 2 GB heap. The unit is seconds. The program is
>>>> on GitHub: https://github.com/zhenyun/JavaGCworkload.
>>>>
>>>>   $java -Xmx2g -Xms2g -Xloggc:gc.log -XX:+UseG1GC
>>>>     -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails
>>>>     -XX:+UnlockExperimentalVMOptions Test
>>>>
>>>> "sync" means jdk8u-dev; "async" means jdk8u-dev with the nonblocking
>>>> GC log change; "delayXXX" denotes an artificial delay given by
>>>> -XX:DelayInGCLogging=XXX.
>>>>
>>>>                    median     p95       p99       p100      losses
>>>> -------------------------------------------------------------------
>>>> async              0.01357    0.36187   0.48691   0.61792
>>>> sync               0.01339    0.3618    0.48646   0.63539
>>>> -------------------------------------------------------------------
>>>> delay5ms-async     0.01516    0.40564   0.486     0.61053   148156
>>>> delay5ms-sync      0.20444    0.4428    0.56733   0.69522
>>>> -------------------------------------------------------------------
>>>> delay10ms-async    0.0123     0.47451   0.48623   0.61083   195460
>>>> delay10ms-sync     0.38732    0.52424   0.64783   0.78087
>>>> -------------------------------------------------------------------
>>>>
>>>> async and sync perform equally in the happy case. JavaWorkload didn’t
>>>> lose any GC log output even though the GC activity is intensive and
>>>> the output is verbose. If we introduce a 5 ms delay for each file
>>>> write, delay5ms-sync starts to show longer stop times at p99 and p100;
>>>> e.g., p99 increases from 0.486 s to 0.567 s, which is 16.7% longer.
>>>> delay5ms-async keeps the same p99 and p100 at the expense of losing
>>>> 148k logging events. Because the GC logs that are written out remain
>>>> contiguous, they are still readable and helpful for problem analysis.
>>>> After we increase the delay to 10 ms, delay10ms-async still has the
>>>> same p99 and p100 stop times, while delay10ms-sync has a 22.9% longer
>>>> p100 stop time.
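>>>>
>>>> (Checking the arithmetic against the table: 0.56733 / 0.486 ≈ 1.167
>>>> for the 5 ms p99, i.e. 16.7% longer; and 0.78087 / 0.63539 ≈ 1.229,
>>>> i.e. 22.9% longer, which corresponds to the p100 column for
>>>> delay10ms-sync versus the sync baseline.)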
>>>>
>>>> [1]Eliminating Large JVM GC Pauses Caused by Background IO Traffic
>>>>
>>>> https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
>>>>
>>>> Thanks,
>>>> --lx
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From: Ramki Ramakrishna <rramakrishna at twitter.com>
>>>> Date: Tuesday, August 13, 2019 at 4:16 PM
>>>> To: "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
>>>> Cc: "Liu, Xin" <xxinliu at amazon.com>, "Mathiske, Bernd" <
>>>> mathiske at amazon.com>, "Hohensee, Paul" <hohensee at amazon.com>, John
>>>> Coomes <jcoomes at twitter.com>, Tony Printezis <tprintezis at twitter.com>
>>>> Subject: Re: Buffered Logging?
>>>>
>>>>
>>>> I filed https://bugs.openjdk.java.net/browse/JDK-8229517. Please
>>>> correct/complete if I mangled or neglected to fill any specific required
>>>> fields.
>>>>
>>>> Patches forthcoming in the next few days.
>>>> -- ramki
>>>>
>>>> On Tue, Aug 13, 2019 at 2:53 PM Ramki Ramakrishna <
>>>> rramakrishna at twitter.com> wrote:
>>>>
>>>> I'll go ahead and open a JBS ticket to keep the ball rolling on
>>>> this one, and so we have a place to hang some patches and discussions off
>>>> of. Scream if I shouldn't.
>>>>
>>>> -- ramki
>>>>
>>>> On Mon, Aug 12, 2019 at 6:19 PM Ramki Ramakrishna <
>>>> rramakrishna at twitter.com> wrote:
>>>>
>>>> (Resent... Apologies to anyone who might see duplicates. There was
>>>> an issue with my membership in the list which, I am guessing, has now been
>>>> resolved.)
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Ramki Ramakrishna <rramakrishna at twitter.com>
>>>> To: hotspot-dev at openjdk.java.net
>>>> Cc: "Liu, Xin" <xxinliu at amazon.com>, "Mathiske, Bernd"
>>>> <mathiske at amazon.com>, "Hohensee, Paul" <hohensee at amazon.com>,
>>>> John Coomes <jcoomes at twitter.com>, Tony Printezis
>>>> <tprintezis at twitter.com>
>>>> Date: Mon, 12 Aug 2019 15:48:17 -0700
>>>> Subject: Buffered Logging?
>>>>
>>>> In the JDK 7 time-frame, some folks noticed much-longer-than-expected
>>>> pauses that seemed to coincide with GC logging in the midst of a GC
>>>> safepoint. In that setup, the GC logs were going to a disk file (these
>>>> were often useful for post-mortem analyses) rather than to a RAM-based
>>>> tmpfs, which had been the original design-center assumption. The
>>>> vicissitudes of the dirty-page flushing policy in Linux, when there is
>>>> I/O load on the machine (not necessarily from the JVM process doing
>>>> the logging), could affect the frequency and duration of these inline
>>>> logging stalls.
>>>>
>>>> A buffered logging scheme was then implemented by us (and
>>>> independently by others), which we have used successfully to date to
>>>> avoid these pauses in high-I/O, multi-tenant environments.
>>>>
>>>> We have recently ported the OpenJDK 8u-based scheme past the Unified
>>>> Logging changes and have been using it on OpenJDK 11u.
>>>>
>>>> In various fora, some of us have heard people express an interest
>>>> in such an implementation.
>>>>
>>>> Would there be appetite for this in OpenJDK? (There are some
>>>> differences between our scheme in OpenJDK 8u and its subsequent
>>>> post-Unified-Logging port to OpenJDK 11.)
>>>>
>>>> If so, we'd like to open a JBS ticket, attach some patches rebased
>>>> to current OpenJDK tip, and start preparing a webrev for discussion with a
>>>> view to possible upstreaming if it converges past review/discussion/debate.
>>>>
>>>> PS: In recent discussion at the JVMLS, mention was made of
>>>> something that might be similar in support of JFR streaming, so our rebase
>>>> to tip and familiarization with any JFR-associated buffered logging might
>>>> well make this easier or, possibly, moot.
>>>>
>>>> Thoughts, questions, comments, feedback?
>>>> -- ramki
>>>>
>>>> --
>>>> JVM Team, Platform Engineering, Twitter (San Francisco)
>>>>
>>>>
>>>>
>>>> --
>>>> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>>>>
>>>>
>>>>
>>>> --
>>>> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>>>>
>>>>
>>>>
>>>>
>>
>> --
>> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>>
>
>
> --
> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>
--
JVM Team, Infrastructure Engineering, Twitter (San Francisco)