JDK-8229517: Buffered Logging?
Ramki Ramakrishna
rramakrishna at twitter.com
Thu Jan 30 22:12:02 UTC 2020
As to what buffer size is large enough, you are correct that in the absence
of assumptions on upper bounds on write rates and disk i/o stall durations,
there is no a priori bound for buffer size to prevent discards. Still, in
practice one can come up with reasonable bounds based on expected values
for these that would minimize write payload lossage due to discarding on a
full buffer. For example, at Twitter, we have found a single (not two as in
Xin's implementation) adequate to not lose any data on our systems (some
handwaving here). We do not have enough experience w/UL in JDK 11 in
production to know for sure yet, but my gut feel based on the slower max
write rate that UL permits because of the mutual exclusion already innate
in its implementation, makes it even less likely that this will be
inadequate w/UL.
later,
-- ramki
On Thu, Jan 30, 2020 at 2:04 PM Ramki Ramakrishna <rramakrishna at twitter.com>
wrote:
>
> Thank you, Thomas!
>
> On Wed, Jan 29, 2020 at 3:12 AM Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
>
>> - How do you determine which buffer size is large enough? Lets say UL
>> gets really chatty, how do you deal with buffer overflow? Do you stop the
>> writers until the buffer is drained or do you discard log output? Both
>> options seems similar bad.
>>
>
> One discards log output. Blocking isn't good as it defeats the original
> goal and puts you back where you began. Solutions can be implemented that
> would neither, but those would be more complicated, and I think unnecessary
> definitely for GC logging in JDK 8, although might be necessary for UL in
> JDK 11 (but more on that below).
>
>
>> - We need synchronization now between reader and writers. Slow IO can
>> stall the reader, which may block the writers (if you stop writers on a
>> full buffer). I am not sure, but this may be worse than before, where we
>> had the writer threads effectively write in parallel to the same fd.
>>
>
> Haven't looked at Xin's solution closely, but Twitter's solution doesn't
> (in JDK 8) restrict effective true parallelism between reader and writers,
> or between writers. In JDK 11's UL, there is already lack of that
> parallelism, and our solution which I will share on the JBS ticket doesn't
> restrict that in any way, but doesn't remove that existing restriction on
> parallelism either.
>
> Patches to be attached to ticket that might provide more concreteness wrt
> the above.
>
> cheers,
> -- ramki
>
>
>> Thanks & Cheers, Thomas
>>
>>
>> On Sun, Jan 26, 2020 at 12:39 AM Liu, Xin <xxinliu at amazon.com> wrote:
>>
>>> Dear hotspot developers,
>>>
>>> I think runtime might be a better forum to continue this
>>> discussion.
>>> The JBS: https://bugs.openjdk.java.net/browse/JDK-8229517
>>>
>>> Now our latest patch is here:
>>> https://cr.openjdk.java.net/~xliu/8229517/webrev02/webrev/
>>> I know it’s based on jdk8u. I really appreciate if someone reviews
>>> it and gives me advice. If the community is still interested, we can keep
>>> developing using better approach or on the tip of jdk.
>>>
>>> I’d like to present our experiments here. I try to post data on the
>>> JDK-8229517, but It doesn't look good in comment area.
>>>
>>> Experiments
>>> We can’t reproduce the problem in [1] because harddisks are not
>>> accessible for us. To simulate IO blocking in JVM, we introduce a bogus
>>> delay. It spins DelayInGCLogging milliseconds for every fwrite.
>>> notproduct(intx, DelayInGCLogging, 0,
>>> "Delay milliseconds to simulate IO blocking for GC log testing")
>>>
>>> It spins in gcLogFileStream::write_blocking.
>>> https://cr.openjdk.java.net/~xliu/8229517/webrev02/webrev/src/share/vm/utilities/ostream.cpp.cdiff.html
>>>
>>> We observe GC pause time using -XX:+PrintGCApplicationStoppedTime
>>> for JavaWorkload with 2g heap. The unit is second. The program is on
>>> github. https://github.com/zhenyun/JavaGCworkload.
>>> $java -Xmx2g -Xms2g -Xloggc:gc.log -XX:+UseG1GC
>>> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails
>>> -XX:+UnlockExperimentalVMOptions Test
>>> sync means jdk8u-dev. async means jdk8u-dev with nonblocking gclog
>>> change. delayXXX denotes an artifactual delay given by
>>> -XX:DelayInGCLogging=XXX.
>>>
>>> median p95 p99
>>> p100 losses
>>> -------------------------------------------------------------------
>>> async 0.01357 0.36187 0.48691 0.61792
>>> sync 0.01339 0.3618 0.48646 0.63539
>>> -------------------------------------------------------------------
>>> delay5ms-async 0.01516 0.40564 0.486 0.61053 148156
>>> delay5ms-sync 0.20444 0.4428 0.56733 0.69522
>>> -------------------------------------------------------------------
>>> delay10ms-async0.0123 0.47451 0.48623 0.61083 195460
>>> delay10ms-sync 0.38732 0.52424 0.64783 0.78087
>>> -------------------------------------------------------------------
>>>
>>> async and sync perform equally in happy case. JavaWorkload didn’t
>>> lose any GC log even though the GC activities are intensive and the outputs
>>> are verbose. If we introduce 5 ms delay for each file-writing,
>>> delay5ms-sync starts to show longer stop time for p99 and p100. eg. p99
>>> increases from 0.486s to 0.567s, which is 16,7% longer. delay5ms-async
>>> remains the same p99 and p100 at expense of losing 148k logging events.
>>> Because the outputs GC logs are contiguous, they are still readable and
>>> helpful for problem analysis. After we increase delay to 10ms,
>>> delay10ms-async still have the same p99 and p100 stop time while
>>> delay10ms-sync has 22.9% longer stop time.
>>>
>>> [1]Eliminating Large JVM GC Pauses Caused by Background IO Traffic
>>>
>>> https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
>>>
>>> Thanks,
>>> --lx
>>>
>>>
>>>
>>>
>>>
>>> From: Ramki Ramakrishna <rramakrishna at twitter.com>
>>> Date: Tuesday, August 13, 2019 at 4:16 PM
>>> To: "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
>>> Cc: "Liu, Xin" <xxinliu at amazon.com>, "Mathiske, Bernd" <
>>> mathiske at amazon.com>, "Hohensee, Paul" <hohensee at amazon.com>, John
>>> Coomes <jcoomes at twitter.com>, Tony Printezis <tprintezis at twitter.com>
>>> Subject: Re: Buffered Logging?
>>>
>>>
>>> I filed https://bugs.openjdk.java.net/browse/JDK-8229517. Please
>>> correct/complete if I mangled or neglected to fill any specific required
>>> fields.
>>>
>>> Patches forthcoming in the next few days.
>>> -- ramki
>>>
>>> On Tue, Aug 13, 2019 at 2:53 PM Ramki Ramakrishna <mailto:
>>> rramakrishna at twitter.com> wrote:
>>>
>>> I'll go ahead and open a JBS ticket to keep the ball rolling on this
>>> one, and so we have a place to hang some patches and discussions off of.
>>> Scream if I shouldn't.
>>>
>>> -- ramki
>>>
>>> On Mon, Aug 12, 2019 at 6:19 PM Ramki Ramakrishna <mailto:
>>> rramakrishna at twitter.com> wrote:
>>>
>>> (Resent... Apologies to anyone who might see duplicates. There was
>>> an issue with my membership in the list which, I am guessing, has now been
>>> resolved.)
>>>
>>> ---------- Forwarded message ----------
>>> From: Ramki Ramakrishna <mailto:rramakrishna at twitter.com>
>>> To: mailto:hotspot-dev at openjdk.java.net
>>> Cc: "Liu, Xin" <mailto:xxinliu at amazon.com>, "Mathiske, Bernd"
>>> <mailto:mathiske at amazon.com>, "Hohensee, Paul" <mailto:
>>> hohensee at amazon.com>, John Coomes <mailto:jcoomes at twitter.com>, Tony
>>> Printezis <mailto:tprintezis at twitter.com>
>>> Bcc:
>>> Date: Mon, 12 Aug 2019 15:48:17 -0700
>>> Subject: Buffered Logging?
>>>
>>> In the JDK 7 time-frame, some folks noticed much longer than
>>> expected pauses that seemed to coincide with GC logging in the midst of a
>>> GC safepoint. In that setup, the GC logs were going to a disk file (these
>>> were often useful for post-mortem analyses) rather than to a RAM-based
>>> tmpfs which had been the original design center assumption. The
>>> vicissitudes of the dirty page flushing policy in Linux when
>>> IO load on the machine (not necessarily the JVM process doing the
>>> logging)
>>> could affect the length and duration of these inline logging stalls.
>>>
>>> A buffered logging scheme was then implemented by us (and
>>> independently by
>>> others) which we have used successfully to date to avoid these
>>> pauses in high i/o
>>> multi-tenant environments.
>>>
>>> We have recently ported the OpenJDK 8u based scheme past the Unified
>>> Logging related changes and have been using it on OpenJDK 11u.
>>>
>>> In various fora, some of us have heard people express an interest in
>>> such an implementation.
>>>
>>> Would there be appetite for this in openjdk (there are some
>>> differences between our scheme in OpenJDK 8u and their subsequent
>>> post-Unified-Logging port to OpenJDK 11)?
>>>
>>> If so, we'd like to open a JBS ticket, attach some patches rebased
>>> to current OpenJDK tip, and start preparing a webrev for discussion with a
>>> view to possible upstreaming if it converges past review/discussion/debate.
>>>
>>> PS: In recent discussion at the JVMLS, mention was made of something
>>> that might be similar in support of JFR streaming, so our rebase to tip and
>>> familiarization with any JFR-associated buffered logging might well make
>>> this easier or, possibly, moot.
>>>
>>> Thoughts, questions, comments, feedback?
>>> -- ramki
>>>
>>> --
>>> JVM Team, Platform Engineering, Twitter (San Francisco)
>>>
>>>
>>>
>>> --
>>> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>>>
>>>
>>>
>>> --
>>> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>>>
>>>
>>>
>>>
>
> --
> JVM Team, Infrastructure Engineering, Twitter (San Francisco)
>
--
JVM Team, Infrastructure Engineering, Twitter (San Francisco)
More information about the hotspot-runtime-dev
mailing list