AW: ridiculous ParNew pause times
Vitaly Davidovich
vitalyd at gmail.com
Wed Apr 9 18:15:43 UTC 2014
Ouch, that's very unfortunate. I'm curious if there's a good reason why it
has to be synchronous? Seems like the log entries can be flushed async by a
dedicated (or some other VM worker) thread and not risk stalling the app
like this.
Sent from my phone
On Apr 9, 2014 1:05 PM, "Gustav Åkesson" <gustav.r.akesson at gmail.com> wrote:
> Hi,
>
> Yes the GC-logging is synchronous and part of the GC cycle, so any delay
> will add to the pause.
>
> Best regards,
> Gustav Åkesson
> Den 9 apr 2014 18:40 skrev "Cornelius Riemenschneider" <cri at itscope.de>:
>
>> Bingo!
>> First, i tried to get a tool which shows me which process writes to which
>> file and how long that takes, but I was unable to find one I could master
>> to use.
>> Based on your suggestion I moved the log.gc file to a ramdisk and
>> performed extensive load testing - now my biggest outlier is
>> 2014-04-09T18:29:13.873+0200: 383.372: [GC2014-04-09T18:29:13.874+0200:
>> 383.372: [ParNew: 431041K->208698K(3145728K), 1.8312460 secs]
>> 11055599K->11203421K(21495808K), 1.8315130 secs] [Times: user=2.61
>> sys=0.03, real=1.83 secs]
>> which is okay.
>> When moving log.gc back to the harddisk to which redis saves I get pauses
>> as long as 45sec.
>>
>> So I agree with Vitaly that redis probably thrashes some caches (though
>> not all, as it only occupies one cpu core of a dual-socket server) which
>> would explain a slowdown of garbage collections, but the real cause of the
>> long spikes is actually writing the log to disk.
>>
>> Could you guys check if you write the gc log synchronous?
>> If yes, I'd suggest (a) moving writing the log out of the STW phase and
>> (b) using asynchronous I/O for writing the log.
>>
>> Thanks for all your suggestions and your help!
>>
>> Regards,
>> Cornelius Riemenschneider
>> --
>> ITscope GmbH
>> Ludwig-Erhard-Allee 20
>> 76131 Karlsruhe
>> Email: cornelius.riemenschneider at itscope.de
>> https://www.itscope.com
>> Handelsregister: AG Mannheim, HRB 232782
>> Sitz der Gesellschaft: Karlsruhe
>> Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger
>>
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Wolfgang Pedot [mailto:wolfgang.pedot at finkzeit.at]
>> Gesendet: Mittwoch, 9. April 2014 17:29
>> An: Vitaly Davidovich
>> Cc: Cornelius Riemenschneider; hotspot-gc-use
>> Betreff: Re: ridiculous ParNew pause times
>>
>> A crazy idea: could the gc-logging be the problem here because the
>> redis-dump is using up all the IO bandwidth?
>>
>> Wolfgang
>>
>> Am 09.04.2014 17:06, schrieb Vitaly Davidovich:
>> > Interesting. How much RAM does this machine have? You're using about
>> > 21-22g for jvm and another 5g for redis, right? Is there room to spare
>> > in physical memory?
>> >
>> > If there's no swap activity, then I don't think there's any I/O
>> > interference. I'm guessing that redis writing out 5g of memory is
>> > blowing the cpu caches for the jvm, and causing page faults. Although
>> > if there's no swap activity, it seems these faults would be soft and
>> > it's hard to explain how that would balloon to 30 sec pauses and not
>> > show up in sys time then.
>> >
>> > Are there lots of context switches in the java process when this
>> happens?
>> >
>> > Sent from my phone
>> >
>> > On Apr 9, 2014 10:41 AM, "Cornelius Riemenschneider" <cri at itscope.de
>> > <mailto:cri at itscope.de>> wrote:
>> >
>> > Hey,____
>> >
>> > thanks for the hints :-)____
>> >
>> > The server runs one JVM and one redis instance (A event-based,
>> > single threaded in-memory nosql-datastore).____
>> >
>> > Redis stores about 5G of data, which are written to disk from time
>> > to time ? it now turns out, that the redis saves align perfectly
>> > with our long ParNew times.____
>> >
>> > By initiating a redis save I was able to get these garbage
>> > collections.____
>> >
>> > 2014-04-09T15:06:07.892+0200: 85043.119:
>> > [GC2014-04-09T15:06:18.089+0200: 85053.315: [ParNew:
>> > 2964426K->338263K(3145728K), 0.1532950 secs]
>> > 7167668K->4541505K(21495808K), 10.3502410 secs] [Times: user=1.93
>> > sys=0.01, real=10.35 secs]____
>> >
>> > 2014-04-09T15:06:39.203+0200: 85074.429:
>> > [GC2014-04-09T15:07:11.026+0200: 85106.252: [ParNew:
>> > 3145728K->405516K(3145728K), 0.3429470 secs]
>> > 7645283K->5212053K(21495808K), 32.1663210 secs] [Times: user=2.96
>> > sys=0.01, real=32.17 secs]____
>> >
>> > I monitored the system during the GCs and swapping definitly does
>> > not happen.____
>> >
>> > Because redis is single-threaded (and only used by the JVM which is
>> > during ParNew in a STW phase), during ParNew there is only redis
>> > active, trying to write 5G of memory to disk as fast as possible and
>> > the GC.____
>> >
>> > I wasn?t able to pinpoint the issue yet, do you have an idea why the
>> > jvm could block on I/O during a GC?____
>> >
>> > Disk access is during these phases probably way slower than usual,
>> > as everything is on the / partition, a RAID 1 disk array.____
>> >
>> > Or does anyone know the right perf options for the perf tool to
>> > monitor blocking i/o during garbage collections?____
>> >
>> > __ __
>> >
>> > Regards,____
>> >
>> > Cornelius Riemenschneider____
>> >
>> > --____
>> >
>> > ITscope GmbH____
>> >
>> > Ludwig-Erhard-Allee 20____
>> >
>> > 76131 Karlsruhe____
>> >
>> > Email: cornelius.riemenschneider at itscope.de
>> > <mailto:cornelius.riemenschneider at itscope.de>____
>> >
>> > https://www.itscope.com____
>> >
>> > Handelsregister: AG Mannheim, HRB 232782____
>> >
>> > Sitz der Gesellschaft: Karlsruhe____
>> >
>> > Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger____
>> >
>> > __ __
>> >
>> > *Von:*Vitaly Davidovich [mailto:vitalyd at gmail.com
>> > <mailto:vitalyd at gmail.com>]
>> > *Gesendet:* Mittwoch, 9. April 2014 14:49
>> > *An:* Cornelius Riemenschneider
>> > *Cc:* hotspot-gc-use
>> > *Betreff:* Re: ridiculous ParNew pause times____
>> >
>> > __ __
>> >
>> > This is typically caused by one of:____
>> >
>> > 1) heavy swapping (this won't count towards sys time because kernel
>> > is not using cpu while waiting for I/O to complete)
>> > 2) oversubscribed machine where gc threads aren't getting enough cpu
>> > time due____
>> >
>> > Have you looked at stats on the machine when these pauses occur,
>> > specifically around swap activity? Is your machine running multiple
>> > JVMs or any other noisy neighbor apps?____
>> >
>> > Sent from my phone____
>> >
>> > On Apr 9, 2014 7:26 AM, "Cornelius Riemenschneider" <cri at itscope.de
>> > <mailto:cri at itscope.de>> wrote:____
>> >
>> > Hello,____
>> >
>> > we?ve got the following problem with the ParNew collector:____
>> >
>> > Our log.gc usually looks like this:____
>> >
>> > 2014-04-09T12:58:02.485+0200: 77357.712:
>> > [GC2014-04-09T12:58:02.485+0200: 77357.712: [ParNew:
>> > 2722925K->100795K(3145728K), 0.3202010 secs]
>> > 6998057K->4375934K(21495808K), 0.3205670 secs] [Times: user=4.10
>> > sys=0.02, real=0.32 secs] ____
>> >
>> > 2014-04-09T12:58:06.256+0200: 77361.483:
>> > [GC2014-04-09T12:58:06.257+0200: 77361.483: [ParNew:
>> > 2722235K->101011K(3145728K), 0.3229910 secs]
>> > 6997374K->4376165K(21495808K), 0.3233580 secs] [Times: user=4.13
>> > sys=0.02, real=0.33 secs] ____
>> >
>> > 2014-04-09T12:58:12.295+0200: 77367.522:
>> > [GC2014-04-09T12:58:12.296+0200: 77367.522: [ParNew:
>> > 2722451K->101057K(3145728K), 0.3215320 secs]
>> > 6997605K->4376216K(21495808K), 0.3219080 secs] [Times: user=4.12
>> > sys=0.01, real=0.32 secs] ____
>> >
>> > 2014-04-09T12:58:18.461+0200: 77373.688:
>> > [GC2014-04-09T12:58:18.462+0200: 77373.688: [ParNew:
>> > 2722497K->2232K(3145728K), 0.2944540 secs]
>> > 6997656K->4376242K(21495808K), 0.2948280 secs] [Times: user=3.79
>> > sys=0.00, real=0.29 secs]____
>> >
>> > But occasionally we have entries like these:____
>> >
>> > 2014-04-09T09:56:12.840+0200: 66448.066:
>> > [GC2014-04-09T09:56:38.154+0200: 66473.381: [ParNew:
>> > 3139808K->524288K(3145728K), 0.8355990 secs]
>> > 6845512K->4585563K(21495808K), 26.1502640 secs] [Times: user=5.59
>> > sys=0.16, real=26.16 secs] ____
>> >
>> > 2014-04-09T09:57:09.173+0200: 66504.400:
>> > [GC2014-04-09T09:57:24.871+0200: 66520.098: [ParNew:
>> > 2950573K->488555K(3145728K), 0.1876660 secs]
>> > 8701082K->6239064K(21495808K), 15.8858250 secs] [Times: user=2.38
>> > sys=0.00, real=15.89 secs] ____
>> >
>> > 2014-04-09T12:58:34.661+0200: 77389.888:
>> > [GC2014-04-09T12:59:06.390+0200: 77421.616: [ParNew:
>> > 2623439K->2083K(3145728K), 0.0292270 secs]
>> > 6997709K->4376490K(21495808K), 31.7578950 secs] [Times: user=0.34
>> > sys=0.02, real=31.76 secs] ____
>> >
>> > which I can?t explain at all.____
>> >
>> > The real time of 31.76sec equals a pause of 31.76secs, in which the
>> > jvm does not respond to user requests, which is obviously bad.____
>> >
>> > The application is _/very/_ allocation heavy, so generally pauses of
>> > 0.3sec are okay.____
>> >
>> > Our GC settings for this server are:____
>> >
>> > -Xms21g____
>> >
>> > -Xmx21g____
>> >
>> > -XX:ReservedCodeCacheSize=256m____
>> >
>> > -XX:PermSize=256m____
>> >
>> > -XX:MaxPermSize=768m____
>> >
>> > -server____
>> >
>> > -verbose:gc____
>> >
>> > -Xloggc:log.gc____
>> >
>> > -XX:+PrintGCDetails____
>> >
>> > -XX:+PrintGCDateStamps____
>> >
>> > -XX:+ExplicitGCInvokesConcurrent____
>> >
>> > -XX:NewRatio=5____
>> >
>> > -XX:SurvivorRatio=5____
>> >
>> > -XX:+UseConcMarkSweepGC____
>> >
>> > -XX:+UseParNewGC____
>> >
>> > -XX:+UseCMSInitiatingOccupancyOnly____
>> >
>> > -XX:CMSInitiatingOccupancyFraction=40____
>> >
>> > -XX:+CMSClassUnloadingEnabled____
>> >
>> > -XX:+CMSScavengeBeforeRemark____
>> >
>> > -Dsun.rmi.dgc.client.gcInterval=1209600000____
>> >
>> > -Dsun.rmi.dgc.server.gcInterval=1209600000____
>> >
>> > ____
>> >
>> > We run the sun jdk 7u51 on a current debian wheezy.____
>> >
>> > We previously had issues with long ParNew pauses, but back then, the
>> > sys time was high, so we concluded that the server was
>> > swapping,____
>> >
>> > which we were able to prevent.____
>> >
>> > Do you have any idea or further hint at debugging options which
>> > might help us in finding the issue?____
>> >
>> > ____
>> >
>> > Regards,____
>> >
>> > Cornelius Riemenschneider____
>> >
>> > --____
>> >
>> > ITscope GmbH____
>> >
>> > Ludwig-Erhard-Allee 20____
>> >
>> > 76131 Karlsruhe____
>> >
>> > Email: cornelius.riemenschneider at itscope.de
>> > <mailto:cornelius.riemenschneider at itscope.de>____
>> >
>> > https://www.itscope.com____
>> >
>> > Handelsregister: AG Mannheim, HRB 232782____
>> >
>> > Sitz der Gesellschaft: Karlsruhe____
>> >
>> > Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger____
>> >
>> > ____
>> >
>> >
>> > _______________________________________________
>> > hotspot-gc-use mailing list
>> > hotspot-gc-use at openjdk.java.net <mailto:
>> hotspot-gc-use at openjdk.java.net>
>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use____
>> >
>> >
>> >
>> > _______________________________________________
>> > hotspot-gc-use mailing list
>> > hotspot-gc-use at openjdk.java.net
>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>> >
>>
>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140409/e5ae1e5c/attachment.html>
More information about the hotspot-gc-use
mailing list