ridiculous ParNew pause times
Gustav Åkesson
gustav.r.akesson at gmail.com
Wed Apr 9 15:48:16 UTC 2014
Hi,
I think that is an idea worth investigating. I've had many cases where GC
logging is causing long YGCs when disc is heavily loaded. Try to log to a
file on a RAM drive, if you have the possibility.
Besy Regards,
Gustav Åkesson
Den 9 apr 2014 17:40 skrev "Vitaly Davidovich" <vitalyd at gmail.com>:
> That's an interesting theory. Does the jvm fsync log writes? If it
> doesn't then I'd expect the write syscalls to not actually block. Perhaps
> there's some contention in the kernel though?
>
> Cornelius, maybe you can strace the jvm and see what syscalls are made in
> this scenario?
>
> Sent from my phone
> On Apr 9, 2014 11:29 AM, "Wolfgang Pedot" <wolfgang.pedot at finkzeit.at>
> wrote:
>
>> A crazy idea: could the gc-logging be the problem here because the
>> redis-dump is using up all the IO bandwidth?
>>
>> Wolfgang
>>
>> Am 09.04.2014 17:06, schrieb Vitaly Davidovich:
>>
>>> Interesting. How much RAM does this machine have? You're using about
>>> 21-22g for jvm and another 5g for redis, right? Is there room to spare
>>> in physical memory?
>>>
>>> If there's no swap activity, then I don't think there's any I/O
>>> interference. I'm guessing that redis writing out 5g of memory is
>>> blowing the cpu caches for the jvm, and causing page faults. Although
>>> if there's no swap activity, it seems these faults would be soft and
>>> it's hard to explain how that would balloon to 30 sec pauses and not
>>> show up in sys time then.
>>>
>>> Are there lots of context switches in the java process when this happens?
>>>
>>> Sent from my phone
>>>
>>> On Apr 9, 2014 10:41 AM, "Cornelius Riemenschneider" <cri at itscope.de
>>> <mailto:cri at itscope.de>> wrote:
>>>
>>> Hey,____
>>>
>>> thanks for the hints :-)____
>>>
>>> The server runs one JVM and one redis instance (A event-based,
>>> single threaded in-memory nosql-datastore).____
>>>
>>> Redis stores about 5G of data, which are written to disk from time
>>> to time - it now turns out, that the redis saves align perfectly
>>> with our long ParNew times.____
>>>
>>> By initiating a redis save I was able to get these garbage
>>> collections.____
>>>
>>> 2014-04-09T15:06:07.892+0200: 85043.119:
>>> [GC2014-04-09T15:06:18.089+0200: 85053.315: [ParNew:
>>> 2964426K->338263K(3145728K), 0.1532950 secs]
>>> 7167668K->4541505K(21495808K), 10.3502410 secs] [Times: user=1.93
>>> sys=0.01, real=10.35 secs]____
>>>
>>> 2014-04-09T15:06:39.203+0200: 85074.429:
>>> [GC2014-04-09T15:07:11.026+0200: 85106.252: [ParNew:
>>> 3145728K->405516K(3145728K), 0.3429470 secs]
>>> 7645283K->5212053K(21495808K), 32.1663210 secs] [Times: user=2.96
>>> sys=0.01, real=32.17 secs]____
>>>
>>> I monitored the system during the GCs and swapping definitly does
>>> not happen.____
>>>
>>> Because redis is single-threaded (and only used by the JVM which is
>>> during ParNew in a STW phase), during ParNew there is only redis
>>> active, trying to write 5G of memory to disk as fast as possible and
>>> the GC.____
>>>
>>> I wasn't able to pinpoint the issue yet, do you have an idea why the
>>> jvm could block on I/O during a GC?____
>>>
>>> Disk access is during these phases probably way slower than usual,
>>> as everything is on the / partition, a RAID 1 disk array.____
>>>
>>> Or does anyone know the right perf options for the perf tool to
>>> monitor blocking i/o during garbage collections?____
>>>
>>> __ __
>>>
>>> Regards,____
>>>
>>> Cornelius Riemenschneider____
>>>
>>> --____
>>>
>>> ITscope GmbH____
>>>
>>> Ludwig-Erhard-Allee 20____
>>>
>>> 76131 Karlsruhe____
>>>
>>> Email: cornelius.riemenschneider at itscope.de
>>> <mailto:cornelius.riemenschneider at itscope.de>____
>>>
>>> https://www.itscope.com____
>>>
>>> Handelsregister: AG Mannheim, HRB 232782____
>>>
>>> Sitz der Gesellschaft: Karlsruhe____
>>>
>>> Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger____
>>>
>>> __ __
>>>
>>> *Von:*Vitaly Davidovich [mailto:vitalyd at gmail.com
>>> <mailto:vitalyd at gmail.com>]
>>> *Gesendet:* Mittwoch, 9. April 2014 14:49
>>> *An:* Cornelius Riemenschneider
>>> *Cc:* hotspot-gc-use
>>> *Betreff:* Re: ridiculous ParNew pause times____
>>>
>>> __ __
>>>
>>> This is typically caused by one of:____
>>>
>>> 1) heavy swapping (this won't count towards sys time because kernel
>>> is not using cpu while waiting for I/O to complete)
>>> 2) oversubscribed machine where gc threads aren't getting enough cpu
>>> time due____
>>>
>>> Have you looked at stats on the machine when these pauses occur,
>>> specifically around swap activity? Is your machine running multiple
>>> JVMs or any other noisy neighbor apps?____
>>>
>>> Sent from my phone____
>>>
>>> On Apr 9, 2014 7:26 AM, "Cornelius Riemenschneider" <cri at itscope.de
>>> <mailto:cri at itscope.de>> wrote:____
>>>
>>> Hello,____
>>>
>>> we've got the following problem with the ParNew collector:____
>>>
>>> Our log.gc usually looks like this:____
>>>
>>> 2014-04-09T12:58:02.485+0200: 77357.712:
>>> [GC2014-04-09T12:58:02.485+0200: 77357.712: [ParNew:
>>> 2722925K->100795K(3145728K), 0.3202010 secs]
>>> 6998057K->4375934K(21495808K), 0.3205670 secs] [Times: user=4.10
>>> sys=0.02, real=0.32 secs] ____
>>>
>>> 2014-04-09T12:58:06.256+0200: 77361.483:
>>> [GC2014-04-09T12:58:06.257+0200: 77361.483: [ParNew:
>>> 2722235K->101011K(3145728K), 0.3229910 secs]
>>> 6997374K->4376165K(21495808K), 0.3233580 secs] [Times: user=4.13
>>> sys=0.02, real=0.33 secs] ____
>>>
>>> 2014-04-09T12:58:12.295+0200: 77367.522:
>>> [GC2014-04-09T12:58:12.296+0200: 77367.522: [ParNew:
>>> 2722451K->101057K(3145728K), 0.3215320 secs]
>>> 6997605K->4376216K(21495808K), 0.3219080 secs] [Times: user=4.12
>>> sys=0.01, real=0.32 secs] ____
>>>
>>> 2014-04-09T12:58:18.461+0200: 77373.688:
>>> [GC2014-04-09T12:58:18.462+0200: 77373.688: [ParNew:
>>> 2722497K->2232K(3145728K), 0.2944540 secs]
>>> 6997656K->4376242K(21495808K), 0.2948280 secs] [Times: user=3.79
>>> sys=0.00, real=0.29 secs]____
>>>
>>> But occasionally we have entries like these:____
>>>
>>> 2014-04-09T09:56:12.840+0200: 66448.066:
>>> [GC2014-04-09T09:56:38.154+0200: 66473.381: [ParNew:
>>> 3139808K->524288K(3145728K), 0.8355990 secs]
>>> 6845512K->4585563K(21495808K), 26.1502640 secs] [Times: user=5.59
>>> sys=0.16, real=26.16 secs] ____
>>>
>>> 2014-04-09T09:57:09.173+0200: 66504.400:
>>> [GC2014-04-09T09:57:24.871+0200: 66520.098: [ParNew:
>>> 2950573K->488555K(3145728K), 0.1876660 secs]
>>> 8701082K->6239064K(21495808K), 15.8858250 secs] [Times: user=2.38
>>> sys=0.00, real=15.89 secs] ____
>>>
>>> 2014-04-09T12:58:34.661+0200: 77389.888:
>>> [GC2014-04-09T12:59:06.390+0200: 77421.616: [ParNew:
>>> 2623439K->2083K(3145728K), 0.0292270 secs]
>>> 6997709K->4376490K(21495808K), 31.7578950 secs] [Times: user=0.34
>>> sys=0.02, real=31.76 secs] ____
>>>
>>> which I can't explain at all.____
>>>
>>> The real time of 31.76sec equals a pause of 31.76secs, in which the
>>> jvm does not respond to user requests, which is obviously bad.____
>>>
>>> The application is _/very/_ allocation heavy, so generally pauses of
>>> 0.3sec are okay.____
>>>
>>> Our GC settings for this server are:____
>>>
>>> -Xms21g____
>>>
>>> -Xmx21g____
>>>
>>> -XX:ReservedCodeCacheSize=256m____
>>>
>>> -XX:PermSize=256m____
>>>
>>> -XX:MaxPermSize=768m____
>>>
>>> -server____
>>>
>>> -verbose:gc____
>>>
>>> -Xloggc:log.gc____
>>>
>>> -XX:+PrintGCDetails____
>>>
>>> -XX:+PrintGCDateStamps____
>>>
>>> -XX:+ExplicitGCInvokesConcurrent____
>>>
>>> -XX:NewRatio=5____
>>>
>>> -XX:SurvivorRatio=5____
>>>
>>> -XX:+UseConcMarkSweepGC____
>>>
>>> -XX:+UseParNewGC____
>>>
>>> -XX:+UseCMSInitiatingOccupancyOnly____
>>>
>>> -XX:CMSInitiatingOccupancyFraction=40____
>>>
>>> -XX:+CMSClassUnloadingEnabled____
>>>
>>> -XX:+CMSScavengeBeforeRemark____
>>>
>>> -Dsun.rmi.dgc.client.gcInterval=1209600000____
>>>
>>> -Dsun.rmi.dgc.server.gcInterval=1209600000____
>>>
>>> ____
>>>
>>> We run the sun jdk 7u51 on a current debian wheezy.____
>>>
>>> We previously had issues with long ParNew pauses, but back then, the
>>> sys time was high, so we concluded that the server was swapping,____
>>>
>>> which we were able to prevent.____
>>>
>>> Do you have any idea or further hint at debugging options which
>>> might help us in finding the issue?____
>>>
>>> ____
>>>
>>> Regards,____
>>>
>>> Cornelius Riemenschneider____
>>>
>>> --____
>>>
>>> ITscope GmbH____
>>>
>>> Ludwig-Erhard-Allee 20____
>>>
>>> 76131 Karlsruhe____
>>>
>>> Email: cornelius.riemenschneider at itscope.de
>>> <mailto:cornelius.riemenschneider at itscope.de>____
>>>
>>> https://www.itscope.com____
>>>
>>> Handelsregister: AG Mannheim, HRB 232782____
>>>
>>> Sitz der Gesellschaft: Karlsruhe____
>>>
>>> Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger____
>>>
>>> ____
>>>
>>>
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net <mailto:hotspot-gc-use@
>>> openjdk.java.net>
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use____
>>>
>>>
>>>
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>
>>>
>>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140409/0b3e96f8/attachment-0001.html>
More information about the hotspot-gc-use
mailing list