AW: ridiculous ParNew pause times

Wed Apr 9 14:41:36 UTC 2014

Hey,

thanks for the hints :-)

The server runs one JVM and one redis instance (A event-based, single threaded in-memory nosql-datastore).

Redis stores about 5G of data, which are written to disk from time to time ? it now turns out, that the redis saves align perfectly with our long ParNew times.

By initiating a redis save I was able to get these garbage collections.

2014-04-09T15:06:07.892+0200: 85043.119: [GC2014-04-09T15:06:18.089+0200: 85053.315: [ParNew: 2964426K->338263K(3145728K), 0.1532950 secs] 7167668K->4541505K(21495808K), 10.3502410 secs] [Times: user=1.93 sys=0.01, real=10.35 secs]

2014-04-09T15:06:39.203+0200: 85074.429: [GC2014-04-09T15:07:11.026+0200: 85106.252: [ParNew: 3145728K->405516K(3145728K), 0.3429470 secs] 7645283K->5212053K(21495808K), 32.1663210 secs] [Times: user=2.96 sys=0.01, real=32.17 secs]

I monitored the system during the GCs and swapping definitly does not happen.

Because redis is single-threaded (and only used by the JVM which is during ParNew in a STW phase), during ParNew there is only redis active, trying to write 5G of memory to disk as fast as possible and the GC.

I wasn?t able to pinpoint the issue yet, do you have an idea why the jvm could block on I/O during a GC?

Disk access is during these phases probably way slower than usual, as everything is on the / partition, a RAID 1 disk array.

Or does anyone know the right perf options for  the perf tool to monitor blocking i/o during garbage collections?

Regards,

Cornelius Riemenschneider

--

ITscope GmbH

Ludwig-Erhard-Allee 20

76131 Karlsruhe

Email: cornelius.riemenschneider at itscope.de

https://www.itscope.com

Handelsregister: AG Mannheim, HRB 232782

Sitz der Gesellschaft: Karlsruhe

Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger

Von: Vitaly Davidovich [mailto:vitalyd at gmail.com] 
Gesendet: Mittwoch, 9. April 2014 14:49
An: Cornelius Riemenschneider
Cc: hotspot-gc-use
Betreff: Re: ridiculous ParNew pause times

This is typically caused by one of:

1) heavy swapping (this won't count towards sys time because kernel is not using cpu while waiting for I/O to complete)
2) oversubscribed machine where gc threads aren't getting enough cpu time due

Have you looked at stats on the machine when these pauses occur, specifically around swap activity? Is your machine running multiple JVMs or any other noisy neighbor apps?

Sent from my phone

On Apr 9, 2014 7:26 AM, "Cornelius Riemenschneider" <cri at itscope.de> wrote:

Hello,

we?ve got the following problem with the ParNew collector:

Our log.gc usually looks like this:

2014-04-09T12:58:02.485+0200: 77357.712: [GC2014-04-09T12:58:02.485+0200: 77357.712: [ParNew: 2722925K->100795K(3145728K), 0.3202010 secs] 6998057K->4375934K(21495808K), 0.3205670 secs] [Times: user=4.10 sys=0.02, real=0.32 secs] 

2014-04-09T12:58:06.256+0200: 77361.483: [GC2014-04-09T12:58:06.257+0200: 77361.483: [ParNew: 2722235K->101011K(3145728K), 0.3229910 secs] 6997374K->4376165K(21495808K), 0.3233580 secs] [Times: user=4.13 sys=0.02, real=0.33 secs] 

2014-04-09T12:58:12.295+0200: 77367.522: [GC2014-04-09T12:58:12.296+0200: 77367.522: [ParNew: 2722451K->101057K(3145728K), 0.3215320 secs] 6997605K->4376216K(21495808K), 0.3219080 secs] [Times: user=4.12 sys=0.01, real=0.32 secs] 

2014-04-09T12:58:18.461+0200: 77373.688: [GC2014-04-09T12:58:18.462+0200: 77373.688: [ParNew: 2722497K->2232K(3145728K), 0.2944540 secs] 6997656K->4376242K(21495808K), 0.2948280 secs] [Times: user=3.79 sys=0.00, real=0.29 secs]

But occasionally we have entries like these:

2014-04-09T09:56:12.840+0200: 66448.066: [GC2014-04-09T09:56:38.154+0200: 66473.381: [ParNew: 3139808K->524288K(3145728K), 0.8355990 secs] 6845512K->4585563K(21495808K), 26.1502640 secs] [Times: user=5.59 sys=0.16, real=26.16 secs] 

2014-04-09T09:57:09.173+0200: 66504.400: [GC2014-04-09T09:57:24.871+0200: 66520.098: [ParNew: 2950573K->488555K(3145728K), 0.1876660 secs] 8701082K->6239064K(21495808K), 15.8858250 secs] [Times: user=2.38 sys=0.00, real=15.89 secs] 

2014-04-09T12:58:34.661+0200: 77389.888: [GC2014-04-09T12:59:06.390+0200: 77421.616: [ParNew: 2623439K->2083K(3145728K), 0.0292270 secs] 6997709K->4376490K(21495808K), 31.7578950 secs] [Times: user=0.34 sys=0.02, real=31.76 secs] 

which I can?t explain at all.

The real time of 31.76sec equals a pause of 31.76secs, in which the jvm does not respond to user requests, which is obviously bad.

The application is _very_ allocation heavy, so generally pauses of 0.3sec are okay.

Our GC settings for this server are:

-Xms21g

-Xmx21g

-XX:ReservedCodeCacheSize=256m

-XX:PermSize=256m

-XX:MaxPermSize=768m

-server

-verbose:gc

-Xloggc:log.gc

-XX:+PrintGCDetails

-XX:+PrintGCDateStamps

-XX:+ExplicitGCInvokesConcurrent

-XX:NewRatio=5

-XX:SurvivorRatio=5

-XX:+UseConcMarkSweepGC

-XX:+UseParNewGC

-XX:+UseCMSInitiatingOccupancyOnly

-XX:CMSInitiatingOccupancyFraction=40

-XX:+CMSClassUnloadingEnabled

-XX:+CMSScavengeBeforeRemark

-Dsun.rmi.dgc.client.gcInterval=1209600000

-Dsun.rmi.dgc.server.gcInterval=1209600000

We run the sun jdk 7u51 on a current debian wheezy.

We previously had issues with long ParNew pauses, but back then, the sys time was high, so we concluded that the server was swapping,

which we were able to prevent.

Do you have any idea or further hint at debugging options which might help us in finding the issue?

Regards,

Cornelius Riemenschneider

--

ITscope GmbH

Ludwig-Erhard-Allee 20

76131 Karlsruhe

Email: cornelius.riemenschneider at itscope.de

https://www.itscope.com

Handelsregister: AG Mannheim, HRB 232782

Sitz der Gesellschaft: Karlsruhe

Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140409/77d5cc05/attachment-0001.html>