Strange Long ParNew GC Pauses (Sample Code Included)

James Lent jlent at tivo.com
Thu Apr 30 17:50:15 UTC 2015


I will give it a read.

In the meantime I tested with Oracle Java 6 and it seems OK with BlockOffsetArrayUseUnallocatedBlock either enabled or disabled.

To me that indicates something happened in Java 7 that filtered back to OpenJDK 6 (my understanding is that the OpenJDK 6 genealogy is a bit complicated).

I don't think the times in the second example below can be justified by "legitimately" dirty cards.  Approximately the same number of new objects is allocated between each ParNew cycle, and yet the ParNewGC times in that example grow steadily after the first few cycles.  They start at about 0.03 seconds and then increase by roughly 5 seconds per cycle.  When I finally kill the process a collection is taking 58.93 seconds (keeping 4 CPUs busy for 232.31 seconds).  The fact that it comes back at all is the most amazing part to me.

Further, the (one and only) ArrayList should not get promoted to Old memory until many cycles have gone by, and it is the only object in Tenured memory that should have any references back to the objects I am creating in New memory (i.e., there should not be many dirty cards created by the application code each cycle until that ArrayList gets very big and is perhaps moved).

In the first case below the system seems to recover after just a few expensive cycles.  Very different behavior.

Finally, the sensitivity of the problem to the size of the static object (the 32M "bad" range) makes me very suspicious.  The static object starts its life in Tenured memory and should have no references to any objects in New memory (it contains no references at all).

Of course the "card table" concept is new to me so I could be missing something.

You could try http://blog.ragozin.info/2012/03/secret-hotspot-option-improving-gc.html, which talks about an option that is safe but still diagnostic, though it doesn't seem like your heap is that big.
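(If I remember that post right, the option it discusses is ParGCCardsPerStrideChunk, which changes how many cards each parallel GC worker scans per stride and itself requires diagnostic options to be unlocked, along the lines of:

java -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 ...

That is my recollection of the post, though, not something verified in this thread.)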

Assuming the large object (in this case your ArrayList(s), since they are going to be growing) is an array of object pointers, the sensitivity to size is likely due to whether it gets promoted to, or allocated immediately in, the old gen while mostly populated with new-gen objects.  If so, the next young-gen GC is likely to be slow, since most cards will be dirty.
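To illustrate the pattern I mean, here is a minimal sketch (hypothetical sizes and class name, not your exact program): a large list whose backing Object[] ends up in the old gen while most of its elements are still young, so nearly every card under the array is dirty at the next young GC.

import java.util.ArrayList;

public class MostlyDirtyCards {
    public static void main(String[] args) {
        int n = 10 * 1000 * 1000;  // hypothetical element count
        // Pre-sizing creates one large backing Object[].  Once that array
        // is in (or promoted to) the old gen, every add() stores a
        // young-gen reference into it and dirties the card covering
        // that slot.
        ArrayList<byte[]> holder = new ArrayList<byte[]>(n);
        for (int i = 0; i < n; i++) {
            holder.add(new byte[8]);  // freshly allocated, i.e. young
        }
    }
}

Run it with a heap of a GB or so; the point is only the old-to-young reference pattern, not the exact numbers.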

On Apr 30, 2015, at 11:21 AM, James Lent <jlent at tivo.com> wrote:

Thanks for the quick response and for taking the time to reproduce the
issue.  A card table problem (corruption?) makes sense to me (although
I had never heard the term "card table" until a couple of days ago when
I was doing some random GC research, so what do I know).

Let me see if I understand the Bugs you referenced:

JDK-8079128 - Seems to reference a 100% increase in ParNewGC times.
That seems pretty tame compared to this issue, where the behavior looks
insane; a 100% increase here would only be about 0.02 seconds.

JDK-7040068 - Seems to report that the
BlockOffsetArrayUseUnallocatedBlock option is dangerous.

I did check, and it is enabled in my Oracle Java 6 JVM by default, but it
is not even displayed as an option in my OpenJDK 6-8 and Oracle 7-8 JVMs:

/usr/lib/jvm/java-6-oracle/bin/java -XX:+PrintFlagsFinal -version | grep BlockOffsetArrayUseUnallocatedBlock
    bool BlockOffsetArrayUseUnallocatedBlock       = true          {product}
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

/usr/lib/jvm/java-8-oracle/bin/java -XX:+PrintFlagsFinal -version | grep BlockOffsetArrayUseUnallocatedBlock
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

I tried to enable it in Oracle 8 and got the following message:

/usr/lib/jvm/java-8-oracle/bin/java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx6G -Xms6G -XX:+BlockOffsetArrayUseUnallocatedBlock LongParNewPause $((500*1024*1024)) 100 100
Error: VM option 'BlockOffsetArrayUseUnallocatedBlock' is diagnostic and
must be enabled via -XX:+UnlockDiagnosticVMOptions.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

That is consistent with your statement that it is not recommended - our
operations folks are unlikely to go for that.
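(Presumably, though I have not tried it, the flag could still be turned on for an experimental run by adding the unlock option the error message asks for:

/usr/lib/jvm/java-8-oracle/bin/java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx6G -Xms6G -XX:+UnlockDiagnosticVMOptions -XX:+BlockOffsetArrayUseUnallocatedBlock LongParNewPause $((500*1024*1024)) 100 100

Given the crash warnings below, that would be for diagnosis only.)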

This bug makes me think that not enabling it also has serious issues.

Hi James,

I did a quick perf inspection and saw a lot of time spent scanning the
card table, so on a hunch I ran your sample program with
-XX:+BlockOffsetArrayUseUnallocatedBlock
and that appears to make the extreme pause times go away.

See https://bugs.openjdk.java.net/browse/JDK-8079128
and https://bugs.openjdk.java.net/browse/JDK-7040068

I don't have time to dig further into this just at this moment, and
running with that flag can cause random crashes and is not recommended.

/Mikael

On 2015-04-30 17:01, James Lent wrote:
Here is a more frightening example:

Bad Case - 500M Static Object:

java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx6G -Xms6G LongParNewPause $((500*1024*1024)) 100 100
0.309: [GC0.309: [ParNew: 272640K->3028K(306688K), 0.0287780 secs]
784640K->515028K(6257408K), 0.0288700 secs] [Times: user=0.08 sys=0.01,
real=0.03 secs]
0.372: [GC0.372: [ParNew: 275668K->7062K(306688K), 0.0228070 secs]
787668K->519062K(6257408K), 0.0228580 secs] [Times: user=0.07 sys=0.00,
real=0.03 secs]
0.430: [GC0.430: [ParNew: 279702K->11314K(306688K), 0.0327930 secs]
791702K->523314K(6257408K), 0.0328510 secs] [Times: user=0.08 sys=0.01,
real=0.03 secs]
0.497: [GC0.497: [ParNew: 283954K->15383K(306688K), 0.0336020 secs]
795954K->527383K(6257408K), 0.0336550 secs] [Times: user=0.08 sys=0.00,
real=0.03 secs]
0.565: [GC0.565: [ParNew: 288023K->21006K(306688K), 0.0282110 secs]
800023K->533006K(6257408K), 0.0282740 secs] [Times: user=0.08 sys=0.00,
real=0.03 secs]
0.627: [GC0.627: [ParNew: 293646K->26805K(306688K), 0.0265270 secs]
805646K->538805K(6257408K), 0.0266220 secs] [Times: user=0.07 sys=0.01,
real=0.03 secs]
0.688: [GC0.688: [ParNew: 299445K->20215K(306688K), 1.3657150 secs]
811445K->535105K(6257408K), 1.3657830 secs] [Times: user=3.97 sys=0.01,
real=1.36 secs]
2.087: [GC2.087: [ParNew: 292855K->17914K(306688K), 6.6188870 secs]
807745K->535501K(6257408K), 6.6189490 secs] [Times: user=19.71 sys=0.03,
real=6.61 secs]
8.741: [GC8.741: [ParNew: 290554K->17433K(306688K), 14.2495190 secs]
808141K->537744K(6257408K), 14.2495830 secs] [Times: user=42.34
sys=0.10, real=14.25 secs]
23.025: [GC23.025: [ParNew: 290073K->17315K(306688K), 21.1579920 secs]
810384K->540348K(6257408K), 21.1580510 secs] [Times: user=70.10
sys=0.08, real=21.16 secs]
44.216: [GC44.216: [ParNew: 289955K->17758K(306688K), 27.6932380 secs]
812988K->543511K(6257408K), 27.6933060 secs] [Times: user=103.91
sys=0.16, real=27.69 secs]
71.941: [GC71.941: [ParNew: 290398K->17745K(306688K), 35.1077720 secs]
816151K->546225K(6257408K), 35.1078600 secs] [Times: user=130.86
sys=0.10, real=35.11 secs]
107.081: [GC107.081: [ParNew: 290385K->21826K(306688K), 41.4425020 secs]
818865K->553022K(6257408K), 41.4425720 secs] [Times: user=158.25
sys=0.31, real=41.44 secs]
148.555: [GC148.555: [ParNew: 294466K->21834K(306688K), 45.9826660 secs]
825662K->555757K(6257408K), 45.9827260 secs] [Times: user=180.91
sys=0.14, real=45.98 secs]
194.570: [GC194.570: [ParNew: 294474K->21836K(306688K), 51.5779770 secs]
828397K->558485K(6257408K), 51.5780450 secs] [Times: user=204.05
sys=0.20, real=51.58 secs]
246.180: [GC246.180: [ParNew^C: 294476K->18454K(306688K), 58.9307800
secs] 831125K->557829K(6257408K), 58.9308660 secs] [Times: user=232.31
sys=0.23, real=58.93 secs]
Heap
 par new generation   total 306688K, used 40308K [0x000000067ae00000,
0x000000068fac0000, 0x000000068fac0000)
  eden space 272640K,   8% used [0x000000067ae00000,
0x000000067c357980, 0x000000068b840000)
  from space 34048K,  54% used [0x000000068b840000, 0x000000068ca458f8,
0x000000068d980000)
  to   space 34048K,   0% used [0x000000068d980000, 0x000000068d980000,
0x000000068fac0000)
 concurrent mark-sweep generation total 5950720K, used 539375K
[0x000000068fac0000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 2435K
[0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)

Good Case - 479M Static Object:

java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx6G -Xms6G LongParNewPause $((479*1024*1024)) 100 100
0.298: [GC0.298: [ParNew: 272640K->3036K(306688K), 0.0152390 secs]
763136K->493532K(6257408K), 0.0153450 secs] [Times: user=0.06 sys=0.00,
real=0.02 secs]
0.346: [GC0.346: [ParNew: 275676K->7769K(306688K), 0.0193840 secs]
766172K->498265K(6257408K), 0.0194570 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
0.398: [GC0.398: [ParNew: 280409K->11314K(306688K), 0.0203460 secs]
770905K->501810K(6257408K), 0.0204080 secs] [Times: user=0.06 sys=0.00,
real=0.02 secs]
0.450: [GC0.450: [ParNew: 283954K->17306K(306688K), 0.0222390 secs]
774450K->507802K(6257408K), 0.0223070 secs] [Times: user=0.06 sys=0.00,
real=0.02 secs]
0.504: [GC0.504: [ParNew: 289946K->18380K(306688K), 0.0169000 secs]
780442K->508876K(6257408K), 0.0169630 secs] [Times: user=0.07 sys=0.01,
real=0.02 secs]
0.552: [GC0.552: [ParNew: 291020K->26805K(306688K), 0.0203990 secs]
781516K->517301K(6257408K), 0.0204620 secs] [Times: user=0.06 sys=0.00,
real=0.02 secs]
0.604: [GC0.604: [ParNew: 299445K->21153K(306688K), 0.0230980 secs]
789941K->514539K(6257408K), 0.0231610 secs] [Times: user=0.06 sys=0.00,
real=0.02 secs]
0.659: [GC0.659: [ParNew: 293793K->29415K(306688K), 0.0170240 secs]
787179K->525498K(6257408K), 0.0170970 secs] [Times: user=0.07 sys=0.01,
real=0.02 secs]
0.708: [GC0.708: [ParNew: 302055K->23874K(306688K), 0.0202970 secs]
798138K->522681K(6257408K), 0.0203600 secs] [Times: user=0.06 sys=0.00,
real=0.02 secs]
0.759: [GC0.760: [ParNew: 296514K->26842K(306688K), 0.0238600 secs]
795321K->528371K(6257408K), 0.0239390 secs] [Times: user=0.07 sys=0.00,
real=0.03 secs]
0.815: [GC0.815: [ParNew: 299482K->25343K(306688K), 0.0237580 secs]
801011K->529592K(6257408K), 0.0238030 secs] [Times: user=0.06 sys=0.01,
real=0.02 secs]
0.870: [GC0.870: [ParNew: 297983K->25767K(306688K), 0.0195800 secs]
802232K->532743K(6257408K), 0.0196290 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
0.921: [GC0.921: [ParNew: 298407K->21795K(306688K), 0.0196310 secs]
805383K->531488K(6257408K), 0.0196960 secs] [Times: user=0.06 sys=0.00,
real=0.02 secs]
0.972: [GC0.972: [ParNew: 294435K->25910K(306688K), 0.0242780 secs]
804128K->538329K(6257408K), 0.0243440 secs] [Times: user=0.06 sys=0.00,
real=0.02 secs]
1.028: [GC1.028: [ParNew: 298550K->21834K(306688K), 0.0235000 secs]
810969K->536979K(6257408K), 0.0235600 secs] [Times: user=0.06 sys=0.00,
real=0.03 secs]
1.083: [GC1.083: [ParNew: 294474K->26625K(306688K), 0.0188330 secs]
809619K->544497K(6257408K), 0.0188950 secs] [Times: user=0.06 sys=0.00,
real=0.02 secs]
1.133: [GC1.133: [ParNew: 299265K->26602K(306688K), 0.0210780 secs]
817137K->547186K(6257408K), 0.0211380 secs] [Times: user=0.06 sys=0.00,
real=0.02 secs]
1.185: [GC1.185: [ParNew: 299242K->26612K(306688K), 0.0236720 secs]
819826K->549922K(6257408K), 0.0237230 secs] [Times: user=0.07 sys=0.00,
real=0.03 secs]
1.240: [GC1.241: [ParNew: 299252K->26615K(306688K), 0.0188560 secs]
822562K->552651K(6257408K), 0.0189150 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
1.291: [GC1.291: [ParNew: 299255K->26615K(306688K), 0.0195090 secs]
825291K->555378K(6257408K), 0.0195870 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
1.342: [GC1.342: [ParNew: 299255K->22531K(306688K), 0.0229010 secs]
828018K->554021K(6257408K), 0.0229610 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
1.396: [GC1.396: [ParNew: 295171K->24505K(306688K), 0.0265920 secs]
826661K->560810K(6257408K), 0.0266360 secs] [Times: user=0.07 sys=0.00,
real=0.03 secs]
1.453: [GC1.453: [ParNew: 297145K->24529K(306688K), 0.0296490 secs]
833450K->563560K(6257408K), 0.0297070 secs] [Times: user=0.09 sys=0.00,
real=0.03 secs]
1.514: [GC1.514: [ParNew: 297169K->27700K(306688K), 0.0259820 secs]
836200K->569458K(6257408K), 0.0260310 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
1.571: [GC1.572: [ParNew: 300340K->27666K(306688K), 0.0199210 secs]
842098K->572150K(6257408K), 0.0199650 secs] [Times: user=0.07 sys=0.01,
real=0.02 secs]
1.623: [GC1.623: [ParNew: 300306K->27658K(306688K), 0.0237020 secs]
844790K->574868K(6257408K), 0.0237630 secs] [Times: user=0.08 sys=0.00,
real=0.02 secs]
1.678: [GC1.678: [ParNew: 300298K->31737K(306688K), 0.0237820 secs]
847508K->581674K(6257408K), 0.0238530 secs] [Times: user=0.08 sys=0.00,
real=0.03 secs]
1.733: [GC1.733: [ParNew: 304377K->21022K(306688K), 0.0265400 secs]
854314K->573685K(6257408K), 0.0265980 secs] [Times: user=0.08 sys=0.00,
real=0.02 secs]
1.791: [GC1.791: [ParNew: 293662K->25359K(306688K), 0.0249520 secs]
846325K->580748K(6257408K), 0.0250050 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
1.847: [GC1.847: [ParNew: 297999K->19930K(306688K), 0.0195120 secs]
853388K->581179K(6257408K), 0.0195650 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
1.898: [GC1.898: [ParNew: 292570K->20318K(306688K), 0.0233960 secs]
853819K->584294K(6257408K), 0.0234650 secs] [Times: user=0.07 sys=0.00,
real=0.03 secs]
1.953: [GC1.953: [ParNew: 292958K->20415K(306688K), 0.0233530 secs]
856934K->587117K(6257408K), 0.0234130 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
2.007: [GC2.007: [ParNew: 293055K->20439K(306688K), 0.0301410 secs]
859757K->589868K(6257408K), 0.0302070 secs] [Times: user=0.09 sys=0.00,
real=0.03 secs]
2.068: [GC2.068: [ParNew: 293079K->20445K(306688K), 0.0289190 secs]
862508K->592600K(6257408K), 0.0289690 secs] [Times: user=0.09 sys=0.00,
real=0.03 secs]
^C2.129: [GC2.129: [ParNew: 293085K->29284K(306688K), 0.0218880 secs]
865240K->604166K(6257408K), 0.0219350 secs] [Times: user=0.09 sys=0.00,
real=0.02 secs]
Heap
 par new generation   total 306688K, used 40135K [0x000000067ae00000,
0x000000068fac0000, 0x000000068fac0000)
  eden space 272640K,   3% used [0x000000067ae00000,
0x000000067b898a78, 0x000000068b840000)
  from space 34048K,  86% used [0x000000068d980000, 0x000000068f619320,
0x000000068fac0000)
  to   space 34048K,   0% used [0x000000068b840000, 0x000000068b840000,
0x000000068d980000)
 concurrent mark-sweep generation total 5950720K, used 574881K
[0x000000068fac0000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 2435K
[0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)


Recently we started seeing long ParNewGC pauses when starting up Kafka
that were causing session timeouts:

[2015-04-24 13:26:23,244] INFO 0 successfully elected as leader
(kafka.server.ZookeeperLeaderElector)
2.111: [GC (Allocation Failure) 2.111: [ParNew:
136320K->10236K(153344K), 0.0235777 secs] 648320K->522236K(2080128K),
0.0237092 secs] [Times: user=0.03 sys=0.01, real=0.02 secs]
2.599: [GC (Allocation Failure) 2.599: [ParNew:
146556K->3201K(153344K),
9.1514626 secs] 658556K->519191K(2080128K), 9.1515757 secs] [Times:
user=18.25 sys=0.01, real=9.15 secs]
[2015-04-24 13:26:33,443] INFO New leader is 0
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)

After much investigation I found that the trigger was the allocation of
a 500M static object early in the Kafka startup code.  It of course
makes no sense (to me) that the allocation of a single large static
object in Old memory would impact ParNew collections, but testing here
indicates that it does.  I have created an Oracle bug report, but it is
still under investigation.

I have reproduced the problem with a simple application on several Linux
platforms, including an EC2 instance (and last night a Win 7 laptop),
with the following JREs:

OpenJDK: 6, 7, and 8
Oracle: 7 and 8

Oracle 6 does not seem to have an issue.  All its ParNewGC times are
small.

Here is the simple program that demonstrates the issue (at least
everywhere I have tried it):

import java.util.ArrayList;

public class LongParNewPause {

    static byte[] bigStaticObject;

    public static void main(String[] args) throws Exception {
        // Size of the long-lived static object in bytes (default 500M).
        int bigObjSize    = args.length > 0 ? Integer.parseInt(args[0]) : 500 * 1024 * 1024;
        // Size of each short-lived object in bytes (default 100).
        int littleObjSize = args.length > 1 ? Integer.parseInt(args[1]) : 100;
        // One out of every saveFraction small objects is kept alive (default 10).
        int saveFraction  = args.length > 2 ? Integer.parseInt(args[2]) : 10;

        // The big object contains no references and starts life in Tenured memory.
        bigStaticObject = new byte[bigObjSize];

        ArrayList<byte[]> holder = new ArrayList<byte[]>();

        // Allocate small objects forever, retaining a fraction of them
        // so that some survive each ParNew cycle.
        int i = 0;
        while (true) {
            byte[] local = new byte[littleObjSize];
            if (i++ % saveFraction == 0) {
                holder.add(local);
            }
        }
    }
}

I run it with the following options:

   -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx2G -Xms2G
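
Put together, a complete run looks like this (the defaults reproduce the 500M case):

javac LongParNewPause.java
java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx2G -Xms2G LongParNewPause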

Note that I have not seen the issue with 1G heaps.  4G heaps also
exhibit the issue (as do heaps as small as 1.2G).  Bigger heaps seem to
aggravate the problem.

Here is some sample output:

0.321: [GC (Allocation Failure) 0.321: [ParNew:
272640K->27329K(306688K), 0.0140537 secs] 784640K->539329K(2063104K),
0.0141584 secs] [Times: user=0.05 sys=0.02, real=0.02 secs]
0.368: [GC (Allocation Failure) 0.368: [ParNew:
299969K->34048K(306688K), 0.7655383 secs] 811969K->572321K(2063104K),
0.7656172 secs] [Times: user=2.89 sys=0.02, real=0.77 secs]
1.165: [GC (Allocation Failure) 1.165: [ParNew:
306688K->34048K(306688K), 13.8395969 secs] 844961K->599389K(2063104K),
13.8396650 secs] [Times: user=54.38 sys=0.05, real=13.84 secs]
15.036: [GC (Allocation Failure) 15.036: [ParNew:
306688K->34048K(306688K), 0.0287254 secs] 872029K->628028K(2063104K),
0.0287876 secs] [Times: user=0.08 sys=0.01, real=0.03 secs]
15.096: [GC (Allocation Failure) 15.096: [ParNew:
306688K->34048K(306688K), 0.0340727 secs] 900668K->657717K(2063104K),
0.0341386 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
...

Even stranger is the fact that the problem seems to be limited to
objects in the range of about 480M to 512M.  Specifically (in bytes):

[503316465, 536870384]
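
As a quick arithmetic check of my own (not part of the original report), both endpoints sit just below 480M and 512M exactly, as this throwaway snippet shows:

public class RangeBounds {
    // Sanity check of the reported range endpoints (hypothetical class,
    // not part of the reproduction program).
    public static void main(String[] args) {
        System.out.println(480L * 1024 * 1024 - 503316465L); // prints 15
        System.out.println(512L * 1024 * 1024 - 536870384L); // prints 528
    }
}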

Values outside this range appear to be OK.  Playing with the other input
parameters (the size of the small objects and the fraction that are
retained) can modify the system behavior:

- Increase the length of the pauses - I have seen pauses of over a
minute in my testing.
- Cause the pause times to steadily increase over many ParNewGC cycles.

During the pauses the CPUs seem to be pegged at 100% (no waiting on resources).

Anyone have any thoughts?  I would be very interested to know whether
anyone else can reproduce the issue; other engineers here have been able to.



