Discrepancy in ParNew pause times between 1.6 and 1.7 CMS
Brian Williams
brian.williams at mayalane.com
Wed Nov 28 13:21:36 PST 2012
Thanks Alexey. We re-ran our tests separately with each of your recommendations. Reducing the heap size and running with -XX:+BlockOffsetArrayUseUnallocatedBlock each dramatically improved performance. It's not quite back to where 1.6 is, but it's much closer. Thanks also for the ParGCCardsPerStrideChunk suggestion; we saw a small but noticeable benefit from using it.
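For anyone following along, here is a sketch of the sort of invocation this converges on. It's illustrative rather than our exact command: the 8g heap and the 4096 stride are placeholder values, MainClass is hypothetical, and (on 1.7 at least) both diagnostic flags need -XX:+UnlockDiagnosticVMOptions:

    java -d64 -server -Xms8g -Xmx8g -XX:NewSize=2g -XX:MaxNewSize=2g \
         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:+UnlockDiagnosticVMOptions \
         -XX:+BlockOffsetArrayUseUnallocatedBlock \
         -XX:ParGCCardsPerStrideChunk=4096 \
         MainClass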
You're absolutely right that a 50GB heap is far too big for this test; we were testing with a much smaller dataset than is typical. In the past we've seen benefits from giving the JVM extra memory to delay promotion failures, but we'll now keep in mind that bigger isn't always better.
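As a back-of-envelope illustration of why an oversized heap can hurt young GC (assuming HotSpot's 512-byte card size, which isn't stated in this thread): every ParNew pause scans the old generation's card table for dirty cards, and that table scales with the heap, not with the live data.

    50 GiB heap: 50 GiB / 512 B per card = 100 MiB of card table
     8 GiB heap:  8 GiB / 512 B per card =  16 MiB of card table

So even a mostly empty 50GB heap gives the collector more card-table territory to walk each pause, which is exactly the work that BlockOffsetArrayUseUnallocatedBlock and ParGCCardsPerStrideChunk mitigate.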
On Nov 27, 2012, at 12:58 AM, Alexey Ragozin wrote:
> Hi Brian,
>
> I found a similar regression while testing young GC performance.
>
> After investigating, I came to the conclusion that the culprit is the
> -XX:+BlockOffsetArrayUseUnallocatedBlock VM option (which requires
> -XX:+UnlockDiagnosticVMOptions).
>
> This option is true in 1.6 and false in 1.7. The commit comment
> mentions some concurrency issues with that flag.
>
> But you should not blame the JVM for the discrepancy in your test. The
> problem is within your test, which touches only a small fraction of the
> 50 GiB heap. BlockOffsetArrayUseUnallocatedBlock allows the JVM to
> exclude not-yet-used memory from collection, so in 1.6 your benchmark
> effectively measures a heap size of 1-4 GiB.
>
> IMHO, you should probably adjust your test to use a more realistic
> memory pattern (though I expect 1.7 to be a bit slower even then; at
> least that was the outcome of my testing).
>
> If you are interested in minimizing young GC time, have a look at
> http://blog.ragozin.info/2012/03/secret-hotspot-option-improving-gc.html
>
> Regards,
> Alexey
>
>> We're in the process of doing some performance testing with 1.7, and we're seeing much longer ParNew times in 1.7 than we saw in 1.6. We could use some help isolating the cause.
>>
>> Running an identical hour-long performance test, with 1.6u37 we see only 0.009% of pauses over 100ms and just 0.09% over 10ms, but with 1.7u9 we see 64% of pauses over 100ms.
>>
>> Here is a typical comparison of a ParNew pause between the two. It was run with -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags -XX:PrintFLSStatistics=1 -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime.
>>
>> 1.6u37:
>>
>> 2012-11-25T16:51:13.145-0600: 1498.096: [GC Before GC:
>> Statistics for BinaryTreeDictionary:
>> ------------------------------------
>> Total Free Space: 2132560492
>> Max Chunk Size: 2132560492
>> Number of Blocks: 1
>> Av. Block Size: 2132560492
>> Tree Height: 1
>> Before GC:
>> Statistics for BinaryTreeDictionary:
>> ------------------------------------
>> Total Free Space: 512
>> Max Chunk Size: 512
>> Number of Blocks: 1
>> Av. Block Size: 512
>> Tree Height: 1
>> 1498.096: [ParNew
>> Desired survivor size 107347968 bytes, new threshold 4 (max 4)
>> - age 1: 614064 bytes, 614064 total
>> - age 2: 39344 bytes, 653408 total
>> - age 3: 322432 bytes, 975840 total
>> - age 4: 1208 bytes, 977048 total
>> : 1679253K->1224K(1887488K), 0.0015070 secs] 1792056K->114027K(52219136K)After GC:
>> Statistics for BinaryTreeDictionary:
>> ------------------------------------
>> Total Free Space: 2132560412
>> Max Chunk Size: 2132560412
>> Number of Blocks: 1
>> Av. Block Size: 2132560412
>> Tree Height: 1
>> After GC:
>> Statistics for BinaryTreeDictionary:
>> ------------------------------------
>> Total Free Space: 512
>> Max Chunk Size: 512
>> Number of Blocks: 1
>> Av. Block Size: 512
>> Tree Height: 1
>> , 0.0016110 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>> Total time for which application threads were stopped: 0.0022330 seconds
>>
>> 1.7u9:
>>
>> 2012-11-25T19:10:40.869-0600: 1498.458: [GC Before GC:
>> Statistics for BinaryTreeDictionary:
>> ------------------------------------
>> Total Free Space: 2136954070
>> Max Chunk Size: 2136954070
>> Number of Blocks: 1
>> Av. Block Size: 2136954070
>> Tree Height: 1
>> Before GC:
>> Statistics for BinaryTreeDictionary:
>> ------------------------------------
>> Total Free Space: 0
>> Max Chunk Size: 0
>> Number of Blocks: 0
>> Tree Height: 0
>> 1498.458: [ParNew
>> Desired survivor size 107347968 bytes, new threshold 6 (max 6)
>> - age 1: 1122960 bytes, 1122960 total
>> - age 2: 1560 bytes, 1124520 total
>> - age 3: 2232 bytes, 1126752 total
>> - age 4: 324760 bytes, 1451512 total
>> - age 5: 2232 bytes, 1453744 total
>> - age 6: 728 bytes, 1454472 total
>> : 1680121K->2393K(1887488K), 0.1132030 secs] 1759235K->81509K(52219136K)After GC:
>> Statistics for BinaryTreeDictionary:
>> ------------------------------------
>> Total Free Space: 2136954070
>> Max Chunk Size: 2136954070
>> Number of Blocks: 1
>> Av. Block Size: 2136954070
>> Tree Height: 1
>> After GC:
>> Statistics for BinaryTreeDictionary:
>> ------------------------------------
>> Total Free Space: 0
>> Max Chunk Size: 0
>> Number of Blocks: 0
>> Tree Height: 0
>> , 0.1133210 secs] [Times: user=1.45 sys=0.00, real=0.11 secs]
>> Total time for which application threads were stopped: 0.1142450 seconds
>>
>> Notice the 2ms application stop time vs. 114ms. We can point you all at the full GC logs.
>>
>> For what it's worth, we repeated this test on 1.7 with the standard set of 1.6 arguments we've had success with in the past:
>>
>> -d64 -server -Xmx50g -Xms50g -XX:MaxNewSize=2g -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 -XX:ParallelCMSThreads=3 -XX:CMSMaxAbortablePrecleanTime=3600000 -XX:CMSInitiatingOccupancyFraction=83 -XX:+UseParNewGC
>>
>> as well as these minimal CMS args:
>>
>> -d64 -server -Xmx50g -Xms50g -XX:MaxNewSize=2g -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>
>> The results were more or less the same.
>>
>> Are there some additional JVM args that we can enable to shed some light on the big discrepancy in pause times?
>>
>> Thanks.