Discrepancy in ParNew pause times between 1.6 and 1.7 CMS

Srinivas Ramakrishna ysr1729 at gmail.com
Wed Nov 28 17:21:13 PST 2012


Impressive. Sounds like the stride size should be ergonomically set based
on heap size & #workers, rather than being a static value. If the optimum
varies slowly (i.e. is not sensitive above a specific value, such as your
4K), perhaps that should be the default for now, with suitable ergonomic
downsizing for smaller heap sizes, if needed.
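
For concreteness, the stride size being discussed is the
ParGCCardsPerStrideChunk flag covered in Alexey's blog post quoted below.
A sketch of pinning it to the 4K value mentioned above ("MyApp" stands in
for the real main class, and whether -XX:+UnlockDiagnosticVMOptions is
required can depend on the JDK build):

```shell
# Sketch: set the ParNew card-scanning stride explicitly instead of
# relying on the static default. 4096 cards per stride is the value
# reported to help on large heaps; treat it as a starting point, not
# a tuned recommendation.
java -XX:+UnlockDiagnosticVMOptions \
     -XX:ParGCCardsPerStrideChunk=4096 \
     -Xms50g -Xmx50g -XX:NewSize=2g -XX:MaxNewSize=2g \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     MyApp
```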

Copying gc-dev: could a CR be filed for that ergonomic enhancement, or at
least for a change in the default value?

PS: BTW, do you have the CR# for the unallocated-block change, just to
refresh our collective memory? I would have thought that empty unallocated
blocks, having no dirty cards, would be taken care of quickly, so a large
regression seems a bit surprising.

-- ramki

On Mon, Nov 26, 2012 at 9:58 PM, Alexey Ragozin <alexey.ragozin at gmail.com>wrote:

> Hi Brian,
>
> I have found similar regression testing young GC performance.
>
> After investigation I have come to the conclusion that the culprit is
> the -XX:+BlockOffsetArrayUseUnallocatedBlock VM option (which requires
> -XX:+UnlockDiagnosticVMOptions).
>
> This option is true in 1.6 and false in 1.7. The commit comment mentions
> some concurrency issues with that flag.
>
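
[To check this hypothesis on a 1.7 JVM, the flag can be switched back on
explicitly; a diagnosis-only sketch, with "MyApp" and the surrounding
options standing in for the real command line:]

```shell
# Re-enable the 1.6 default on a 1.7 JVM to see whether the pause-time
# regression disappears. Diagnostic flags must be unlocked first. Given
# the concurrency issues noted in the commit comment, use this for
# diagnosis only, not in production.
java -XX:+UnlockDiagnosticVMOptions \
     -XX:+BlockOffsetArrayUseUnallocatedBlock \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -Xms50g -Xmx50g MyApp
```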
> But you should not blame the JVM for the discrepancy in your test. The
> problem is within your test itself, which uses only a small fraction of
> the 50 GiB heap. BlockOffsetArrayUseUnallocatedBlock lets the JVM
> exclude not-yet-used memory from collection, so in 1.6 your benchmark
> effectively measures a heap size of 1-4 GiB.
>
> IMHO you should probably adjust your test to use a more realistic memory
> pattern (I expect 1.7 to still be a bit slower, though; at least that
> was the outcome of my testing).
>
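
[One way to give the benchmark a more realistic memory pattern is to keep
a sizable, configurable amount of long-lived data reachable for the whole
run, so the old generation is actually populated rather than left
unallocated. A minimal sketch; the class and method names are mine, and
the 64 MiB figure is purely illustrative -- scale it toward the real live
set:]

```java
import java.util.ArrayList;
import java.util.List;

public class HeapFiller {
    // Retain roughly `mb` mebibytes of long-lived data. Holding the
    // returned list for the duration of the run keeps this memory
    // reachable, so the collector cannot treat it as never-allocated.
    static List<byte[]> retain(int mb) {
        List<byte[]> live = new ArrayList<>();
        for (int i = 0; i < mb; i++) {
            live.add(new byte[1024 * 1024]); // one 1 MiB chunk per entry
        }
        return live;
    }

    public static void main(String[] args) {
        List<byte[]> live = retain(64); // illustrative: 64 MiB live set
        System.out.println("retained MiB: " + live.size());
        // ... run the actual benchmark while `live` stays referenced ...
    }
}
```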
> If you are interested in minimizing young GC time, take a look at
> http://blog.ragozin.info/2012/03/secret-hotspot-option-improving-gc.html
>
> Regards,
> Alexey
>
> > We're in the process of doing some performance testing with 1.7, and
> we're seeing much longer ParNew times in 1.7 than we saw in 1.6.  We could
> use some help isolating the cause.
> >
> > Running an identical performance test of an hour's length, with 1.6u37
> we're seeing only 0.009% of pauses over 100ms and just 0.09% over 10ms, but
> with 1.7u9, we're seeing 64% of pauses over 100ms.
> >
> > Here is typical comparison of a ParNew between the two.  It was run with
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags
> -XX:PrintFLSStatistics=1 -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime.
> >
> > 1.6u37
> >
> > 2012-11-25T16:51:13.145-0600: 1498.096: [GC Before GC:
> > Statistics for BinaryTreeDictionary:
> > ------------------------------------
> > Total Free Space: 2132560492
> > Max   Chunk Size: 2132560492
> > Number of Blocks: 1
> > Av.  Block  Size: 2132560492
> > Tree      Height: 1
> > Before GC:
> > Statistics for BinaryTreeDictionary:
> > ------------------------------------
> > Total Free Space: 512
> > Max   Chunk Size: 512
> > Number of Blocks: 1
> > Av.  Block  Size: 512
> > Tree      Height: 1
> > 1498.096: [ParNew
> > Desired survivor size 107347968 bytes, new threshold 4 (max 4)
> > - age   1:     614064 bytes,     614064 total
> > - age   2:      39344 bytes,     653408 total
> > - age   3:     322432 bytes,     975840 total
> > - age   4:       1208 bytes,     977048 total
> > : 1679253K->1224K(1887488K), 0.0015070 secs]
> 1792056K->114027K(52219136K)After GC:
> > Statistics for BinaryTreeDictionary:
> > ------------------------------------
> > Total Free Space: 2132560412
> > Max   Chunk Size: 2132560412
> > Number of Blocks: 1
> > Av.  Block  Size: 2132560412
> > Tree      Height: 1
> > After GC:
> > Statistics for BinaryTreeDictionary:
> > ------------------------------------
> > Total Free Space: 512
> > Max   Chunk Size: 512
> > Number of Blocks: 1
> > Av.  Block  Size: 512
> > Tree      Height: 1
> > , 0.0016110 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
> > Total time for which application threads were stopped: 0.0022330 seconds
> >
> > 1.7u9:
> >
> > 2012-11-25T19:10:40.869-0600: 1498.458: [GC Before GC:
> > Statistics for BinaryTreeDictionary:
> > ------------------------------------
> > Total Free Space: 2136954070
> > Max   Chunk Size: 2136954070
> > Number of Blocks: 1
> > Av.  Block  Size: 2136954070
> > Tree      Height: 1
> > Before GC:
> > Statistics for BinaryTreeDictionary:
> > ------------------------------------
> > Total Free Space: 0
> > Max   Chunk Size: 0
> > Number of Blocks: 0
> > Tree      Height: 0
> > 1498.458: [ParNew
> > Desired survivor size 107347968 bytes, new threshold 6 (max 6)
> > - age   1:    1122960 bytes,    1122960 total
> > - age   2:       1560 bytes,    1124520 total
> > - age   3:       2232 bytes,    1126752 total
> > - age   4:     324760 bytes,    1451512 total
> > - age   5:       2232 bytes,    1453744 total
> > - age   6:        728 bytes,    1454472 total
> > : 1680121K->2393K(1887488K), 0.1132030 secs]
> 1759235K->81509K(52219136K)After GC:
> > Statistics for BinaryTreeDictionary:
> > ------------------------------------
> > Total Free Space: 2136954070
> > Max   Chunk Size: 2136954070
> > Number of Blocks: 1
> > Av.  Block  Size: 2136954070
> > Tree      Height: 1
> > After GC:
> > Statistics for BinaryTreeDictionary:
> > ------------------------------------
> > Total Free Space: 0
> > Max   Chunk Size: 0
> > Number of Blocks: 0
> > Tree      Height: 0
> > , 0.1133210 secs] [Times: user=1.45 sys=0.00, real=0.11 secs]
> > Total time for which application threads were stopped: 0.1142450 seconds
> >
> > Notice the 2 ms application stop time vs. 114 ms. We can point you all
> at the full GC logs.
> >
> > For what it's worth we repeated this test on 1.7 with our standard set
> of 1.6 arguments that we've had success with in the past:
> >
> > -d64 -server -Xmx50g -Xms50g -XX:MaxNewSize=2g -XX:NewSize=2g
> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled
> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled
> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1
> -XX:ParallelCMSThreads=3 -XX:CMSMaxAbortablePrecleanTime=3600000
> -XX:CMSInitiatingOccupancyFraction=83 -XX:+UseParNewGC
> >
> > as well as these minimal CMS args:
> >
> > -d64 -server -Xmx50g -Xms50g -XX:MaxNewSize=2g -XX:NewSize=2g
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >
> > The results were more or less the same.
> >
> > Are there some additional JVM args that we can enable to shed some light
> on the big discrepancy in pause times?
> >
> > Thanks.
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>