Benchmark scenario with high G1 performance degradation
Jens Wilke
jw_list at headissue.com
Fri Apr 21 11:03:45 UTC 2017
Hi Stefan,
On Thursday, 20 April 2017 19:59:30 ICT Stefan Johansson wrote:
> Thanks for reaching out and for providing such a good step-by-step guide
> on how to run the benchmark the same way you are.
Thanks for the quick reply!
> I've tried with CMS, G1 and Parallel, both with 10g and 20g heap, but so
> far I can't reproduce your problems. It would be great if you could
> provide us with some more information. For example GC-logs and the
> result files. We might be able to dig something out of them.
The logs from the measurement on my notebook for the first mail (see below) are available here (the link is valid for 30 days only):
http://ovh.to/FzKbgrb
What environment are you testing on?
Please mind the core count. My gut feeling is that it could have something to do with the hash table arrays. If you are testing on a system that reports more than 8 cores, the allocated arrays will be smaller than in my case, since the cache does segmentation.
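To illustrate why the reported core count matters for the array sizes, here is a minimal sketch. It is not cache2k's actual sizing code: the rule "one segment per reported core, rounded up to a power of two", the class name SegmentSizing and the total capacity of 2^21 slots are all assumptions made only for this example. It just shows that a fixed total capacity split across more segments yields smaller individual arrays.

```java
// Hypothetical illustration only; this is NOT cache2k's actual sizing logic.
final class SegmentSizing {

    /** Assumed rule: one segment per reported core, rounded up to a power of two. */
    static int segmentCount(int reportedCores) {
        int segments = 1;
        while (segments < reportedCores) {
            segments <<= 1;
        }
        return segments;
    }

    /** Slots in each segment's hash array when the total is split evenly. */
    static int slotsPerSegment(int totalSlots, int reportedCores) {
        return totalSlots / segmentCount(reportedCores);
    }

    public static void main(String[] args) {
        // The core count the JVM reports on the machine running this sketch.
        int cores = Runtime.getRuntime().availableProcessors();
        int totalSlots = 1 << 21; // assumed total hash table capacity
        System.out.println("reported cores:    " + cores);
        System.out.println("segments:          " + segmentCount(cores));
        System.out.println("slots per segment: " + slotsPerSegment(totalSlots, cores));
    }
}
```

Under these assumptions a machine reporting 4 cores would get arrays eight times larger than one reporting 32 cores, which is why runs on different hardware may not be directly comparable.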
> From my runs it looks like G1 is about 5-10% behind CMS and 10-15%
> behind Parallel for both JDK 8 and 9.
That seems okay.
Actually, I'd like to publish my next benchmark results; however, I am stuck on this issue now. Benchmarking with CMS only doesn't really make sense at this point in time. I also don't want to be left in doubt about whether something is wrong in the setup.
> I took a quick look at the blog as well, and there the system had 32GB
> ram and the runs were done with -Xmx10g. The system you describe here
> only have 20GB ram and you are using -Xmx20G, is that correct or has
> there been a typo?
My bad, sorry for the confusion. There was enough free memory and RSS was only at 6GB, so the system was not swapping. I did play with the parameters to see whether they make a difference, but forgot to set them back to a reasonable range when sending the report.
The effects on the isolated benchmark system with 32GB and -Xmx10g or -Xmx20G are the same (see blog article for parameters).
The crucial point seems to be the function OtherRegionsTable::add_reference.
When I run with -prof perfasm and Java 8u121, with and without G1, on the benchmark system I get this:
.../jdk1.8.0_121/bin/java -jar jmh-suite/target/benchmarks.jar \\.RandomSequenceBenchmark -jvmArgs -server\ -Xmx20G\ -XX:BiasedLockingStartupDelay=0\ -verbose:gc\ -XX:+PrintGCDetails -f 1 -wi 1 -w 20s -i 1 -r 20s -t 4 -prof org.cache2k.benchmark.jmh.LinuxVmProfiler -prof org.cache2k.benchmark.jmh.MiscResultRecorderProfiler -p cacheFactory=org.cache2k.benchmark.Cache2kFactory -rf json -rff result.json -prof perfasm
....[Hottest Methods (after inlining)]..............................................................
22.48% 6.61% C2, level 4 org.cache2k.core.AbstractEviction::removeAllFromReplacementListOnEvict, version 897
21.06% 8.18% C2, level 4 org.cache2k.core.HeapCache::insertNewEntry, version 913
9.38% 7.13% libjvm.so SpinPause
9.13% 9.54% C2, level 4 org.cache2k.benchmark.jmh.suite.eviction.symmetrical.generated.RandomSequenceBenchmark_operation_jmhTest::operation_thrpt_jmhStub, version 873
5.00% 3.88% libjvm.so _ZN13InstanceKlass17oop_push_contentsEP18PSPromotionManagerP7oopDesc
4.54% 4.70% perf-5104.map [unknown]
3.57% 3.86% C2, level 4 org.cache2k.core.AbstractEviction::removeFromHashWithoutListener, version 838
2.86% 15.40% libjvm.so _ZN13ObjectMonitor11NotRunnableEP6ThreadS1_
2.48% 12.53% libjvm.so _ZN13ObjectMonitor20TrySpin_VaryDurationEP6Thread
2.46% 1.72% C2, level 4 org.cache2k.core.AbstractEviction::refillChunk, version 906
2.31% 3.33% libjvm.so _ZN18PSPromotionManager22copy_to_survivor_spaceILb0EEEP7oopDescS2_
2.24% 6.37% C2, level 4 java.util.concurrent.locks.StampedLock::acquireRead, version 864
2.03% 2.89% libjvm.so _ZN18PSPromotionManager18drain_stacks_depthEb
1.44% 1.43% libjvm.so _ZN13ObjArrayKlass17oop_push_contentsEP18PSPromotionManagerP7oopDesc
1.29% 0.72% kernel [unknown]
1.03% 1.27% libjvm.so _ZN18CardTableExtension26scavenge_contents_parallelEP16ObjectStartArrayP12MutableSpaceP8HeapWordP18PSPromotionManagerjj
0.79% 1.53% C2, level 4 java.util.concurrent.locks.StampedLock::acquireWrite, version 865
0.74% 4.21% runtime stub StubRoutines::SafeFetch32
0.71% 0.50% C2, level 4 org.cache2k.core.ClockProPlusEviction::sumUpListHits, version 772
0.70% 0.39% libc-2.19.so __clock_gettime
3.76% 3.73% <...other 147 warm methods...>
....................................................................................................
100.00% 99.93% <totals>
.../jdk1.8.0_121/bin/java -jar jmh-suite/target/benchmarks.jar \\.RandomSequenceBenchmark -jvmArgs -server\ -Xmx20G\ -XX:BiasedLockingStartupDelay=0\ -verbose:gc\ -XX:+PrintGCDetails\ -XX:+UseG1GC -f 1 -wi 1 -w 20s -i 1 -r 20s -t 4 -prof org.cache2k.benchmark.jmh.LinuxVmProfiler -prof org.cache2k.benchmark.jmh.MiscResultRecorderProfiler -p cacheFactory=org.cache2k.benchmark.Cache2kFactory -rf json -rff result.json -prof perfasm
....[Hottest Methods (after inlining)]..............................................................
49.11% 41.16% libjvm.so _ZN17OtherRegionsTable13add_referenceEPvi
10.25% 3.37% C2, level 4 org.cache2k.core.ClockProPlusEviction::removeFromReplacementListOnEvict, version 883
4.93% 1.43% C2, level 4 org.cache2k.core.SegmentedEviction::submitWithoutEviction, version 694
4.31% 5.89% libjvm.so _ZN29G1UpdateRSOrPushRefOopClosure6do_oopEPj
3.18% 4.17% libjvm.so _ZN13ObjArrayKlass20oop_oop_iterate_nv_mEP7oopDescP24FilterOutOfRegionClosure9MemRegion
3.17% 3.00% libjvm.so _ZN29G1BlockOffsetArrayContigSpace18block_start_unsafeEPKv
2.95% 3.16% perf-5226.map [unknown]
2.19% 1.00% C2, level 4 org.cache2k.benchmark.Cache2kFactory$1::getIfPresent, version 892
1.58% 1.50% libjvm.so _ZN8G1RemSet11refine_cardEPajb
1.42% 5.02% libjvm.so _ZNK10HeapRegion12block_is_objEPK8HeapWord
1.41% 3.31% libjvm.so _ZN10HeapRegion32oops_on_card_seq_iterate_carefulE9MemRegionP24FilterOutOfRegionClosurebPa
1.13% 3.05% libjvm.so _ZN13InstanceKlass18oop_oop_iterate_nvEP7oopDescP24FilterOutOfRegionClosure
0.98% 0.51% libjvm.so _ZN14G1HotCardCache6insertEPa
0.89% 4.27% libjvm.so _ZN13ObjectMonitor11NotRunnableEP6ThreadS1_
0.85% 1.17% C2, level 4 org.cache2k.core.HeapCache::insertNewEntry, version 899
0.74% 3.59% libjvm.so _ZN13ObjectMonitor20TrySpin_VaryDurationEP6Thread
0.74% 0.57% libjvm.so _ZN20G1ParScanThreadState10trim_queueEv
0.70% 0.70% C2, level 4 org.cache2k.core.Hash2::remove, version 864
0.69% 0.81% C2, level 4 org.cache2k.core.ClockProPlusEviction::findEvictionCandidate, version 906
0.65% 1.59% C2, level 4 org.cache2k.benchmark.jmh.suite.eviction.symmetrical.generated.RandomSequenceBenchmark_operation_jmhTest::operation_thrpt_jmhStub, version 857
8.14% 10.65% <...other 331 warm methods...>
....................................................................................................
100.00% 99.91% <totals>
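The libjvm.so entries above are mangled C++ symbols, e.g. _ZN17OtherRegionsTable13add_referenceEPvi is OtherRegionsTable::add_reference. For matching them against the HotSpot sources, a minimal sketch of a decoder for the qualified-name part follows. The class name MangledNames is made up for this example, and the code handles only simple nested names (length-prefixed components after "_ZN"); it ignores parameter types, template arguments and substitutions. A complete demangler such as c++filt is the better choice for anything beyond a quick look.

```java
final class MangledNames {

    /**
     * Extracts the "Class::method" part of a simple Itanium-ABI nested name.
     * Returns the input unchanged if it cannot be decoded.
     */
    static String qualifiedName(String symbol) {
        if (!symbol.startsWith("_ZN")) {
            return symbol;
        }
        int pos = 3;
        // Skip optional CV-qualifiers (r = restrict, V = volatile, K = const).
        while (pos < symbol.length() && "rVK".indexOf(symbol.charAt(pos)) >= 0) {
            pos++;
        }
        StringBuilder name = new StringBuilder();
        // Each component is a decimal length followed by that many characters.
        while (pos < symbol.length() && isDigit(symbol.charAt(pos))) {
            int length = 0;
            while (pos < symbol.length() && isDigit(symbol.charAt(pos))) {
                length = length * 10 + (symbol.charAt(pos) - '0');
                pos++;
            }
            if (pos + length > symbol.length()) {
                return symbol; // malformed input, give up
            }
            if (name.length() > 0) {
                name.append("::");
            }
            name.append(symbol, pos, pos + length);
            pos += length;
        }
        return name.length() > 0 ? name.toString() : symbol;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(qualifiedName("_ZN17OtherRegionsTable13add_referenceEPvi"));
        System.out.println(qualifiedName("_ZNK10HeapRegion12block_is_objEPK8HeapWord"));
    }
}
```

Decoding stops at the first character that is not a length prefix, so for a templated method like copy_to_survivor_space only the name before the template arguments is returned.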
Best,
Jens
--
"Everything superfluous is wrong!"
// Jens Wilke - headissue GmbH - Germany
\// https://headissue.com