Benchmark scenario with high G1 performance degradation
Stefan Johansson
stefan.johansson at oracle.com
Mon Apr 24 14:16:20 UTC 2017
Hi Jens,
On 2017-04-21 13:03, Jens Wilke wrote:
> Hi Stefan,
>
> On Thursday, 20 April 2017 19:59:30 ICT Stefan Johansson wrote:
>> Thanks for reaching out and for providing such a good step-by-step guide
>> on how to run the benchmark the same way you are.
> Thanks for the quick reply!
>
>> I've tried with CMS, G1 and Parallel, both with 10g and 20g heap, but so
>> far I can't reproduce your problems. It would be great if you could
>> provide us with some more information. For example GC-logs and the
>> result files. We might be able to dig something out of them.
> The logs from the measurement on my notebook for the first mail (see below) are available at (link valid for 30 days only):
>
> http://ovh.to/FzKbgrb
>
> What environment are you testing on?
I only did some quick testing on my desktop, which has 12 cores and
hyper-threading, so the default is to use 18 parallel GC threads on my
system.
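For reference, a minimal sketch of how that default comes about (my own
illustration, assuming the common "all CPUs up to 8, then 5/8 of the rest"
ergonomic rule, not copied from the VM sources):

  class GCThreadDefaults {
      // Sketch of the ParallelGCThreads ergonomic default, assuming the
      // usual rule: use every logical CPU up to 8, then 5/8 of the rest.
      static int defaultParallelGCThreads(int logicalCpus) {
          return logicalCpus <= 8 ? logicalCpus : 8 + (logicalCpus - 8) * 5 / 8;
      }

      public static void main(String[] args) {
          // 12 cores + hyper-threading -> 24 logical CPUs -> 18 GC threads
          System.out.println(defaultParallelGCThreads(24));
      }
  }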
>
> Please mind the core count. My gut tells me that it could have something to do with the hash table arrays. When you test on a system that reports more than 8 cores, the allocated arrays will be smaller than in my case, since the cache segments its hash tables.
>
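To make the segmentation effect concrete, here is a rough sketch (my own
illustration with made-up numbers, not the cache2k implementation) of why a
machine that reports more cores ends up with more, but smaller, hash table
arrays for the same total capacity:

  class SegmentSizing {
      // Hypothetical sizing rule: one segment per reported logical CPU,
      // rounded down to a power of two. The actual cache2k rule may differ.
      static int segmentCount(int logicalCpus) {
          return Integer.highestOneBit(Math.max(logicalCpus, 1));
      }

      // Total capacity split across segments: more segments -> smaller arrays.
      static int perSegmentArraySize(int totalCapacity, int logicalCpus) {
          return totalCapacity / segmentCount(logicalCpus);
      }

      public static void main(String[] args) {
          int capacity = 2_000_000;
          System.out.println(perSegmentArraySize(capacity, 4));  // 4 segments  -> 500000
          System.out.println(perSegmentArraySize(capacity, 24)); // 16 segments -> 125000
      }
  }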
>> From my runs it looks like G1 is about 5-10% behind CMS and 10-15%
>> behind Parallel for both JDK 8 and 9.
> That seems okay.
>
> Actually, I'd like to publish my next benchmark results, but I am stuck on this issue now. Benchmarking with CMS only doesn't really make sense at this point in time. Also, I don't want to be left in doubt about whether there is something wrong with the setup.
>
>> I took a quick look at the blog as well, and there the system had 32GB
>> RAM and the runs were done with -Xmx10g. The system you describe here
>> only has 20GB RAM and you are using -Xmx20G, is that correct or has
>> there been a typo?
> My bad, sorry for the confusion. There was enough free memory and RSS was only at 6GB, so the system was not swapping. I did play with the parameters to see whether it made a difference, but forgot to put them back into a reasonable range before sending the report.
>
> The effects on the isolated benchmark system with 32GB and -Xmx10g or -Xmx20G are the same (see the blog article for parameters).
>
> The crucial point seems to be the function OtherRegionsTable::add_reference.
> When I run with -prof perfasm and Java 8u121, with and without G1, on the benchmark system I get this:
>
> .../jdk1.8.0_121/bin/java -jar jmh-suite/target/benchmarks.jar \\.RandomSequenceBenchmark -jvmArgs -server\ -Xmx20G\ -XX:BiasedLockingStartupDelay=0\ -verbose:gc\ -XX:+PrintGCDetails -f 1 -wi 1 -w 20s -i 1 -r 20s -t 4 -prof org.cache2k.benchmark.jmh.LinuxVmProfiler -prof org.cache2k.benchmark.jmh.MiscResultRecorderProfiler -p cacheFactory=org.cache2k.benchmark.Cache2kFactory -rf json -rff result.json -prof perfasm
>
> ....[Hottest Methods (after inlining)]..............................................................
> 22.48% 6.61% C2, level 4 org.cache2k.core.AbstractEviction::removeAllFromReplacementListOnEvict, version 897
> 21.06% 8.18% C2, level 4 org.cache2k.core.HeapCache::insertNewEntry, version 913
> 9.38% 7.13% libjvm.so SpinPause
> 9.13% 9.54% C2, level 4 org.cache2k.benchmark.jmh.suite.eviction.symmetrical.generated.RandomSequenceBenchmark_operation_jmhTest::operation_thrpt_jmhStub, version 873
> 5.00% 3.88% libjvm.so _ZN13InstanceKlass17oop_push_contentsEP18PSPromotionManagerP7oopDesc
> 4.54% 4.70% perf-5104.map [unknown]
> 3.57% 3.86% C2, level 4 org.cache2k.core.AbstractEviction::removeFromHashWithoutListener, version 838
> 2.86% 15.40% libjvm.so _ZN13ObjectMonitor11NotRunnableEP6ThreadS1_
> 2.48% 12.53% libjvm.so _ZN13ObjectMonitor20TrySpin_VaryDurationEP6Thread
> 2.46% 1.72% C2, level 4 org.cache2k.core.AbstractEviction::refillChunk, version 906
> 2.31% 3.33% libjvm.so _ZN18PSPromotionManager22copy_to_survivor_spaceILb0EEEP7oopDescS2_
> 2.24% 6.37% C2, level 4 java.util.concurrent.locks.StampedLock::acquireRead, version 864
> 2.03% 2.89% libjvm.so _ZN18PSPromotionManager18drain_stacks_depthEb
> 1.44% 1.43% libjvm.so _ZN13ObjArrayKlass17oop_push_contentsEP18PSPromotionManagerP7oopDesc
> 1.29% 0.72% kernel [unknown]
> 1.03% 1.27% libjvm.so _ZN18CardTableExtension26scavenge_contents_parallelEP16ObjectStartArrayP12MutableSpaceP8HeapWordP18PSPromotionManagerjj
> 0.79% 1.53% C2, level 4 java.util.concurrent.locks.StampedLock::acquireWrite, version 865
> 0.74% 4.21% runtime stub StubRoutines::SafeFetch32
> 0.71% 0.50% C2, level 4 org.cache2k.core.ClockProPlusEviction::sumUpListHits, version 772
> 0.70% 0.39% libc-2.19.so __clock_gettime
> 3.76% 3.73% <...other 147 warm methods...>
> ....................................................................................................
> 100.00% 99.93% <totals>
>
> .../jdk1.8.0_121/bin/java -jar jmh-suite/target/benchmarks.jar \\.RandomSequenceBenchmark -jvmArgs -server\ -Xmx20G\ -XX:BiasedLockingStartupDelay=0\ -verbose:gc\ -XX:+PrintGCDetails\ -XX:+UseG1GC -f 1 -wi 1 -w 20s -i 1 -r 20s -t 4 -prof org.cache2k.benchmark.jmh.LinuxVmProfiler -prof org.cache2k.benchmark.jmh.MiscResultRecorderProfiler -p cacheFactory=org.cache2k.benchmark.Cache2kFactory -rf json -rff result.json -prof perfasm
>
> ....[Hottest Methods (after inlining)]..............................................................
> 49.11% 41.16% libjvm.so _ZN17OtherRegionsTable13add_referenceEPvi
> 10.25% 3.37% C2, level 4 org.cache2k.core.ClockProPlusEviction::removeFromReplacementListOnEvict, version 883
> 4.93% 1.43% C2, level 4 org.cache2k.core.SegmentedEviction::submitWithoutEviction, version 694
> 4.31% 5.89% libjvm.so _ZN29G1UpdateRSOrPushRefOopClosure6do_oopEPj
> 3.18% 4.17% libjvm.so _ZN13ObjArrayKlass20oop_oop_iterate_nv_mEP7oopDescP24FilterOutOfRegionClosure9MemRegion
> 3.17% 3.00% libjvm.so _ZN29G1BlockOffsetArrayContigSpace18block_start_unsafeEPKv
> 2.95% 3.16% perf-5226.map [unknown]
> 2.19% 1.00% C2, level 4 org.cache2k.benchmark.Cache2kFactory$1::getIfPresent, version 892
> 1.58% 1.50% libjvm.so _ZN8G1RemSet11refine_cardEPajb
> 1.42% 5.02% libjvm.so _ZNK10HeapRegion12block_is_objEPK8HeapWord
> 1.41% 3.31% libjvm.so _ZN10HeapRegion32oops_on_card_seq_iterate_carefulE9MemRegionP24FilterOutOfRegionClosurebPa
> 1.13% 3.05% libjvm.so _ZN13InstanceKlass18oop_oop_iterate_nvEP7oopDescP24FilterOutOfRegionClosure
> 0.98% 0.51% libjvm.so _ZN14G1HotCardCache6insertEPa
> 0.89% 4.27% libjvm.so _ZN13ObjectMonitor11NotRunnableEP6ThreadS1_
> 0.85% 1.17% C2, level 4 org.cache2k.core.HeapCache::insertNewEntry, version 899
> 0.74% 3.59% libjvm.so _ZN13ObjectMonitor20TrySpin_VaryDurationEP6Thread
> 0.74% 0.57% libjvm.so _ZN20G1ParScanThreadState10trim_queueEv
> 0.70% 0.70% C2, level 4 org.cache2k.core.Hash2::remove, version 864
> 0.69% 0.81% C2, level 4 org.cache2k.core.ClockProPlusEviction::findEvictionCandidate, version 906
> 0.65% 1.59% C2, level 4 org.cache2k.benchmark.jmh.suite.eviction.symmetrical.generated.RandomSequenceBenchmark_operation_jmhTest::operation_thrpt_jmhStub, version 857
> 8.14% 10.65% <...other 331 warm methods...>
> ....................................................................................................
> 100.00% 99.91% <totals>
As I mentioned in my reply to your other mail, these calls are caused by
region-to-region pointers in G1. Adding those references to a region's
remembered set can be done either during a safepoint or concurrently.
Looking at your profile, it seems that most calls come from the
concurrent path, and since your system has few cores, having the
concurrent refinement threads do a lot of work has a bigger impact on
the overall performance.
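As a rough illustration of the pattern that stresses this path (a
hypothetical example, not the benchmark code): every store that makes an
object in one region point to an object in another region enqueues a card,
and when that card is refined the target region's remembered set is updated
via OtherRegionsTable::add_reference.

  class CrossRegionStores {
      // Long-lived table; with a large heap it ends up in old regions.
      static final Object[] TABLE = new Object[1 << 20];

      public static void main(String[] args) {
          java.util.Random rnd = new java.util.Random(42);
          for (long i = 0; i < 100_000_000L; i++) {
              // The new byte[] is allocated in a young region, so this store
              // usually creates a region-to-region pointer. G1's post-write
              // barrier enqueues the card, and refinement records the pointer
              // in the target region's remembered set (add_reference).
              TABLE[rnd.nextInt(TABLE.length)] = new byte[16];
          }
      }
  }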
Stefan
> Best,
>
> Jens
>