Shenandoah performance problem
Attila Axt
axt at load.hu
Wed Oct 16 23:01:29 UTC 2019
Hi!
I'm comparing the performance of different GC implementations against a
low-latency project. During the initialization of the project, we load a
bunch of key-value pairs from memcached to warm up the local instance
cache. The values in memcached are serialized and zipped, so we are
unzipping and deserializing them after they are loaded. In order to
utilize all cores we use a ThreadPoolExecutor to do this operation using
multiple threads.
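The warm-up step described above can be pictured roughly as follows. This is only a minimal sketch, not the project's code: the class name `TranscodeSketch` and the methods `zip`, `unzip` and `warmUp` are hypothetical, and it assumes the values are Java-serialized objects compressed with GZIP. The memcached fetch is replaced by an in-memory map of byte arrays.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for the transcode step; names are illustrative only.
public class TranscodeSketch {

    // Serialize and gzip a value (assumed wire format, used to build test input).
    static byte[] zip(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Gunzip and deserialize one payload; the inflate work happens in native code.
    static Object unzip(byte[] payload) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(payload)))) {
            return in.readObject();
        }
    }

    // Decode every entry on a fixed-size pool and collect the results into a local cache.
    static Map<String, Object> warmUp(Map<String, byte[]> raw, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Object>> pending = new HashMap<>();
            for (Map.Entry<String, byte[]> entry : raw.entrySet()) {
                byte[] payload = entry.getValue();
                Callable<Object> task = () -> unzip(payload);
                pending.put(entry.getKey(), pool.submit(task));
            }
            Map<String, Object> cache = new HashMap<>();
            for (Map.Entry<String, Future<Object>> entry : pending.entrySet()) {
                cache.put(entry.getKey(), entry.getValue().get());
            }
            return cache;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `TranscodeSketch.warmUp(raw, Runtime.getRuntime().availableProcessors())` would spread the decoding over all cores.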
One interesting thing I noticed is that the CPU time burned by the
transcode pool during startup is significantly higher with Shenandoah
than with the other GC algorithms:
CMS: ~29 sec
ZGC: ~36 8sec
Shenandoah: ~270 sec
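One possible way to collect per-pool CPU time figures like these is the standard `ThreadMXBean` API. The sketch below is an assumption about how such a number could be obtained, not a description of how the figures above were measured: the class `PoolCpuTime`, the method `cpuNanosFor` and the thread-name prefix are hypothetical. It only sees threads that are still alive when it is called.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: sums CPU time of live threads whose name starts with a prefix.
public class PoolCpuTime {

    static long cpuNanosFor(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return 0L; // the JVM cannot report per-thread CPU time
        }
        long total = 0L;
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            if (info == null || !info.getThreadName().startsWith(namePrefix)) {
                continue; // thread already exited, or belongs to another pool
            }
            long nanos = mx.getThreadCpuTime(id); // -1 if the thread died meanwhile
            if (nanos > 0) {
                total += nanos;
            }
        }
        return total;
    }
}
```

Calling `PoolCpuTime.cpuNanosFor("transcode")` just before the pool is shut down would give the accumulated CPU time in nanoseconds, assuming the pool's threads were named with that prefix.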
Since the code is quite complex, I've tried to simulate roughly what is
happening by creating a microbenchmark.
You can fetch it from here: https://github.com/axt/jmh-unzip-mt
(Please note that the JDK path and the JDK arguments are currently
hardcoded in BechmarkRunner; you will need to edit them manually to get
it working.)
I've executed the benchmark against a freshly built JDK from here:
$ hg clone http://hg.openjdk.java.net/shenandoah/jdk shenandoah
The benchmark yields the following results:
```
# JMH version: 1.21
# VM version: JDK 14-internal, OpenJDK 64-Bit Server VM, 14-internal+0-adhoc.axt.shenandoah
# VM invoker: /fast/shenandoah/build/linux-x86_64-server-release/images/jdk/bin/java
# Warmup: 5 iterations, 10 s each
# Measurement: 10 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: axt.benchmark.TranscodeBenchmark.benchmark
Benchmark                     Mode  Cnt  Score    Error  Units
TranscodeBenchmark.benchmark  avgt   10  1.098 ±  0.018   s/op
# VM options: -Xms1024m -Xmx1024m -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled
TranscodeBenchmark.benchmark  avgt   10  1.129 ±  0.026   s/op
# VM options: -Xms1024m -Xmx1024m -XX:+UseG1GC
TranscodeBenchmark.benchmark  avgt   10  1.083 ±  0.031   s/op
# VM options: -Xms1024m -Xmx1024m -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
TranscodeBenchmark.benchmark  avgt   10  5.720 ±  0.219   s/op
# VM options: -Xms1024m -Xmx1024m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC
```
I've also run it with the fastdebug build while recording with the
`perf` profiler.
Here is a screenshot of the flamegraph: https://imgur.com/3v2RDqt
Based on this, most of the extra CPU time is burned spinning on the
heap mutex in order to pin the memory area while gzip calls into native code.
Do you consider this a bug?
Thanks in advance,
axt