RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah
Aleksey Shipilev
shade at redhat.com
Tue Dec 20 10:57:21 UTC 2016
Hi,
Since we care mostly about pause times, and not the raw throughput, it makes
sense to enable safepoints in counted loops. This makes us much more responsive
(as in, time-to-safepoint, or TTSP, is lower) in many interesting scenarios.
Change:
http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/webrev.01/
The easiest example that is present in any workload of interest is looping
through a large array/ArrayList.
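For illustration, a minimal sketch of such a loop. The class and method names
are hypothetical and not part of the patch. With an int induction variable and
a loop-invariant bound, C2 typically treats this as a counted loop; without
safepoint polls in the loop body, a thread running it cannot reach a safepoint
until the loop exits.

```java
// Hypothetical example; names are illustrative only.
// One way to try it, assuming a Shenandoah-enabled build:
//   java -XX:+UseShenandoahGC -XX:+UseCountedLoopSafepoints ArraySum
class ArraySum {
    // int index plus an invariant bound: the shape C2 recognizes as a counted loop.
    static long sum(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[50_000_000]; // about 200 MB; shrink if the heap is small
        java.util.Arrays.fill(data, 1);
        System.out.println(sum(data));
    }
}
```

The longer the array, the longer the thread stays inside the loop, which is why
large arrays are the easy way to observe high TTSP.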
SPECjvm2008 throughput does appear affected where tight loops are present:
Benchmark               Mode   Cnt      Score     Error  Units
# -XX:-UseCountedLoopSafepoints
Compiler.compiler       thrpt   30    217.169 ±  5.166  ops/min
Compiler.sunflow        thrpt   30    473.940 ± 20.246  ops/min
Compress.test           thrpt   15    647.552 ±  3.528  ops/min
CryptoAes.test          thrpt   15     44.367 ±  2.402  ops/min
CryptoRsa.test          thrpt   15   2066.495 ± 11.809  ops/min
CryptoSignVerify.test   thrpt   15  10372.019 ± 50.713  ops/min
Derby.test              thrpt   30    375.954 ± 13.539  ops/min
MpegAudio.test          thrpt   15    197.299 ±  2.411  ops/min
ScimarkFFT.large        thrpt   15     55.618 ±  0.142  ops/min
ScimarkFFT.small        thrpt   15    664.370 ±  7.304  ops/min
ScimarkLU.large         thrpt   15     14.767 ±  0.082  ops/min
ScimarkLU.small         thrpt   15    926.435 ±  8.790  ops/min
ScimarkMonteCarlo.test  thrpt   15   4508.333 ± 68.869  ops/min
ScimarkSOR.large        thrpt   15     74.596 ±  0.052  ops/min
ScimarkSOR.small        thrpt   15    466.186 ±  1.308  ops/min
ScimarkSparse.large     thrpt   15     48.932 ± 11.991  ops/min
ScimarkSparse.small     thrpt   15    360.907 ±  6.739  ops/min
Serial.test             thrpt   30   8779.857 ± 77.717  ops/s
Sunflow.test            thrpt   15    124.546 ±  2.110  ops/min
XmlTransform.test       thrpt   20    429.422 ± 24.964  ops/min
XmlValidation.test      thrpt   30    773.254 ±  8.561  ops/min
# -XX:+UseCountedLoopSafepoints
Compiler.compiler       thrpt   20    213.199 ±  8.146  ops/min
Compiler.sunflow        thrpt   27    486.745 ± 21.118  ops/min
Compress.test           thrpt   15    637.303 ±  4.800  ops/min  <--- -1.5%
CryptoAes.test          thrpt   15     46.943 ±  0.345  ops/min
CryptoRsa.test          thrpt   15   2042.072 ± 12.379  ops/min  <--- -1.1%
CryptoSignVerify.test   thrpt   15  10240.459 ± 63.095  ops/min
Derby.test              thrpt   30    406.943 ± 12.625  ops/min
MpegAudio.test          thrpt   15    193.173 ±  1.414  ops/min
ScimarkFFT.large        thrpt   15     55.629 ±  0.104  ops/min
ScimarkFFT.small        thrpt   15    669.153 ±  6.683  ops/min
ScimarkLU.large         thrpt   15     13.510 ±  0.075  ops/min  <--- -8.5%
ScimarkLU.small         thrpt   15    581.737 ±  6.539  ops/min  <--- -37.3%
ScimarkMonteCarlo.test  thrpt   15   4485.049 ± 11.864  ops/min
ScimarkSOR.large        thrpt   15     74.594 ±  0.045  ops/min
ScimarkSOR.small        thrpt   15    421.046 ±  0.456  ops/min  <--- -9.6%
ScimarkSparse.large     thrpt   15     40.995 ±  0.283  ops/min
ScimarkSparse.small     thrpt   15    319.079 ±  1.391  ops/min  <--- -11.3%
Serial.test             thrpt   30   8717.823 ± 81.147  ops/s
Sunflow.test            thrpt   15    127.221 ±  1.578  ops/min
XmlTransform.test       thrpt   20    445.762 ±  8.278  ops/min
XmlValidation.test      thrpt   30    760.121 ±  9.963  ops/min
Note that the Scimark tests are expected to regress that much: they have very
tight loops, and that is exactly our problem, since the TTSP times there are in
the multi-second range!
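The percentages marked in the table read as the relative change in score
between the two runs. A small sketch of that arithmetic, assuming this is
roughly how the annotations were derived (the helper name is hypothetical),
using the ScimarkLU.large scores:

```java
import java.util.Locale;

// Hypothetical helper; not part of the patch or of the benchmark harness.
class RelativeChange {
    // Relative change of candidate versus baseline, in percent.
    // Negative values mean the candidate run scored lower.
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // ScimarkLU.large: 14.767 ops/min without, 13.510 ops/min with the flag.
        double change = percentChange(14.767, 13.510);
        System.out.println(String.format(Locale.ROOT, "%.1f%%", change));
    }
}
```

This ignores the reported error margins, so small differences should be read
with the ± columns in mind.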
The difference is explained by different code generation. For example, in the
most dramatic case, ScimarkLU.small:
With -XX:-UseCountedLoopSafepoints, the hottest loop uses AVX2 (vmovdqu and friends):
http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu-shenandoah-minus.perfasm
With -XX:+UseCountedLoopSafepoints, the hottest loop uses AVX (vmovsd and friends):
http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu-shenandoah-plus.perfasm
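For context, a sketch of the kind of inner loop that dominates an LU
factorization: a row update of the form row[j] -= factor * pivotRow[j]. This is
written from the general shape of the algorithm, not copied from the SciMark
source, and all names are hypothetical. A loop like this is a natural candidate
for C2 auto-vectorization, which would be consistent with seeing packed moves
(vmovdqu) in one profile and scalar moves (vmovsd) in the other.

```java
// Hypothetical illustration of an LU elimination step; not the SciMark code.
class LuRowUpdate {
    // Subtract factor * pivotRow from row for columns [from, row.length).
    static void eliminate(double[] row, double[] pivotRow, double factor, int from) {
        for (int j = from; j < row.length; j++) {
            row[j] -= factor * pivotRow[j];
        }
    }

    public static void main(String[] args) {
        double[] row = {4.0, 6.0, 8.0};
        double[] pivotRow = {2.0, 3.0, 4.0};
        eliminate(row, pivotRow, 2.0, 1);
        System.out.println(java.util.Arrays.toString(row));
    }
}
```

Whether a given build vectorizes this loop depends on the JDK version, the CPU,
and the flags in use, so the perfasm output remains the authoritative evidence.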
As such, I believe enabling this by default, and figuring out the code quality
issues as we go forward, is the sane tactic.
Thanks,
-Aleksey