RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah

Tue Dec 20 10:57:21 UTC 2016

Hi,

Since we care mostly about pause times rather than raw throughput, it makes
sense to enable safepoints in counted loops. This makes us much more responsive
(that is, time-to-safepoint, TTSP, is lower) in many interesting scenarios.

Change:
  http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/webrev.01/

The simplest example, present in nearly any workload of interest, is a loop
over a large array/ArrayList.
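
To make that concrete, here is a minimal sketch of the loop shape in question.
The class and method names are hypothetical and the array size is arbitrary.
The assumption is that C2 recognizes both loops as counted loops (int induction
variable, constant stride), so with -XX:-UseCountedLoopSafepoints the compiled
loop body would carry no safepoint poll, and a thread inside it would only
reach a safepoint once the loop exits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of the webrev.
class CountedLoopExample {

    // Plain counted loop over an array: int counter, stride of 1.
    static long sum(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    // Same shape over an ArrayList, indexed by an int counter.
    static long sum(List<Integer> data) {
        long total = 0;
        for (int i = 0; i < data.size(); i++) {
            total += data.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Arbitrary size: large enough that one pass takes noticeable time.
        int[] array = new int[10_000_000];
        Arrays.fill(array, 1);

        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(1);
        }

        System.out.println(sum(array));
        System.out.println(sum(list));
    }
}
```

On a Shenandoah-enabled build, running this with -XX:+UseShenandoahGC and
toggling -XX:+/-UseCountedLoopSafepoints should show the TTSP effect once the
loops are compiled, though a realistic workload is needed to see it clearly.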

SPECjvm2008 throughput does appear affected where tight loops are present:

Benchmark                Mode  Cnt      Score    Error    Units

# -XX:-UseCountedLoopSafepoints
Compiler.compiler       thrpt   30    217.169 ±  5.166  ops/min
Compiler.sunflow        thrpt   30    473.940 ± 20.246  ops/min
Compress.test           thrpt   15    647.552 ±  3.528  ops/min
CryptoAes.test          thrpt   15     44.367 ±  2.402  ops/min
CryptoRsa.test          thrpt   15   2066.495 ± 11.809  ops/min
CryptoSignVerify.test   thrpt   15  10372.019 ± 50.713  ops/min
Derby.test              thrpt   30    375.954 ± 13.539  ops/min
MpegAudio.test          thrpt   15    197.299 ±  2.411  ops/min
ScimarkFFT.large        thrpt   15     55.618 ±  0.142  ops/min
ScimarkFFT.small        thrpt   15    664.370 ±  7.304  ops/min
ScimarkLU.large         thrpt   15     14.767 ±  0.082  ops/min
ScimarkLU.small         thrpt   15    926.435 ±  8.790  ops/min
ScimarkMonteCarlo.test  thrpt   15   4508.333 ± 68.869  ops/min
ScimarkSOR.large        thrpt   15     74.596 ±  0.052  ops/min
ScimarkSOR.small        thrpt   15    466.186 ±  1.308  ops/min
ScimarkSparse.large     thrpt   15     48.932 ± 11.991  ops/min
ScimarkSparse.small     thrpt   15    360.907 ±  6.739  ops/min
Serial.test             thrpt   30   8779.857 ± 77.717    ops/s
Sunflow.test            thrpt   15    124.546 ±  2.110  ops/min
XmlTransform.test       thrpt   20    429.422 ± 24.964  ops/min
XmlValidation.test      thrpt   30    773.254 ±  8.561  ops/min

# -XX:+UseCountedLoopSafepoints
Compiler.compiler       thrpt   20    213.199 ±  8.146  ops/min
Compiler.sunflow        thrpt   27    486.745 ± 21.118  ops/min
Compress.test           thrpt   15    637.303 ±  4.800  ops/min <---  -1.5%
CryptoAes.test          thrpt   15     46.943 ±  0.345  ops/min
CryptoRsa.test          thrpt   15   2042.072 ± 12.379  ops/min <---  -1.1%
CryptoSignVerify.test   thrpt   15  10240.459 ± 63.095  ops/min
Derby.test              thrpt   30    406.943 ± 12.625  ops/min
MpegAudio.test          thrpt   15    193.173 ±  1.414  ops/min
ScimarkFFT.large        thrpt   15     55.629 ±  0.104  ops/min
ScimarkFFT.small        thrpt   15    669.153 ±  6.683  ops/min
ScimarkLU.large         thrpt   15     13.510 ±  0.075  ops/min <---  -8.5%
ScimarkLU.small         thrpt   15    581.737 ±  6.539  ops/min <--- -37.3%
ScimarkMonteCarlo.test  thrpt   15   4485.049 ± 11.864  ops/min
ScimarkSOR.large        thrpt   15     74.594 ±  0.045  ops/min
ScimarkSOR.small        thrpt   15    421.046 ±  0.456  ops/min <---  -9.6%
ScimarkSparse.large     thrpt   15     40.995 ±  0.283  ops/min
ScimarkSparse.small     thrpt   15    319.079 ±  1.391  ops/min <--- -11.3%
Serial.test             thrpt   30   8717.823 ± 81.147    ops/s
Sunflow.test            thrpt   15    127.221 ±  1.578  ops/min
XmlTransform.test       thrpt   20    445.762 ±  8.278  ops/min
XmlValidation.test      thrpt   30    760.121 ±  9.963  ops/min

Note that the Scimark benchmarks are expected to regress that much: they have
very tight loops, and those are exactly our problem, since the TTSP times there
are in the multi-second range! The difference is explained by different code
generation. For example, in the most dramatic case, ScimarkLU.small:

With -XX:-UseCountedLoopSafepoints, the hottest loop uses AVX2 (vmovdqu and friends):

http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu-shenandoah-minus.perfasm

With -XX:+UseCountedLoopSafepoints, the hottest loop uses AVX (vmovsd and friends):

http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu-shenandoah-plus.perfasm

As such, I believe enabling this by default, and figuring out code quality
issues as we go forward, is the sane tactic.

Thanks,
-Aleksey