RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah
Roman Kennke
rkennke at redhat.com
Tue Dec 20 11:02:11 UTC 2016
On Tuesday, 20.12.2016, at 11:57 +0100, Aleksey Shipilev wrote:
> Hi,
>
> Since we care mostly about pause times, and not the raw throughput, it makes
> sense to enable safepoints in counted loops. This makes us much more
> responsive (as in, time-to-safepoint, TTSP, is lower) in many interesting
> scenarios.
>
> Change:
> http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/webrev.01/
>
> The easiest example that is present in any workload of interest is looping
> through a large array/ArrayList.
>
> SPECjvm2008 throughput does appear affected where tight loops are present:
>
> Benchmark               Mode   Cnt      Score    Error  Units
>
> # -XX:-UseCountedLoopSafepoints
> Compiler.compiler       thrpt   30    217.169 ±  5.166  ops/min
> Compiler.sunflow        thrpt   30    473.940 ± 20.246  ops/min
> Compress.test           thrpt   15    647.552 ±  3.528  ops/min
> CryptoAes.test          thrpt   15     44.367 ±  2.402  ops/min
> CryptoRsa.test          thrpt   15   2066.495 ± 11.809  ops/min
> CryptoSignVerify.test   thrpt   15  10372.019 ± 50.713  ops/min
> Derby.test              thrpt   30    375.954 ± 13.539  ops/min
> MpegAudio.test          thrpt   15    197.299 ±  2.411  ops/min
> ScimarkFFT.large        thrpt   15     55.618 ±  0.142  ops/min
> ScimarkFFT.small        thrpt   15    664.370 ±  7.304  ops/min
> ScimarkLU.large         thrpt   15     14.767 ±  0.082  ops/min
> ScimarkLU.small         thrpt   15    926.435 ±  8.790  ops/min
> ScimarkMonteCarlo.test  thrpt   15   4508.333 ± 68.869  ops/min
> ScimarkSOR.large        thrpt   15     74.596 ±  0.052  ops/min
> ScimarkSOR.small        thrpt   15    466.186 ±  1.308  ops/min
> ScimarkSparse.large     thrpt   15     48.932 ± 11.991  ops/min
> ScimarkSparse.small     thrpt   15    360.907 ±  6.739  ops/min
> Serial.test             thrpt   30   8779.857 ± 77.717  ops/s
> Sunflow.test            thrpt   15    124.546 ±  2.110  ops/min
> XmlTransform.test       thrpt   20    429.422 ± 24.964  ops/min
> XmlValidation.test      thrpt   30    773.254 ±  8.561  ops/min
>
> # -XX:+UseCountedLoopSafepoints
> Compiler.compiler       thrpt   20    213.199 ±  8.146  ops/min
> Compiler.sunflow        thrpt   27    486.745 ± 21.118  ops/min
> Compress.test           thrpt   15    637.303 ±  4.800  ops/min  <--- -1.5%
> CryptoAes.test          thrpt   15     46.943 ±  0.345  ops/min
> CryptoRsa.test          thrpt   15   2042.072 ± 12.379  ops/min  <--- -1.1%
> CryptoSignVerify.test   thrpt   15  10240.459 ± 63.095  ops/min
> Derby.test              thrpt   30    406.943 ± 12.625  ops/min
> MpegAudio.test          thrpt   15    193.173 ±  1.414  ops/min
> ScimarkFFT.large        thrpt   15     55.629 ±  0.104  ops/min
> ScimarkFFT.small        thrpt   15    669.153 ±  6.683  ops/min
> ScimarkLU.large         thrpt   15     13.510 ±  0.075  ops/min  <--- -8.5%
> ScimarkLU.small         thrpt   15    581.737 ±  6.539  ops/min  <--- -37.3%
> ScimarkMonteCarlo.test  thrpt   15   4485.049 ± 11.864  ops/min
> ScimarkSOR.large        thrpt   15     74.594 ±  0.045  ops/min
> ScimarkSOR.small        thrpt   15    421.046 ±  0.456  ops/min  <--- -9.6%
> ScimarkSparse.large     thrpt   15     40.995 ±  0.283  ops/min
> ScimarkSparse.small     thrpt   15    319.079 ±  1.391  ops/min  <--- -11.3%
> Serial.test             thrpt   30   8717.823 ± 81.147  ops/s
> Sunflow.test            thrpt   15    127.221 ±  1.578  ops/min
> XmlTransform.test       thrpt   20    445.762 ±  8.278  ops/min
> XmlValidation.test      thrpt   30    760.121 ±  9.963  ops/min
>
> Note that the Scimark benchmarks are expected to regress that much: they do
> have very tight loops, and that's our problem: the TTSP times there are in
> the multi-second range! The difference is explained by different code
> generation. For example, in the most dramatic ScimarkLU.small case:
>
> Hottest loop uses AVX2 (vmovdqu and friends):
>
> http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu-shenandoah-minus.perfasm
>
> Hottest loop uses AVX (vmovsd and friends):
>
> http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu-shenandoah-plus.perfasm
>
> As such, I believe enabling this by default, and figuring out code quality
> issues as we go forward, is the sane tactic.
Yes. The regressions, especially in scimark.lu, are bad, but as you say,
the ones that regress are also the ones that show extreme TTSP.
The patch is ok for me. Folks who prefer raw throughput and can live
with multisecond pause times can still turn the option off :-)
In the long run, we should look at strip mining the loops.
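
Strip mining splits a long counted loop into an outer loop over fixed-size
strips and a tight inner counted loop, so a safepoint poll would only be
needed once per strip rather than once per iteration. The following is a
minimal source-level sketch of that loop shape, not what C2 actually
generates; the class name, the method names and the strip size of 1024 are
hypothetical and chosen only for illustration.

```java
class StripMiningSketch {
    // Hypothetical strip size: iterations executed between potential polls.
    static final int STRIP = 1024;

    // Plain counted loop: with safepoints disabled in counted loops,
    // the whole loop runs without a poll.
    static long sumPlain(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Strip-mined shape: the inner loop stays tight (no poll needed),
    // and a poll would sit in the outer loop, once per strip.
    static long sumStripMined(int[] a) {
        long sum = 0;
        int start = 0;
        while (start < a.length) {
            // Computed this way so that start + STRIP cannot overflow int.
            int end = start + Math.min(STRIP, a.length - start);
            for (int i = start; i < end; i++) {
                sum += a[i];
            }
            start = end; // a safepoint poll would conceptually go here
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[5000];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        System.out.println(sumPlain(a) == sumStripMined(a));
    }
}
```

Both methods compute the same result; the only difference is where a poll
could be placed. A larger strip would presumably cost less throughput but
lengthen the worst-case time to reach a safepoint.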
Roman