Performance regression with IntStream.parallel.sum?

Paul Sandoz paul.sandoz at oracle.com
Mon Oct 28 07:22:25 PDT 2013


Hi Sergey,

On Oct 28, 2013, at 1:46 PM, Sergey Kuksenko <sergey.kuksenko at oracle.com> wrote:

> Hi All,
> The reason for this behavior is TieredCompilation, which was turned off by
> default in b92 and turned on in b112.
> Here is some data, measured under jmh (average time per op; us/op):
> 
>          b92_NonTiered  b92_Tiered  b112_NonTiered  b112_Tiered
> 1M_seq            1568        5294            1561        13347
> 1M_par             869       12770             802         7044
> 5M_seq            7673        7634            7630         7570
> 5M_par            4042       19789            3670        33147
> 
> Moreover, TieredCompilation causes a huge run-to-run variance here,
> especially for the parallel cases.
> You may find my sources and jar files here:
> http://cr.openjdk.java.net/~skuksenko/intstream/tiered/
> 
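Since the default for TieredCompilation differs between builds, it can help to confirm which mode a given JVM is actually running in before comparing numbers. The following is a minimal sketch, assuming a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name TieredFlagCheck is hypothetical and is not part of the sources referred to above.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class TieredFlagCheck {
    // Returns the current value of a HotSpot VM flag as a string,
    // for a boolean flag either "true" or "false".
    // Assumption: the running JVM is HotSpot and provides this MXBean.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("TieredCompilation = " + vmOption("TieredCompilation"));
    }
}
```

Running it once with -XX:+TieredCompilation and once with -XX:-TieredCompilation should show the flag flipping, independent of the build's default.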

Thanks. The src directory is empty.

Here are the results for my test, run with the following options (adjusted based on your off-list advice):

        Options opts = new OptionsBuilder()
                .include(".*StreamSumTest.*")
                .jvmArgs("-Dbenchmark.n=" + n)
                .mode(Mode.AverageTime)
                .timeUnit(TimeUnit.NANOSECONDS)
                .warmupIterations(10)
                .warmupTime(TimeValue.milliseconds(1000))
                .measurementIterations(10)
                .measurementTime(TimeValue.milliseconds(1000))
                .forks(4)
                .build();
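The benchmark class itself is not included in this message. As a rough illustration only, the three measured methods might compute something like the following; this is a sketch without the JMH annotations, and the class name, the method names and the choice of IntStream.range as the source are all assumptions rather than the actual StreamSumTest code.

```java
import java.util.stream.IntStream;

public class StreamSumSketch {
    // Problem size, read from the same system property the options above set
    // via -Dbenchmark.n=<N>; the default of 1_000_000 is an assumption.
    static final int N = Integer.getInteger("benchmark.n", 1_000_000);

    // Plain loop baseline (hypothetical counterpart of testSeq).
    // The int sum wraps around on overflow, exactly as IntStream.sum() does,
    // so all three variants return the same value for the same n.
    static int sumLoop(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Sequential stream (hypothetical counterpart of testStreamSeq).
    static int sumStreamSeq(int n) {
        return IntStream.range(0, n).sum();
    }

    // Parallel stream (hypothetical counterpart of testStreamPar).
    static int sumStreamPar(int n) {
        return IntStream.range(0, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println("loop:       " + sumLoop(N));
        System.out.println("stream seq: " + sumStreamSeq(N));
        System.out.println("stream par: " + sumStreamPar(N));
    }
}
```

In a real JMH run each method would be annotated as a benchmark and return its result so the computation cannot be eliminated as dead code.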


-XX:-TieredCompilation, N = 100_000
Benchmark                         Mode Thr    Cnt  Sec         Mean   Mean error    Units
l.StreamSumTest.testSeq           avgt   1     40    1    43509.309      114.487  nsec/op
l.StreamSumTest.testStreamPar     avgt   1     40    1   108882.854    29918.402  nsec/op
l.StreamSumTest.testStreamSeq     avgt   1     40    1   499711.104     1382.872  nsec/op


-XX:-TieredCompilation, N = 1_000_000
Benchmark                         Mode Thr    Cnt  Sec         Mean   Mean error    Units
l.StreamSumTest.testSeq           avgt   1     40    1   443011.329      902.065  nsec/op
l.StreamSumTest.testStreamPar     avgt   1     40    1  1565053.123    17028.800  nsec/op
l.StreamSumTest.testStreamSeq     avgt   1     40    1   467933.044     1074.794  nsec/op


-XX:+TieredCompilation, N = 100_000
Benchmark                         Mode Thr    Cnt  Sec         Mean   Mean error    Units
l.StreamSumTest.testSeq           avgt   1     40    1    43533.608       82.619  nsec/op
l.StreamSumTest.testStreamPar     avgt   1     40    1   165477.693     7557.480  nsec/op
l.StreamSumTest.testStreamSeq     avgt   1     40    1   498996.086     1013.645  nsec/op

-XX:+TieredCompilation, N = 1_000_000
Benchmark                         Mode Thr    Cnt  Sec         Mean   Mean error    Units
l.StreamSumTest.testSeq           avgt   1     40    1   443919.982      969.729  nsec/op
l.StreamSumTest.testStreamPar     avgt   1     40    1  1567403.736    18983.551  nsec/op
l.StreamSumTest.testStreamSeq     avgt   1     40    1  1595995.987   847523.680  nsec/op


So I am still observing a drop in parallel performance going from N=10^5 to N=10^6.
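One way to quantify this is cost per element. The sketch below uses only the testStreamPar means from the -XX:-TieredCompilation tables above; the class and method names are hypothetical. With those numbers the parallel cost goes from about 1.09 ns per element at N=10^5 to about 1.57 ns per element at N=10^6.

```java
public class PerElementCost {
    // Average cost per element, given a mean time per operation in ns
    // and the number of elements processed per operation.
    static double nsPerElement(double meanNsPerOp, int n) {
        return meanNsPerOp / n;
    }

    public static void main(String[] args) {
        // testStreamPar means copied from the -XX:-TieredCompilation tables.
        double small = nsPerElement(108882.854, 100_000);
        double large = nsPerElement(1565053.123, 1_000_000);

        System.out.printf("N=10^5: %.3f ns/element%n", small);
        System.out.printf("N=10^6: %.3f ns/element%n", large);
        System.out.printf("ratio:  %.2f%n", large / small);
    }
}
```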

Paul.


More information about the lambda-dev mailing list