Performance regression with IntStream.parallel.sum?
Paul Sandoz
paul.sandoz at oracle.com
Mon Oct 28 10:01:40 PDT 2013
On Oct 28, 2013, at 5:15 PM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:
>
>>
>> Hmmm. Quite strange. Have to evaluate it.
>>
>
> Doh <thump> head hits desk. I forgot that vm flags were not propagated via the options builder to the forked java process:
>
> .jvmArgs("-XX:-TieredCompilation -Dbenchmark.n=" + n)
>
> grrr... sorry for the noise. Re-running...
>
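To illustrate why the missing flags matter: a benchmark that takes its size from a system property silently falls back to a default when -Dbenchmark.n never reaches the forked JVM. The following is only a minimal sketch under assumptions. The class name, method names, the default of 100_000 and the use of IntStream.range as the data source are all hypothetical, not the actual StreamSumTest code.

```java
import java.util.stream.IntStream;

public class StreamSumSketch {
    // Assumption: the size comes from -Dbenchmark.n; 100_000 is a hypothetical default
    // that is used whenever the property does not reach this JVM.
    static final int N = Integer.getInteger("benchmark.n", 100_000);

    // Sequential sum over [0, n). IntStream.sum() returns int, so the result
    // wraps around for large n; that does not matter for timing purposes.
    static int sequentialSum(int n) {
        return IntStream.range(0, n).sum();
    }

    // Same computation on the common fork/join pool.
    static int parallelSum(int n) {
        return IntStream.range(0, n).parallel().sum();
    }

    public static void main(String[] args) {
        // Printing N makes it obvious when the property was not propagated.
        System.out.println("benchmark.n = " + N);
        System.out.println("sequential  = " + sequentialSum(N));
        System.out.println("parallel    = " + parallelSum(N));
    }
}
```

Printing the effective value of N at start-up is a cheap way to catch a flag that was dropped on the way to the forked process.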
N = 100_000

Benchmark                      Mode  Thr  Cnt  Sec      Mean  Mean error  Units
l.StreamSumTest.testStreamPar  avgt    1  100    1    39.105       0.317  us/op
l.StreamSumTest.testStreamSeq  avgt    1  100    1   486.373       1.516  us/op

N = 1_000_000

Benchmark                      Mode  Thr  Cnt  Sec      Mean  Mean error  Units
l.StreamSumTest.testStreamPar  avgt    1  100    1   174.094       8.515  us/op
l.StreamSumTest.testStreamSeq  avgt    1  100    1  4877.512      18.542  us/op

Now I am suspicious of the sequential numbers :-) While I would like to believe them, my laptop has only eight hardware threads, so speed-ups of 12x and 28x are highly suspicious.
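
The arithmetic behind that suspicion can be sketched as follows. The class and method names are illustrative only. The timings are the means reported above and the thread count of eight is the one stated for this laptop.

```java
public class SpeedupCheck {
    // Speed-up of the parallel run relative to the sequential run (both in us/op).
    static double speedup(double seqMicros, double parMicros) {
        return seqMicros / parMicros;
    }

    // A speed-up larger than the number of hardware threads cannot be explained
    // by parallelism alone, so it hints at a problem in one of the measurements.
    static boolean exceedsHardwareThreads(double speedup, int hardwareThreads) {
        return speedup > hardwareThreads;
    }

    public static void main(String[] args) {
        int hardwareThreads = 8; // as stated in the message
        double s100k = speedup(486.373, 39.105);   // N = 100_000
        double s1m = speedup(4877.512, 174.094);   // N = 1_000_000
        System.out.println("N = 100_000:   " + s100k
                + " suspicious=" + exceedsHardwareThreads(s100k, hardwareThreads));
        System.out.println("N = 1_000_000: " + s1m
                + " suspicious=" + exceedsHardwareThreads(s1m, hardwareThreads));
    }
}
```

Both ratios come out above eight, which is why the sequential side of the comparison looks too slow rather than the parallel side too fast.
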
Looking at the sequential iterations (see below), I notice a slowdown that kicks in after a number of iterations (perhaps proportional to N), and I observed the same effect with your test program, the benchmark results for which are:
java -XX:-TieredCompilation -jar target/microbenchmarks.jar -i 10 -f 2
Benchmark                          Mode  Thr  Cnt  Sec       Mean  Mean error  Units
o.m.s.IntStreamSum100K.parallel    avgt    1   20    1     40.469       1.517  us/op
o.m.s.IntStreamSum100K.sequential  avgt    1   20    1    477.382       4.407  us/op
o.m.s.IntStreamSum1M.parallel      avgt    1   20    1    150.988       1.855  us/op
o.m.s.IntStreamSum1M.sequential    avgt    1   20    1   4124.819     392.108  us/op
o.m.s.IntStreamSum5M.parallel      avgt    1   20    1    866.846       3.700  us/op
o.m.s.IntStreamSum5M.sequential    avgt    1   20    1  12629.711    5837.182  us/op

Paul.
N = 100_000
# Fork: 9 of 10
# Warmup: 20 iterations, 1000 ms each
# Measurement: 10 iterations, 1000 ms each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Running: lambda.StreamSumTest.testStreamSeq
# Warmup Iteration 1: 89.432 us/op
# Warmup Iteration 2: 484.337 us/op
# Warmup Iteration 3: 494.509 us/op
# Warmup Iteration 4: 483.470 us/op
# Warmup Iteration 5: 487.811 us/op
# Warmup Iteration 6: 485.572 us/op
# Warmup Iteration 7: 489.385 us/op
# Warmup Iteration 8: 488.314 us/op
# Warmup Iteration 9: 493.298 us/op
# Warmup Iteration 10: 497.965 us/op
# Warmup Iteration 11: 483.907 us/op
# Warmup Iteration 12: 494.186 us/op
# Warmup Iteration 13: 492.135 us/op
# Warmup Iteration 14: 486.906 us/op
# Warmup Iteration 15: 492.756 us/op
# Warmup Iteration 16: 494.186 us/op
# Warmup Iteration 17: 494.272 us/op
# Warmup Iteration 18: 493.907 us/op
# Warmup Iteration 19: 495.726 us/op
# Warmup Iteration 20: 495.143 us/op
Iteration 1: 489.998 us/op
Iteration 2: 494.910 us/op
Iteration 3: 496.420 us/op
Iteration 4: 490.313 us/op
Iteration 5: 493.948 us/op
Iteration 6: 498.616 us/op
Iteration 7: 498.998 us/op
Iteration 8: 496.266 us/op
Iteration 9: 488.312 us/op
Iteration 10: 497.052 us/op
N = 1_000_000
# Fork: 9 of 10
# Warmup: 20 iterations, 1000 ms each
# Measurement: 10 iterations, 1000 ms each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Running: lambda.StreamSumTest.testStreamSeq
# Warmup Iteration 1: 548.453 us/op
# Warmup Iteration 2: 475.805 us/op
# Warmup Iteration 3: 479.079 us/op
# Warmup Iteration 4: 481.045 us/op
# Warmup Iteration 5: 513.081 us/op
# Warmup Iteration 6: 4768.633 us/op
# Warmup Iteration 7: 4810.168 us/op
# Warmup Iteration 8: 4796.000 us/op
# Warmup Iteration 9: 4744.255 us/op
# Warmup Iteration 10: 4863.646 us/op
# Warmup Iteration 11: 4778.114 us/op
# Warmup Iteration 12: 4769.581 us/op
# Warmup Iteration 13: 4750.929 us/op
# Warmup Iteration 14: 4828.577 us/op
# Warmup Iteration 15: 4739.132 us/op
# Warmup Iteration 16: 4824.240 us/op
# Warmup Iteration 17: 4822.423 us/op
# Warmup Iteration 18: 4844.222 us/op
# Warmup Iteration 19: 4777.905 us/op
# Warmup Iteration 20: 4866.481 us/op
Iteration 1: 4832.221 us/op
Iteration 2: 4813.486 us/op
Iteration 3: 4907.794 us/op
Iteration 4: 4861.257 us/op
Iteration 5: 4815.668 us/op
Iteration 6: 4840.097 us/op
Iteration 7: 4861.160 us/op
Iteration 8: 5100.909 us/op
Iteration 9: 4862.112 us/op
Iteration 10: 4863.340 us/op
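
The point at which the slowdown kicks in can also be located mechanically from such a log. The following is a minimal sketch, not part of JMH. The class name, the method name and the threshold factor of 2 are assumptions chosen for illustration. It reports the first iteration whose time is at least `factor` times the median of all earlier iterations.

```java
import java.util.Arrays;

public class SlowdownFinder {
    // Returns the 1-based index of the first iteration whose time is at least
    // `factor` times the median of all earlier iterations, or -1 if there is none.
    static int firstSlowdown(double[] micros, double factor) {
        for (int i = 1; i < micros.length; i++) {
            double[] prefix = Arrays.copyOf(micros, i); // iterations before index i
            Arrays.sort(prefix);
            double median = (i % 2 == 1)
                    ? prefix[i / 2]
                    : (prefix[i / 2 - 1] + prefix[i / 2]) / 2.0;
            if (micros[i] >= factor * median) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // First eight warmup iterations of the N = 1_000_000 run above (us/op).
        double[] warmup = {548.453, 475.805, 479.079, 481.045,
                           513.081, 4768.633, 4810.168, 4796.000};
        System.out.println("slowdown at warmup iteration " + firstSlowdown(warmup, 2.0));
    }
}
```

Using the median of the preceding iterations rather than the mean keeps a single noisy first iteration from hiding the later step.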