Performance regression with IntStream.parallel.sum?
Paul Sandoz
paul.sandoz at oracle.com
Tue Oct 29 04:15:17 PDT 2013
On Oct 28, 2013, at 6:01 PM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:
> On Oct 28, 2013, at 5:15 PM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:
>>
>>>
>>> Hmmm. Quite strange. Have to evaluate it.
>>>
>>
>> Doh <thump> head hits desk. I forgot that vm flags were not propagated via the options builder to the forked java process:
>>
>> .jvmArgs("-XX:-TieredCompilation -Dbenchmark.n=" + n)
>>
>> grrr... sorry for the noise. Re-running...
>>
>
> N = 100_000
> Benchmark                      Mode Thr  Cnt  Sec      Mean  Mean error  Units
> l.StreamSumTest.testStreamPar  avgt   1  100    1    39.105       0.317  us/op
> l.StreamSumTest.testStreamSeq  avgt   1  100    1   486.373       1.516  us/op
>
>
> N = 1_000_000
> Benchmark                      Mode Thr  Cnt  Sec      Mean  Mean error  Units
> l.StreamSumTest.testStreamPar  avgt   1  100    1   174.094       8.515  us/op
> l.StreamSumTest.testStreamSeq  avgt   1  100    1  4877.512      18.542  us/op
>
>
> Now i am suspicious of the sequential numbers :-) While i would like to believe them my laptop has only eight hardware threads so 12x and 28x speed ups are highly suspicious.
>
> When looking at the sequential iterations (see below) i notice a slow down which kicks in after a number of iterations
On further investigation, the JIT compiler is kicking in on stream-construction-related methods at a later point, and for sequential evaluation this has a negative effect. The jmh "-prof hs_comp" and HotSpot -XX:+PrintCompilation options are very handy here: combined with a smaller sample time and more iterations, they make it easier to observe when the jump occurs and to correlate it with HotSpot activity. -XX:CompileThreshold was useful as well.
Using the following compiler options:
-XX:-TieredCompilation -XX:CompileCommandFile=.hotspot_compiler
$ cat .hotspot_compiler
exclude java/util/stream/AbstractPipeline evaluate
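Given that the flags silently failed to reach the forked JVM earlier in this thread, one cheap sanity check is to have the benchmark process print the arguments it was actually started with. The following is only a sketch: the class name JvmFlagCheck and the helper hasFlag are made up for illustration and are not part of the benchmark; the only real API used is the standard RuntimeMXBean.getInputArguments().

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of the original benchmark.
public class JvmFlagCheck {

    // True if any JVM input argument starts with the given prefix.
    static boolean hasFlag(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM (excludes arguments to main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM args: " + jvmArgs);
        System.out.println("TieredCompilation disabled: "
                + hasFlag(jvmArgs, "-XX:-TieredCompilation"));
        System.out.println("CompileCommandFile set: "
                + hasFlag(jvmArgs, "-XX:CompileCommandFile="));
    }
}
```

Calling the same check from a benchmark setup method, inside the forked process, would show whether the options above really took effect there.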
I now get this result:
Benchmark                          Mode Thr  Cnt  Sec      Mean  Mean error  Units
o.m.s.ForLoopSum100K.sequential    avgt   1   20    1    43.097       0.115  us/op
o.m.s.IntStreamSum100K.parallel    avgt   1   20    1    40.090       0.892  us/op
o.m.s.IntStreamSum100K.sequential  avgt   1   20    1    45.711       0.136  us/op
o.m.s.IntStreamSum1M.parallel      avgt   1   20    1   153.193       3.281  us/op
o.m.s.IntStreamSum1M.sequential    avgt   1   20    1   453.525       1.135  us/op
o.m.s.IntStreamSum5M.parallel      avgt   1   20    1   863.744       6.092  us/op
o.m.s.IntStreamSum5M.sequential    avgt   1   20    1  2354.732      11.270  us/op
which is much more reasonable.
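To put numbers on "more reasonable", the parallel speed-up is just the sequential mean divided by the parallel mean. A small sketch of that arithmetic, using the means reported above (the class name SpeedupCheck is made up for illustration):

```java
// Hypothetical helper that recomputes speed-ups from the reported means.
public class SpeedupCheck {

    // Speed-up of parallel over sequential, both given in us/op.
    static double speedup(double seqMicros, double parMicros) {
        return seqMicros / parMicros;
    }

    public static void main(String[] args) {
        String[] sizes = {"100K", "1M", "5M"};
        // Mean us/op for IntStreamSum*.sequential and IntStreamSum*.parallel.
        double[] seq = {45.711, 453.525, 2354.732};
        double[] par = {40.090, 153.193, 863.744};

        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("N = %-4s speed-up = %.2fx%n",
                    sizes[i], speedup(seq[i], par[i]));
        }
    }
}
```

That works out to roughly 1.1x, 3.0x and 2.7x, all comfortably below the eight hardware threads available, whereas the earlier runs implied about 12x and 28x.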
Why did i choose to exclude AbstractPipeline.evaluate from compilation? There is a HotSpot-related bug associated with that method. Perhaps it is just coincidence, or just the "age of aquarius" :-) I have yet to try excluding other methods. However, the result does suggest there might be some errant behaviour in the HotSpot compiler.
Paul.