Performance regression with IntStream.parallel.sum?

Tue Oct 29 04:15:17 PDT 2013

On Oct 28, 2013, at 6:01 PM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:

> On Oct 28, 2013, at 5:15 PM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:
>> 
>>> 
>>> Hmmm. Quite strange. Have to evaluate it.
>>> 
>> 
>> Doh <thump> head hits desk. I forgot that VM flags were not propagated via the options builder to the forked Java process:
>> 
>>               .jvmArgs("-XX:-TieredCompilation -Dbenchmark.n=" + n)
>> 
>> grrr... sorry for the noise. Re-running...
>> 
> 
> N = 100_000
> Benchmark                         Mode Thr    Cnt  Sec         Mean   Mean error    Units
> l.StreamSumTest.testStreamPar     avgt   1    100    1       39.105        0.317    us/op
> l.StreamSumTest.testStreamSeq     avgt   1    100    1      486.373        1.516    us/op
> 
> 
> N = 1_000_000
> Benchmark                         Mode Thr    Cnt  Sec         Mean   Mean error    Units
> l.StreamSumTest.testStreamPar     avgt   1    100    1      174.094        8.515    us/op
> l.StreamSumTest.testStreamSeq     avgt   1    100    1     4877.512       18.542    us/op
> 
> 
> Now I am suspicious of the sequential numbers :-) While I would like to believe them, my laptop has only eight hardware threads, so 12x and 28x speedups are highly suspicious.
> 
> When looking at the sequential iterations (see below), I notice a slowdown which kicks in after a number of iterations.
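The suspicion about the quoted numbers comes down to simple arithmetic: the speedup is the sequential mean time divided by the parallel mean time, and a speedup well above the number of hardware threads usually points at a problem with the sequential baseline rather than a genuinely fast parallel run. Superlinear speedups can occur (for example through cache effects), but rarely by this much. The small sketch below recomputes the ratios from the quoted tables; the class and method names are hypothetical, and the thread count of 8 is taken from the message.

```java
public class SpeedupCheck {
    // Ratio of sequential to parallel mean time; both must use the same unit (us/op here).
    static double speedup(double seqMean, double parMean) {
        return seqMean / parMean;
    }

    // A speedup above the hardware thread count is a red flag worth investigating.
    static boolean suspicious(double speedup, int hardwareThreads) {
        return speedup > hardwareThreads;
    }

    public static void main(String[] args) {
        // Eight hardware threads, as stated in the message;
        // Runtime.getRuntime().availableProcessors() would report the local machine instead.
        int hardwareThreads = 8;
        double[][] seqParMeansUs = {
            {486.373, 39.105},    // N = 100_000
            {4877.512, 174.094},  // N = 1_000_000
        };
        for (double[] m : seqParMeansUs) {
            double s = speedup(m[0], m[1]);
            System.out.printf("speedup %.1fx, suspicious: %b%n", s, suspicious(s, hardwareThreads));
        }
    }
}
```

Both ratios (about 12.4x and 28.0x) exceed eight threads, which is what prompted a closer look at the sequential runs.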

On further investigation, the JIT compiler kicks in on methods related to stream construction at a later point, and for sequential evaluation this has a negative effect. The JMH "-prof hs_comp" profiler and the HotSpot -XX:+PrintCompilation option are very handy here: combined with a shorter sample time and more iterations, they make it easier to observe when the jump occurs and to correlate it with HotSpot compilation activity. -XX:CompileThreshold was also useful.
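To see the same kind of correlation outside JMH, one can time each iteration individually and run the program with -XX:+PrintCompilation, so that compilation log lines appear interleaved with the per-iteration timings. The sketch below is a crude, hypothetical harness (class name and iteration count are assumptions, not the benchmark used in this thread); it has none of JMH's safeguards and is only meant for spotting when a jump in timings lines up with a compilation event. It reads N from the `benchmark.n` system property, mirroring the `-Dbenchmark.n` flag quoted above.

```java
import java.util.stream.IntStream;

public class IterationTimer {
    // Sequential IntStream sum; int overflow simply wraps, which is fine for timing purposes.
    static int sumSequential(int n) {
        return IntStream.range(0, n).sum();
    }

    public static void main(String[] args) {
        int n = Integer.getInteger("benchmark.n", 100_000);
        int iterations = 200; // assumption: enough iterations to see a late compilation event
        long sink = 0;        // consume results so the JIT cannot discard the work
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            sink += sumSequential(n);
            long elapsedUs = (System.nanoTime() - start) / 1_000;
            System.out.println("iteration " + i + ": " + elapsedUs + " us");
        }
        System.out.println("sink=" + sink);
    }
}
```

A possible invocation is `java -XX:-TieredCompilation -XX:+PrintCompilation -Dbenchmark.n=100000 IterationTimer`; a sudden step in the per-iteration times shortly after a `java.util.stream` method appears in the compilation log is the pattern described above.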

Using the following compiler options:

-XX:-TieredCompilation -XX:CompileCommandFile=.hotspot_compiler 

$ cat .hotspot_compiler 
exclude java/util/stream/AbstractPipeline evaluate

I now get this result:

Benchmark                             Mode Thr    Cnt  Sec         Mean   Mean error    Units
o.m.s.ForLoopSum100K.sequential       avgt   1     20    1       43.097        0.115    us/op
o.m.s.IntStreamSum100K.parallel       avgt   1     20    1       40.090        0.892    us/op
o.m.s.IntStreamSum100K.sequential     avgt   1     20    1       45.711        0.136    us/op
o.m.s.IntStreamSum1M.parallel         avgt   1     20    1      153.193        3.281    us/op
o.m.s.IntStreamSum1M.sequential       avgt   1     20    1      453.525        1.135    us/op
o.m.s.IntStreamSum5M.parallel         avgt   1     20    1      863.744        6.092    us/op
o.m.s.IntStreamSum5M.sequential       avgt   1     20    1     2354.732       11.270    us/op

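which is much more reasonable: the sequential stream (45.711 us/op) is now close to the for-loop baseline (43.097 us/op) at 100K, and the parallel speedups (roughly 3x at 1M and 2.7x at 5M) are well within eight hardware threads.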
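The benchmark sources are not included in this message, so the following is only a hypothetical sketch of what the three kinds of measured operation might look like: a plain for-loop, a sequential `IntStream` sum and a parallel one, all over the same `int[]` (the input contents are an assumption). It is useful mainly as a correctness check that the variants agree before comparing their timings.

```java
import java.util.Arrays;

public class SumVariants {
    // Hypothetical input: the message does not show what the benchmarks actually sum.
    static int[] data(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        return a;
    }

    // Plain loop, comparable in spirit to ForLoopSum100K.sequential (assumed).
    static int forLoopSum(int[] a) {
        int sum = 0;
        for (int v : a) sum += v;
        return sum;
    }

    static int streamSumSequential(int[] a) {
        return Arrays.stream(a).sum();
    }

    // int addition wraps modulo 2^32 and is associative, so the parallel
    // reduction gives the same result as the sequential one even on overflow.
    static int streamSumParallel(int[] a) {
        return Arrays.stream(a).parallel().sum();
    }

    public static void main(String[] args) {
        int[] a = data(100_000);
        System.out.println(forLoopSum(a) + " " + streamSumSequential(a) + " " + streamSumParallel(a));
    }
}
```

In a real measurement each method body would sit in a JMH `@Benchmark` method returning the sum, so that the result is consumed and the work cannot be eliminated.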

Why did I choose to exclude AbstractPipeline.evaluate from compilation? There is a HotSpot-related bug associated with that method. Perhaps it is just coincidence, or just the "Age of Aquarius" :-) I have yet to try excluding other methods. However, it does suggest there might be some errant behaviour in the HotSpot compiler.

Paul.