Request for tracking down C1 optimizations: handwritten cartesian product similar to flatmap/map performance!

Fri May 30 01:06:49 UTC 2014

Hi Aggelos,

Escape Analysis may not help you because most of the time is spent in
the loop, which only reads from arrays and does math. There are no
allocations in the loop. If the stream version produces the same loop
shape, the performance will be the same. We have the
-XX:+TraceLoopOpts flag, but it is available only in debug builds of
the JVM.
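
To make the loop-shape point concrete, here is a minimal sketch of what
the two versions might look like. The benchmark source is not quoted in
this thread, so the class name, the array names and sizes, and the
method name cartBaseline are assumptions for illustration; only cartSeq
is a name taken from the message below.

```java
import java.util.stream.LongStream;

public class CartSketch {
    // Hypothetical inputs: names and sizes are assumptions,
    // not the data used in the original benchmark.
    static final long[] outer = LongStream.range(0, 1_000).toArray();
    static final long[] inner = LongStream.range(0, 10).toArray();

    // Handwritten version: a nested loop that only reads from the
    // two arrays and does arithmetic. Nothing is allocated inside it.
    static long cartBaseline() {
        long sum = 0;
        for (int i = 0; i < outer.length; i++) {
            for (int j = 0; j < inner.length; j++) {
                sum += outer[i] * inner[j];
            }
        }
        return sum;
    }

    // Stream version: the same computation expressed with flatMap/map.
    // If the JIT inlines the pipeline into a similar loop, the two
    // methods should perform about the same.
    static long cartSeq() {
        return LongStream.of(outer)
                .flatMap(d -> LongStream.of(inner).map(dp -> dp * d))
                .sum();
    }

    public static void main(String[] args) {
        // Both versions compute the same sum of pairwise products.
        System.out.println(cartBaseline() == cartSeq());
    }
}
```

A sketch like this can be run with the options from the original
message (-XX:-TieredCompilation, with and without
-XX:-DoEscapeAnalysis) to compare the two methods; a single run of main
is not a benchmark, so a proper harness is still needed for timings.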

Vladimir

On 5/29/14 11:03 AM, Andrew Haley wrote:
> Hi,
>
> On 05/29/2014 06:55 PM, Aggelos Biboudis wrote:
>
>> I would like to ask you something regarding C1 compilation (VM options:
>> -Xms769m -Xmx769m -XX:-TieredCompilation)
>
> That's C2 compilation.
>
>> of a Cartesian product stream
>> operation with the new stream API.
>> I have two versions of this computation, one handwritten and one with
>> flatmap/map. It is remarkable that these two have similar performance so I
>> would like to trace-back the JIT compilation decisions (apart from
>> inlining), and more specifically if escape analysis has any effect.
>
> Are you quite sure your numbers aren't dominated by cache misses?  Your
> data is about 40 Megabytes and it's being accessed sequentially.
>
>> I've tested the code above with -XX:-DoEscapeAnalysis and I've got the same
>> execution times, however I would like to confirm what happens.
>> Regarding inlining, only by noticing the result of PrintInlining we
>> conclude that cartSeq inlines all the nested forEachRemaining operations
>> (of of, flatmap, map), but is that the only optimization?
>
> Not if this really is C2, no.  There are many optimization passes,
> and several will be effective for this code.
>
> Andrew.
>