RFR(S): 8076284: Improve vectorization of parallel streams

Thu Apr 16 22:30:59 UTC 2015

Hi Jan,

You did not describe your changes in detail (what they do).

The IgnoreVectorizeMethod flag should be positive and enabled by default.
Rename it to AllowVectorizeOnDemand (or something similar):

+  product(bool, AllowVectorizeOnDemand, true,                              \

Instead of the following code, you should add an intrinsic definition to
classfile/vmSymbols.hpp and then check method()->intrinsic_id():

+    if (strcmp("forEachRemaining", method()->name()->as_quoted_ascii()) == 0 && method()->signature() != 0
+      && method()->signature()->as_symbol() != 0 && method()->signature()->as_symbol()->as_quoted_ascii() != 0 ) {
+      if (strstr(method()->signature()->as_symbol()->as_quoted_ascii(),"Ljava/util/function/IntConsumer")) {
+        set_do_vector_loop(true);
+      }
+    }

That check should be under a flag too, because in general forEachRemaining
should be vectorized only if it is safe.

Can you also make use of the changes Michael Berg made for reduction
optimization (the code is already in jdk9/hs-comp)? I mean marking some
nodes before unrolling and searching for Phis.

Regards,
Vladimir

On 4/13/15 3:33 AM, Civlin, Jan wrote:
> Hi All,
>
>
>   We would like to contribute the improvement of vectorization of
>   parallel streams  from Intel.
>
> The contribution Bug ID: 8076284.
>
> Please review this patch:
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076284
>
> webrev: http://cr.openjdk.java.net/~kvn/8076284/webrev/
>
>
>       *Description*
>
> Improve vectorization of the unordered parallel streams (by vectorizing
> forEachRemaining method).
>
> For example, this forEach will be vectorized:
>
> java.util.stream.IntStream iStream =
>     java.util.stream.IntStream.range(0, RANGE - 1).parallel();
>
> iStream.forEach( id -> c[id] = c[id] + c[id+1] );
>
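
For anyone who wants to try the stream example above outside the patch, here
is a minimal standalone sketch. The class name, the int element type, the
array size and the initial values are assumptions; the quoted description
does not state them. Because the lambda updates c in place, elements other
than the last two can depend on how the parallel stream splits the range, so
the sketch prints only the tail.

```java
import java.util.stream.IntStream;

// Hypothetical wrapper class; only the forEach line comes from the quoted description.
final class ParallelForEachSketch {
    static final int RANGE = 1024; // assumed size

    static int[] run() {
        int[] c = new int[RANGE];
        for (int i = 0; i < RANGE; i++) {
            c[i] = i; // assumed initial values
        }
        // Same shape as the quoted example: an unordered parallel forEach over an index range.
        IntStream.range(0, RANGE - 1).parallel().forEach(id -> c[id] = c[id] + c[id + 1]);
        return c;
    }

    public static void main(String[] args) {
        int[] c = run();
        // c[RANGE - 1] is never written, and c[RANGE - 2] reads only that element,
        // so these two values do not depend on how the range is split.
        System.out.println(c[RANGE - 2] + " " + c[RANGE - 1]);
    }
}
```
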
> It also enables on-demand loop vectorization in a given method (by
> providing more hints to SuperWord optimization).
>
> For example, use -XX:CompileCommand=option,computeCall,Vectorize to
> vectorize this loop:
>
> void computeCall(double [] Call, double  puByDf, double  pdByDf)
> {
>   for(int i = timeStep; i > 0; i--)
>     for(int j = 0; j <= i - 1; j++)
>       Call[j] = puByDf * Call[j + 1] + pdByDf * Call[j];
> }
>
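
The computeCall loop above can be exercised on its own with a sketch like the
one below. The class name and the sample inputs are assumptions, and timeStep
is passed as a parameter here because the quoted snippet does not show where
it is declared. The sketch only shows the loop shape that the on-demand option
is meant to target; whether the loop is actually vectorized depends on the
patch under review.

```java
import java.util.Arrays;

// Hypothetical wrapper class around the loop from the quoted description.
final class ComputeCallSketch {
    // timeStep is a parameter here (assumption); the quoted snippet reads it from elsewhere.
    static void computeCall(double[] call, int timeStep, double puByDf, double pdByDf) {
        for (int i = timeStep; i > 0; i--) {
            // Each pass combines neighbouring entries and shrinks the active range by one.
            for (int j = 0; j <= i - 1; j++) {
                call[j] = puByDf * call[j + 1] + pdByDf * call[j];
            }
        }
    }

    public static void main(String[] args) {
        double[] call = {1.0, 2.0, 3.0}; // assumed sample values
        computeCall(call, 2, 0.5, 0.5);
        System.out.println(Arrays.toString(call)); // prints [2.0, 2.5, 3.0]
    }
}
```
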
>
> This enhancement is contributed by Intel and sponsored by the hotspot
> compiler team.
>