[vector] binary operations with a scalar

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Mon Feb 19 15:18:10 UTC 2018


Paul,

> Here is an initial step towards this focusing on the add operation for IntVector:
> 
>    http://cr.openjdk.java.net/~psandoz/panama/bin-op-with-scalar/webrev/
> 
> I don’t wanna proceed further until we agree on the direction/optimization strategy.
> 
> If we want to elide explicit broadcasts (hidden in the implementation) we would need to push this down into the intrinsics. This would require 6 additional intrinsic methods on VectorIntrinsics.

I like the proposed API methods. The ability to hide broadcasts behind
the scenes definitely improves the API.

Regarding the implementation, from the JVM/JIT perspective, IMO there's
no compelling reason to intrinsify those methods. As you noted, the new
methods can be implemented by composing existing operations (broadcast +
add/sub/...), and the JVM has to do the same if it intrinsifies them.
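
To make the composition concrete, here is a minimal runnable sketch. It
assumes a toy array-backed class (ToyIntVector is hypothetical and is
not the Panama IntVector); it only shows that a vector + scalar add can
be expressed as broadcast followed by the existing vector + vector add.

```java
import java.util.Arrays;

// Toy stand-in for a fixed-length int vector; not the Panama IntVector class.
final class ToyIntVector {
    static final int LENGTH = 8; // assumed lane count (256 bits / 32-bit int)
    private final int[] lanes;

    private ToyIntVector(int[] lanes) { this.lanes = lanes; }

    static ToyIntVector of(int... values) {
        if (values.length != LENGTH) {
            throw new IllegalArgumentException("expected " + LENGTH + " lanes");
        }
        return new ToyIntVector(values.clone());
    }

    // Existing operation 1: replicate a scalar into every lane.
    static ToyIntVector broadcast(int e) {
        int[] r = new int[LENGTH];
        Arrays.fill(r, e);
        return new ToyIntVector(r);
    }

    // Existing operation 2: lane-wise vector + vector.
    ToyIntVector add(ToyIntVector o) {
        int[] r = new int[LENGTH];
        for (int i = 0; i < LENGTH; i++) r[i] = lanes[i] + o.lanes[i];
        return new ToyIntVector(r);
    }

    // New convenience method: vector + scalar, composed from the two above.
    ToyIntVector add(int o) {
        return add(broadcast(o));
    }

    int[] toArray() { return lanes.clone(); }
}
```

With this shape the scalar overload adds no new primitive; it only
hides the broadcast from the caller.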

The only difference between composing and intrinsifying is what happens
when intrinsification fails. I share your desire to provide an optimized
implementation for the non-intrinsified case, but I don't think it
justifies intrinsification.

What do you think about the following alternative?

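     @Override
     @ForceInline
     public Int256Vector add(int o) {
         // Broadcast for this concrete shape; no dispatch through the species.
         Int256Vector v = Int256Species.broadcastImpl(o);
         return (Int256Vector) VectorIntrinsics.binaryOp(
                 VECTOR_OP_ADD, Int256Vector.class, int.class, LENGTH,
                 this, v,
                 // Default implementation: ignores v and adds the scalar directly.
                 (v1, __) -> v1.bOp(o, (i, a, b) -> (int)(a + b)));
     }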

     static final class Int256Species extends ... {
...
         @ForceInline
         static Int256Vector broadcastImpl(int e) {
             return VectorIntrinsics.broadcastCoerced(
                     Int256Vector.class, int.class, LENGTH,
                     e,
                     ((long bits) -> SPECIES.op(i -> (int)bits)));
         }

         @Override
         @ForceInline
         public Int256Vector broadcast(int e) {
             return broadcastImpl(e);
         }
...
     }

All vector shape information is available, so both operations can be
reliably intrinsified: e.g., there is no need to dispatch through the
vector species to broadcast the scalar.

If VI.binaryOp can't be intrinsified, the default implementation is
specialized and doesn't use the broadcast vector, so C2 should be able
to get rid of the broadcast operation most of the time. (The only case I
see where that's not possible is when there's a safepoint in between,
which IMO shouldn't be a problem in practice.)
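
The fallback behaviour can be sketched in plain Java. Everything below
is an assumption for illustration: FallbackSketch, its binaryOp and the
intrinsified flag are hypothetical stand-ins (a JIT decision cannot be
toggled like this), and arrays play the role of vectors. The point is
only that the broadcast vector is consumed on the fast path, while the
default implementation ignores it and works from the captured scalar.

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;

final class FallbackSketch {
    // Hypothetical switch standing in for "the JIT intrinsified the call".
    static boolean intrinsified = false;

    // Hypothetical analogue of VectorIntrinsics.binaryOp: takes the fast
    // path when available, otherwise runs the caller's Java fallback.
    static int[] binaryOp(int[] a, int[] b, BinaryOperator<int[]> defaultImpl) {
        if (intrinsified) {
            int[] r = new int[a.length];
            for (int i = 0; i < a.length; i++) r[i] = a[i] + b[i]; // simulated vector add
            return r;
        }
        return defaultImpl.apply(a, b);
    }

    // Replicate a scalar into every lane.
    static int[] broadcast(int e, int length) {
        int[] r = new int[length];
        Arrays.fill(r, e);
        return r;
    }

    // vector + scalar: the broadcast result feeds only the fast path; the
    // fallback ignores its second argument and adds the captured scalar.
    static int[] add(int[] v, int o) {
        int[] bv = broadcast(o, v.length);
        return binaryOp(v, bv, (v1, unused) -> {
            int[] r = new int[v1.length];
            for (int i = 0; i < v1.length; i++) r[i] = v1[i] + o;
            return r;
        });
    }
}
```

Both paths yield the same lanes. On the fallback path the value of bv is
never read, which is what would let a compiler treat the broadcast as
dead code once everything is inlined.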

Another peculiarity is that the default implementation is now
represented by a capturing lambda, which has to be allocated on every
invocation. But if proper inlining happens, C2 should eliminate the
allocation as well.
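
For readers less familiar with the distinction, here is a small sketch
of capturing vs. non-capturing lambdas. The class and method names are
hypothetical, and whether an instance is actually allocated or reused
is up to the runtime; the comments describe the usual unoptimized
behaviour, not a guarantee.

```java
import java.util.function.IntUnaryOperator;

final class CapturingLambdaSketch {
    // Non-capturing: uses only its own parameter, so the runtime is free
    // to reuse a single instance for this lambda expression.
    static IntUnaryOperator addOne() {
        return a -> a + 1;
    }

    // Capturing: closes over 'o', so the resulting object has to carry
    // that value; without optimization this means an allocation per call.
    static IntUnaryOperator addScalar(int o) {
        return a -> a + o;
    }

    // Lane-wise application of f, standing in for a bOp-style fallback.
    static int[] map(int[] v, IntUnaryOperator f) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) r[i] = f.applyAsInt(v[i]);
        return r;
    }
}
```

The lambda passed to VI.binaryOp in the snippet above is of the second
kind, since it closes over the scalar o.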

Best regards,
Vladimir Ivanov

> The current approach may not be optimal performance-wise leveraging existing intrinsics and is not optimal from the pure Java perspective either.
> 
> Thanks,
> Paul.
> 


More information about the panama-dev mailing list