[vector] binary operations with a scalar

Tue Feb 20 17:19:17 UTC 2018

> On Feb 19, 2018, at 7:18 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Paul,
> 
>> Here is an initial step towards this focusing on the add operation for IntVector:
>>   http://cr.openjdk.java.net/~psandoz/panama/bin-op-with-scalar/webrev/
>> I don’t want to proceed further until we agree on the direction/optimization strategy.
>> If we want to elide explicit broadcasts (hidden in the implementation), we would need to push this down into the intrinsics. This would require 6 additional intrinsic methods on VectorIntrinsics.
> 
> I like the proposed API methods. The ability to hide broadcasts behind the scenes definitely improves the API.
> 
> Regarding the implementation, from the JVM/JIT perspective, IMO there's no compelling reason to intrinsify those methods.

OK, I think Sandya was implying otherwise, but perhaps it really comes down to how one obtains the species used to perform the broadcast?

> As you noted, new methods can be implemented by composing existing operations (broadcast + add/sub/...), and the JVM would have to do the same if it intrinsified them.
> 
> The only difference between composing and intrinsifying is what happens when intrinsification fails. I share your desire to provide an optimized implementation for the non-intrinsified case, but I don't think it justifies intrinsification.
> 

Right, I was not using that as a justification for a new intrinsic; I was more concerned that a separate intrinsic broadcast combined with an intrinsic op would be harder to optimize. Being able to optimize the non-intrinsic combination was an opportunistic benefit, available only if such a combined intrinsic existed.

> What do you think about the following alternative?
> 
>    @Override
>    @ForceInline
>    public Int256Vector add(int o) {
>        Int256Vector v = Int256Species.broadcastImpl(o);
>        return (Int256Vector) VectorIntrinsics.binaryOp(
>                VECTOR_OP_ADD, Int256Vector.class, int.class, LENGTH,
>                this, v,
>                (v1, __) -> v1.bOp(o, (i, a, b) -> (int)(a + b)));
>    }
> 
>    static final class Int256Species extends ... {
> ...
>        @ForceInline
>        static Int256Vector broadcastImpl(int e) {
>            return VectorIntrinsics.broadcastCoerced(
>                    Int256Vector.class, int.class, LENGTH,
>                    e,
>                    ((long bits) -> SPECIES.op(i -> (int)bits)));
>        }
> 
>        @Override
>        @ForceInline
>        public Int256Vector broadcast(int e) {
>            return broadcastImpl(e);
>        }
> ...
>    }
> 
> All vector shape information is available, so both operations can be reliably intrinsified; e.g., there is no need to dispatch through the vector species to broadcast the scalar.
> 
> If VI.binaryOp can't be intrinsified, the default implementation is specialized and doesn't use the broadcast vector, so C2 should be able to eliminate the broadcast operation most of the time. (The only case I see where that's not possible is when there's a safepoint in between, which IMO shouldn't be a problem in practice.)
> 
> Another peculiarity is that the default implementation is now represented by a capturing lambda, which must be allocated on every invocation. But if proper inlining happens, C2 should eliminate the allocation as well.
> 
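To make the fallback argument concrete, here is a plain-Java toy model of the shape Vladimir proposes. `ToyInt256Vector`, its `binaryOp` stand-in and `addScalarLanes` are hypothetical names, not Panama or `VectorIntrinsics` API, and the model cannot show what C2 actually does. It only shows the structure: a static, shape-specific `broadcastImpl`, and a capturing fallback lambda that reads the scalar `o` directly, so the broadcast vector is never used on the non-intrinsic path.

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;

// Hypothetical toy model, NOT the Panama API: an 8-lane int vector backed by an array.
final class ToyInt256Vector {
    static final int LENGTH = 8;
    final int[] lanes;

    ToyInt256Vector(int[] lanes) {
        this.lanes = lanes.clone();
    }

    // Hypothetical stand-in for VectorIntrinsics.binaryOp. A JIT could replace this call
    // with a vector instruction; in plain Java we always run the fallback.
    static ToyInt256Vector binaryOp(ToyInt256Vector a, ToyInt256Vector b,
                                    BinaryOperator<ToyInt256Vector> fallback) {
        return fallback.apply(a, b);
    }

    // Static, shape-specific broadcast: no virtual dispatch through a species object.
    static ToyInt256Vector broadcastImpl(int e) {
        int[] r = new int[LENGTH];
        Arrays.fill(r, e);
        return new ToyInt256Vector(r);
    }

    // Lane-wise add of a scalar, used only by the specialized fallback.
    ToyInt256Vector addScalarLanes(int s) {
        int[] r = new int[LENGTH];
        for (int i = 0; i < LENGTH; i++) {
            r[i] = lanes[i] + s;
        }
        return new ToyInt256Vector(r);
    }

    ToyInt256Vector add(int o) {
        ToyInt256Vector v = broadcastImpl(o);
        // Capturing lambda: it reads `o` and ignores its second argument,
        // so the broadcast vector `v` is never read on the fallback path.
        return binaryOp(this, v, (v1, unused) -> v1.addScalarLanes(o));
    }

    public static void main(String[] args) {
        ToyInt256Vector r = new ToyInt256Vector(new int[] {0, 1, 2, 3, 4, 5, 6, 7}).add(10);
        System.out.println(Arrays.toString(r.lanes)); // [10, 11, 12, 13, 14, 15, 16, 17]
    }
}
```

In a real JIT, `v` only becomes dead, and the lambda allocation only disappears, if `binaryOp` and the lambda body are inlined; the toy makes the data flow visible but says nothing about whether that happens.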

I like this, but suggest we keep the code marginally simpler and revisit if we have performance concerns. If we don’t make the broadcast+op combination an intrinsic, I think something like this is probably OK for now:

   @Override
   @ForceInline
   public Int256Vector add(int o) {
       return add(Int256Species.broadcastImpl(o));
   }

?
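The simpler variant is plain composition: `add(int)` broadcasts and then delegates to the existing vector-vector `add`. A minimal plain-Java sketch of that shape, using a hypothetical `ToyIntVector8` class (not the Panama API), also shows the API benefit discussed in the thread: callers no longer spell out the broadcast themselves.

```java
import java.util.Arrays;

// Hypothetical toy model, NOT the Panama API: an 8-lane int vector backed by an array.
final class ToyIntVector8 {
    static final int LENGTH = 8;
    private final int[] lanes;

    private ToyIntVector8(int[] lanes) {
        this.lanes = lanes;
    }

    static ToyIntVector8 fromArray(int[] a) {
        if (a.length != LENGTH) {
            throw new IllegalArgumentException("expected " + LENGTH + " lanes");
        }
        return new ToyIntVector8(a.clone());
    }

    // Shape-specific broadcast of a scalar into every lane.
    static ToyIntVector8 broadcast(int e) {
        int[] r = new int[LENGTH];
        Arrays.fill(r, e);
        return new ToyIntVector8(r);
    }

    // Existing lane-wise vector + vector operation.
    ToyIntVector8 add(ToyIntVector8 o) {
        int[] r = new int[LENGTH];
        for (int i = 0; i < LENGTH; i++) {
            r[i] = lanes[i] + o.lanes[i];
        }
        return new ToyIntVector8(r);
    }

    // New scalar overload: the broadcast is hidden inside the implementation.
    ToyIntVector8 add(int o) {
        return add(broadcast(o));
    }

    int[] toArray() {
        return lanes.clone();
    }

    public static void main(String[] args) {
        ToyIntVector8 v = fromArray(new int[] {1, 2, 3, 4, 5, 6, 7, 8});
        int[] explicit = v.add(broadcast(3)).toArray(); // caller broadcasts
        int[] hidden = v.add(3).toArray();              // broadcast hidden
        System.out.println(Arrays.equals(explicit, hidden)); // true
    }
}
```

Unlike the specialized-fallback alternative, the non-intrinsic path here does read the broadcast vector, so it cannot be dropped; the suggestion above is to accept that for now and revisit if performance becomes a concern.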

Paul.

> Best regards,
> Vladimir Ivanov
> 
>> The current approach, which leverages existing intrinsics, may not be optimal performance-wise, and it is not optimal from a pure Java perspective either.
>> Thanks,
>> Paul.