[vectorIntrinsics+mask] RFR: 8264563: Add masked vector intrinsics for binary/store operations [v2]

Thu Apr 8 23:56:21 UTC 2021

On Wed, 7 Apr 2021 09:00:07 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Yeah, that's true. How about handling the difference separately in the masked/non-masked `lanewise` method?
>
> My initial thought is to add a new abstract method i.g. `lanewise0()`. It did the same transformations for special opcodes and then directly calls the intrinsic method `binaryMaskOp`. All the different handles for special ops are handled differently in the `lanewise` method. 
> The codes might look like:
>    /**
>      * {@inheritDoc} 
>      * @see #lanewise(VectorOperators.Binary,byte)
>      * @see #lanewise(VectorOperators.Binary,byte,VectorMask)
>      */
>     @ForceInline
>     public final
>     ByteVector lanewise(VectorOperators.Binary op,
>                                   Vector<Byte> v) {
>         ByteVector that = (ByteVector) v;
>         that.check(this);
>         if (op == ROR || op == ROL) { // FIXME: JIT should do this
>             ByteVector neg = that.lanewise(NEG);
>             ByteVector hi = this.lanewise(LSHL, (op == ROR) ? neg : that);
>             ByteVector lo = this.lanewise(LSHR, (op == ROR) ? that : neg);
>             return hi.lanewise(OR, lo);
>         } else if (op == DIV) {
>             VectorMask<Byte> eqz = that.eq((byte)0);
>             if (eqz.anyTrue()) {
>                 throw that.divZeroException();
>             }
>         }
>         return lanewise0(op, that, null);
>     }
> 
>     /**
>      * {@inheritDoc} 
>      * @see #lanewise(VectorOperators.Binary,byte,VectorMask)
>      */
>     @Override
>     public final
>     ByteVector lanewise(VectorOperators.Binary op,
>                                   Vector<Byte> v,
>                                   VectorMask<Byte> m) {
>         ByteVector that = (ByteVector) v;
>         that.check(this);
>         if (op == ROR || op == ROL) {
>             return blend(lanewise(op, v), m);
>         } else if (op == DIV) {
>             VectorMask<Byte> eqz = that.eq((byte)0);
>             if (eqz.and(m).anyTrue()) {
>                 throw that.divZeroException();
>             }
>             // suppress div/0 exceptions in unset lanes
>             that = that.lanewise(NOT, eqz);
>         }
>         return lanewise0(op, that, m);
>     }
> 
>     protected abstract
>     ByteVector lanewise0(VectorOperators.Binary op,
>                          Vector<Byte> v,
>                          VectorMask<Byte> m);
> 
>     @ForceInline
>     final
>     ByteVector lanewise0Template(VectorOperators.Binary op,
>                                           Class<? extends VectorMask<Byte>> maskType,
>                                           Vector<Byte> v, VectorMask<Byte> m) {
>         ByteVector that = (ByteVector) v;
>         if (opKind(op, VO_SPECIAL  | VO_SHIFT)) {
>             if (op == FIRST_NONZERO) {
>                 // FIXME: Support this in the JIT.
>                 VectorMask<Byte> thisNZ
>                         = this.viewAsIntegralLanes().compare(NE, (byte) 0);
>                 that = that.blend((byte) 0, thisNZ.cast(vspecies()));
>                 op = OR_UNCHECKED;
>             }
>             if (opKind(op, VO_SHIFT)) {
>                 // As per shift specification for Java, mask the shift count.
>                 // This allows the JIT to ignore some ISA details.
>                 that = that.lanewise(AND, SHIFT_MASK);
>             }
>             if (op == AND_NOT) {
>                 // FIXME: Support this in the JIT.
>                 that = that.lanewise(NOT);
>                 op = AND;
>             }
>         }
> 
>         int opc = opCode(op);
>         return VectorSupport.binaryMaskOp(
>             opc, getClass(), maskType, byte.class, length(),
>             this, that, m,
>             BIN_MASK_IMPL.find(op, opc, (opc_) -> {
>               switch (opc_) {
>                 case VECTOR_OP_ADD: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, b) -> (byte)(a + b));
>                 case VECTOR_OP_SUB: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, b) -> (byte)(a - b));
>                 case VECTOR_OP_MUL: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, b) -> (byte)(a * b));
>                 case VECTOR_OP_DIV: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, b) -> (byte)(a / b));
>                 case VECTOR_OP_MAX: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, b) -> (byte)Math.max(a, b));
>                 case VECTOR_OP_MIN: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, b) -> (byte)Math.min(a, b));
>                 case VECTOR_OP_AND: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, b) -> (byte)(a & b));
>                 case VECTOR_OP_OR: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, b) -> (byte)(a | b));
>                 case VECTOR_OP_XOR: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, b) -> (byte)(a ^ b));
>                 case VECTOR_OP_LSHIFT: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, n) -> (byte)(a << n));
>                 case VECTOR_OP_RSHIFT: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, n) -> (byte)(a >> n));
>                 case VECTOR_OP_URSHIFT: return (v0, v1, vm) ->
>                         v0.bOp(v1, vm, (i, a, n) -> (byte)((a & LSHR_SETUP_MASK) >>> n));
>                 default: return null;
>                 }}));
>     }
> The additional benefit for the masked DIV is that it only needs one "div zero checking" for the active elements with this change, while it checks twice with the original (`blend`) codes.  The original codes firstly check the active elements (calls by masked lanewise), and then check all the elements (calls by non-masked lanewise) which I think is duplicate.
> 
> Does it look ok for you? Thanks!
> 
> Best Regards,
> Xiaohong

This looks good to me. The ForceInline annotation needs to be added to the following method.
ByteVector lanewise(VectorOperators.Binary op,
                                  Vector<Byte> v,
                                  VectorMask<Byte> m)

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/57