[vectorIntrinsics+mask] RFR: 8264563: Add masked vector intrinsics for binary/store operations [v5]
Paul Sandoz
psandoz at openjdk.java.net
Wed Apr 14 20:57:46 UTC 2021
On Tue, 13 Apr 2021 08:00:23 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> Hi, this is the basic masking support PR for Vector API mask operations on platforms like SVE/AVX-512. The main code is from [1], which contains:
>>
>> - The predicate register allocation for Arm SVE, and vector mask type definition.
>> - The basic optimization for parts of the mask operations with masking feature. It contains:
>>
>> 1. Vector API java implementation changes for masked binary/store.
>> 2. C2 compiler mid-end changes, including new vector intrinsics implementation and mask IRs.
>>
>> Note that for easier discussion, this PR only provides the changes for limited masked operations (e.g. binary/store) and the mask generations (e.g. load/compare/maskAll). We will continue working on the following missing parts:
>>
>> - Mask support for other operations (unary,ternary,reduction,load,etc.)
>> - More mask IRs implementation (and/or/xor, toVector, allTrue, anyTrue, trueCount, eq, etc)
>> - Vector boxing/unboxing support for mask type (deoptimization support for predicate registers)
>>
>> Also note that this PR doesn't contain any backend implementations, so the blend pattern will be generated as before. As for the AArch64 SVE backend support, we will create a separate PR based on this one in the future.
>>
>> [1] https://github.com/openjdk/panama-vector/pull/40
>>
>> See more details from:
>> http://cr.openjdk.java.net/~xgong/rfr/mask/Vector%20API%20masking%20support%20proposal%20for%20Arm%20SVE.pdf
>> http://cr.openjdk.java.net/~xgong/rfr/mask/VectorAPI%20masking%20support.pdf
>>
>> Any suggestions and discussions are welcome! Thanks a lot!
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove null checking for the mask argument in API implementation
The changes to the Java code generally look ok. @iwanowww will know more than I do about how this all behaves with C2, e.g. whether to use `null` or some other sentinel, such as a constant mask with all bits set.
There is an awkward set of differences between the non-mask and mask cases for ROR etc., although most of that would go away if support were in C2, possibly leaving subtle differences for DIV. I imagine the adjustments to other templates will be similar.
I am tempted to have a single template method for both the non-mask and mask cases. The specialized code that runs before calling the intrinsic may then be somewhat duplicated ahead of the call to `binaryMaskOp`, which would use a constant for the op -> lambda function mapping and be passed either `null` or the mask value.
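To make the shape of that idea concrete, here is a minimal scalar sketch in plain Java. It is not the Vector API code from the PR: the class name, the `OP_ADD`/`OP_MUL` constants, the `OPS` table, and the array-based signature of `binaryMaskOp` are all hypothetical, and a `null` mask is simply assumed to mean "all lanes set". Unset lanes keep the value of the first operand, as masked lanewise operations do.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

final class BinaryMaskOpSketch {
    // Hypothetical op codes standing in for the real operator constants.
    static final int OP_ADD = 0;
    static final int OP_MUL = 1;

    // Constant op -> lambda table, indexed by op code.
    static final IntBinaryOperator[] OPS = {
        (a, b) -> a + b,
        (a, b) -> a * b,
    };

    // Single entry point for both cases: mask == null means "unmasked".
    static int[] binaryMaskOp(int op, int[] a, int[] b, boolean[] mask) {
        IntBinaryOperator f = OPS[op];
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            boolean active = (mask == null) || mask[i];
            // Unset lanes take the value of the first operand.
            r[i] = active ? f.applyAsInt(a[i], b[i]) : a[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        boolean[] m = {true, false, true, false};
        System.out.println(Arrays.toString(binaryMaskOp(OP_ADD, a, b, null)));
        System.out.println(Arrays.toString(binaryMaskOp(OP_ADD, a, b, m)));
    }
}
```

With one entry point, the only per-operation code left outside it is whatever has to run before the call, which is where the duplication mentioned above would sit.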
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/IntVector.java line 640:
> 638: }
> 639: // suppress div/0 exceptions in unset lanes
> 640: that = that.lanewise(NOT, eqz);
I guess this is required because we don't know how the intrinsic will support masking? How can a div/0 exception occur?
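For illustration only, here is a scalar model of one way such an exception could arise. It assumes the masked DIV is carried out as a full-width division followed by a blend under the mask, so every lane is divided, including the unset ones. The class and method names are hypothetical, and this is not the code in `IntVector.java`.

```java
import java.util.Arrays;

final class MaskedDivSketch {
    // Models "divide all lanes, then blend under the mask".
    static int[] maskedDiv(int[] a, int[] b, boolean[] m) {
        int[] divisor = b.clone();
        for (int i = 0; i < b.length; i++) {
            if (b[i] == 0) {
                if (m[i]) {
                    // A zero divisor in a set lane is a genuine error.
                    throw new ArithmeticException("/ by zero");
                }
                // Unset lane: NOT turns 0 into -1, so the division below
                // cannot throw for a lane whose result is discarded anyway.
                divisor[i] = ~divisor[i];
            }
        }
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            int q = a[i] / divisor[i];  // computed for every lane
            r[i] = m[i] ? q : a[i];     // blend: unset lanes keep a[i]
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {8, 9, 10, 12};
        int[] b = {2, 0, 5, 0};
        boolean[] m = {true, false, true, false};
        System.out.println(Arrays.toString(maskedDiv(a, b, m)));
    }
}
```

Without the NOT step, the zero divisors in the two unset lanes would make the full-width division throw even though those lanes are never selected. If the intrinsic only divided the set lanes, the step would not be needed under this model.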
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/57