[vectorIntrinsics] RFR: RFC: Vector API masking support proposal for Arm SVE [v3]

Jatin Bhateja jbhateja at openjdk.java.net
Thu Mar 18 11:29:48 UTC 2021


On Fri, 12 Mar 2021 09:47:33 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Please help to review this proposal for Vector API masking support. 
>> 
>> This is the masking part of https://bugs.openjdk.java.net/browse/JDK-8261663 - the second incubator for JEP 338 Vector API. As the JEP describes, masked vector operations are currently implemented by composing the non-masked operation with a blend operation. This can be improved by using the hardware mask feature on supporting architectures like Arm SVE and X86 AVX-512. Here are the proposals for Arm SVE; we assume the ideas could also be applied to X86 AVX-512.
>> 
>> To support the masking feature, this PR adds the following implementations:
>>   -  SVE predicate register allocation
>>   -  Mask type and basic mask IR definition
>>   -  Mask implementation for masked vector store
>>   -  Mask implementation for masked binary operations
>> 
>> For the masked binary operations, we have created two proposals for discussion:
>>   -  By mainly changing the C2 compiler
>>   -  By improving the Vector API Java implementation together with simpler C2 compiler changes
>>   
>> This PR shows the second solution, since we think it is the better one. However, we also have a prototype for the first solution at https://github.com/XiaohongGong/panama-vector/commit/372489feeae06bc53c46709d389cb0e46e9fb4f6 and the basic support changes are shared with this PR.
>> 
>> This PR doesn't contain all the masking support changes. There are still many missing parts that we will continue working on, including:
>>   -  Mask support for other operations (unary, ternary, reduction, load, etc.)
>>   -  More mask IR implementations (maskAll, toVector, allTrue, anyTrue, trueCount, eq, etc.)
>>   -  A better solution for vector mask load/store (the memory type is boolean)
>>   -  Vector boxing/unboxing support for mask type (deoptimization?)
>>   -  Tail loop elimination?
>> 
>> It's worth mentioning that this PR mainly provides the proposals for SVE masking support, and any suggestions and discussions are welcome! Thanks a lot!
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add mask support for masked binary operations
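
The blend-based composition described in the quoted proposal, and the predicated form that hardware masking enables, can be modelled with plain arrays. This is a minimal C++ sketch, not HotSpot or Vector API code: `VecI`, `Mask`, `add`, `blend`, `masked_add_via_blend` and `masked_add_predicated` are hypothetical names, and it assumes the Vector API convention that unset lanes of a masked lanewise operation keep the lane from the first operand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model (hypothetical): a "vector" is a fixed array of int lanes,
// a "mask" is one bool per lane.
constexpr std::size_t kLanes = 8;
using VecI = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Non-masked lanewise add, like AddVI SRC1 SRC2.
VecI add(const VecI& a, const VecI& b) {
  VecI r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = a[i] + b[i];
  return r;
}

// Blend: take the lane from on_true where the mask is set, else from on_false.
VecI blend(const VecI& on_false, const VecI& on_true, const Mask& m) {
  VecI r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = m[i] ? on_true[i] : on_false[i];
  return r;
}

// Current scheme: compute every lane, then blend SRC1 back into the unset lanes.
VecI masked_add_via_blend(const VecI& a, const VecI& b, const Mask& m) {
  return blend(a, add(a, b), m);
}

// Predicated scheme (merging predication): only active lanes are computed,
// inactive lanes keep the value of SRC1.
VecI masked_add_predicated(const VecI& a, const VecI& b, const Mask& m) {
  VecI r = a;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (m[i]) r[i] = a[i] + b[i];
  return r;
}
```

Both functions should produce identical lanes. The difference the proposal targets is cost: the first form needs a full add followed by a separate blend, while a merging predicated instruction such as SVE's predicated `ADD` can express the second form in one step.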

src/hotspot/share/opto/vectorIntrinsics.cpp line 583:

> 581:       const TypeVMask* vmask_type = TypeVMask::make(elem_bt, num_elem);
> 582:       mask = gvn().transform(new VectorToMaskNode(mask, vmask_type));
> 583:       operation->add_req(mask);

We are adding a new input to binary operations that keep their existing opcodes.
E.g., the graph shapes without and with the mask node would be:
   AddVI SRC1 SRC2
   AddVI SRC1 SRC2 MASK

We may be able to curtail the number of instruction patterns for masked operations by folding blend + vector operation into a new VectorMaskedOperNode. This way, we may be able to remove many redundant instruction patterns from the AD files, which have been a cause of concern for us (X86) in the past.
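
A minimal C++ sketch of the folding suggested above, on a toy IR rather than C2's `Node` classes: it matches `Blend(SRC1, BinOp(SRC1, SRC2), MASK)` and replaces it with one masked node that carries the binary opcode as a field. `Op`, `Node`, `fold_masked_binary`, the blend input order and the fields of the masked node are all assumptions for illustration; the proposed VectorMaskedOperNode may look different.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy IR (hypothetical, not HotSpot's Node hierarchy).
enum class Op { Param, AddVI, SubVI, MulVI, VectorBlend, VectorMaskedOper };

struct Node {
  Op op;
  std::vector<const Node*> in;  // inputs
  Op inner = Op::Param;         // only meaningful for VectorMaskedOper
};

bool is_lanewise_binary(Op op) {
  return op == Op::AddVI || op == Op::SubVI || op == Op::MulVI;
}

// Assumed blend input order: (on_false, on_true, mask).
// Matches Blend(SRC1, BinOp(SRC1, SRC2), MASK) and returns a single
// VectorMaskedOper(SRC1, SRC2, MASK) tagged with the inner opcode,
// or nullptr if the shape does not match.
std::unique_ptr<Node> fold_masked_binary(const Node& blend) {
  if (blend.op != Op::VectorBlend || blend.in.size() != 3) return nullptr;
  const Node* on_false = blend.in[0];
  const Node* on_true = blend.in[1];
  const Node* mask = blend.in[2];
  if (!is_lanewise_binary(on_true->op) || on_true->in.size() != 2) return nullptr;
  // Unset lanes must keep SRC1, otherwise a merging predicated
  // instruction would not be equivalent to the blend.
  if (on_true->in[0] != on_false) return nullptr;
  auto folded = std::make_unique<Node>();
  folded->op = Op::VectorMaskedOper;
  folded->in = {on_true->in[0], on_true->in[1], mask};
  folded->inner = on_true->op;
  return folded;
}
```

The design point is that the masked form becomes one node kind parameterized by its inner opcode, instead of an extra MASK input on every existing vector opcode. How the AD files would then match such a node is not modelled here.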

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/40


More information about the panama-dev mailing list