[vectorIntrinsics+mask] RFR: 8264563: Add masked vector intrinsics for binary/store operations

Tue Apr 6 07:56:14 UTC 2021

On Tue, 6 Apr 2021 06:26:43 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Hi, this is the basic masking support PR for Vector API mask operations on platforms like SVE/AVX-512. The main codes are from [1], which contains:
>> 
>> - The predicate register allocation for Arm SVE, and vector mask type definition.
>> - The basic optimization for parts of the mask operations with masking feature. It contains:   
>>  
>>   1. Vector API java implementation changes for masked binary/store.
>>   2. C2 compiler mid-end changes, including new vector intrinsics implementation and mask IRs.
>> 
>> Note that for easier discussion, this PR only provides the changes for limited masked operations (e.g. binary/store) and the mask generations (e.g. load/compare/maskAll). We will continue working on the following missing parts:
>> 
>> - Mask support for other operations (unary,ternary,reduction,load,etc.)
>> - More mask IRs implementation (and/or/xor, toVector, allTrue, anyTrue, trueCount, eq, etc)
>> - Vector boxing/unboxing support for mask type (deoptimization support for predicate registers)
>> 
>> Also note that this PR doesn't contain any backend implementations, so the blend pattern will be generated as before. For the AArch64 SVE backend support, we will create a separate PR based on this one in the future.
>>   
>> [1] https://github.com/openjdk/panama-vector/pull/40
>> 
>> See more details from:
>> http://cr.openjdk.java.net/~xgong/rfr/mask/Vector%20API%20masking%20support%20proposal%20for%20Arm%20SVE.pdf
>> http://cr.openjdk.java.net/~xgong/rfr/mask/VectorAPI%20masking%20support.pdf
>> 
>> Any suggestions and discussions are welcome! Thanks a lot!
>
>> Hi Xiaohong,
>> 
>> Thanks a lot for this. We are almost there. I was hoping that this PR will only have the java and blend node creation in backend. No predicate register support, no new IR nodes. i.e. in this first patch Matcher::match_rule_supported_masked_vector() set as false for every architecture including SVE.
>> 
>> My thought for smooth progress steps is as follows:
>> 1) Xiaohong commits *.java and blend node creation with Matcher::match_rule_supported_masked_vector() set as false for every architecture
>> 2) Jatin commits mainline predicate register patch with x86 predicate registers
>> 3) Xiaohong commits SVE predicate register support and basic new nodes creation
>> 4) Both x86/sve work can then go on in parallel.
>> 
>> For steps 1) the code needs to be approved by Paul Sandoz and Vladimir Ivanov. This step is most straight forward, and we should be able to get it in quickly.
>> For step 2) and 3) new node definition code needs to be approved by Vladimir Ivanov. Here step 2) has been already discussed extensively. Step 3) might need some discussion and back and forth.
>> Step 4) we could request for guidance from Vladimir Ivanov as needed
>> 
>> Please let me know what you think of this or if you have alternate suggestion.
>> 
>> Best Regards,
>> Sandhya
> 
> Hi @sviswa7 ,
> 
> Thanks for your detailed suggestion for the whole process! The steps seem reasonable to me. I will remove the RA and the newly added mask IRs soon. BTW, please see my questions about steps 1) and 3).
> 
>> 1) Xiaohong commits *.java and blend node creation with Matcher::match_rule_supported_masked_vector() set as false for every architecture
> 
> I agree to add this part in the first patch. However, to make the masking support solution easier to review, I only added support for the masked binary and store APIs. Do you think the same changes for the other masked APIs (unary, ternary, reduction) need to go into the same PR? Or should the missing parts be added after the solution has had a high-level review?
> 
>> 3) Xiaohong commits SVE predicate register support and basic new nodes creation
> 
> I think @nsjian will first create a separate patch for the SVE predicate register part, based on Jatin's mainline predicate register patch. I will then commit the newly added mask IR parts on top of that. This part will not contain any backend implementations, so we can then work on the AVX-512/SVE backend support in parallel, as you suggested (step 4).
> 
> Please let me know if you have any comments. Thanks!
> 
> Best Regards,
> Xiaohong

Hi, the scope of this PR has been narrowed to the Vector API Java implementation and the HotSpot vector intrinsic changes. It adds two new HotSpot vector intrinsic methods specifically for the masked binary/store operations, together with their HotSpot implementations.

The newly added masked binary intrinsic is used by both masked and non-masked binary operations; for the non-masked version the mask argument is set to `null`. HotSpot generates different code depending on the mask value:

 1) If the mask is null, the normal code without mask control is generated.
 2) If the mask is non-null, optimized predicated code is generated on platforms that support the mask feature (e.g. SVE/AVX-512).
    Otherwise, the vector blend code is generated as before.
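
To illustrate the three cases above, here is a minimal scalar sketch in plain Java. It is only a model of the dispatch, not the HotSpot implementation: the class name, the `supportsMaskedOps` flag and the `add` helper are hypothetical, and arrays stand in for vectors. It relies on the Vector API rule that a masked binary operation keeps the first operand's value in unset lanes, so the predicated and blend paths must give the same result.

```java
import java.util.Arrays;

public class MaskedBinarySketch {
    // Hypothetical stand-in for the platform check; in this patch the real
    // check is false on every platform, so the blend path is always taken.
    static boolean supportsMaskedOps = false;

    // Models a lane-wise ADD with an optional mask (null means unmasked).
    static int[] add(int[] a, int[] b, boolean[] mask) {
        int[] r = new int[a.length];
        if (mask == null) {
            // Case 1: no mask, plain lane-wise operation.
            for (int i = 0; i < a.length; i++) {
                r[i] = a[i] + b[i];
            }
            return r;
        }
        if (supportsMaskedOps) {
            // Case 2a: predicated form, only active lanes are computed;
            // inactive lanes keep the first operand.
            System.arraycopy(a, 0, r, 0, a.length);
            for (int i = 0; i < a.length; i++) {
                if (mask[i]) {
                    r[i] = a[i] + b[i];
                }
            }
            return r;
        }
        // Case 2b: blend form, compute every lane, then select per lane.
        int[] full = add(a, b, null);
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? full[i] : a[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        boolean[] m = {true, false, true, false};
        System.out.println(Arrays.toString(add(a, b, null)));
        System.out.println(Arrays.toString(add(a, b, m)));
    }
}
```

Flipping `supportsMaskedOps` changes which path runs but not the result, which is why the blend fallback is safe until the backends land.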

The newly added masked store intrinsic is used by the masked vector store. HotSpot generates predicated vector store code on platforms like SVE/AVX-512, and falls back to the default Java implementation on other platforms.
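
The store behaviour can be sketched the same way. This is again a hypothetical scalar model rather than the actual intrinsic: the class, the `supportsMaskedStore` flag and both helper methods are made up for illustration, and a per-lane loop is assumed to stand in for the default Java path. In both paths, memory under unset lanes must stay untouched.

```java
import java.util.Arrays;

public class MaskedStoreSketch {
    // Hypothetical stand-in for the platform check (false everywhere in this patch).
    static boolean supportsMaskedStore = false;

    // Models storing vector lanes into dst at offset under a mask.
    static void store(int[] lanes, boolean[] mask, int[] dst, int offset) {
        if (supportsMaskedStore) {
            predicatedStore(lanes, mask, dst, offset);
        } else {
            defaultJavaStore(lanes, mask, dst, offset);
        }
    }

    // Stand-in for a single predicated store instruction:
    // only lanes whose mask bit is set are written.
    static void predicatedStore(int[] lanes, boolean[] mask, int[] dst, int offset) {
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i]) {
                dst[offset + i] = lanes[i];
            }
        }
    }

    // Stand-in for the default Java path: an explicit per-lane loop
    // that skips inactive lanes.
    static void defaultJavaStore(int[] lanes, boolean[] mask, int[] dst, int offset) {
        for (int i = 0; i < lanes.length; i++) {
            if (!mask[i]) {
                continue;
            }
            dst[offset + i] = lanes[i];
        }
    }

    public static void main(String[] args) {
        int[] dst = new int[5];
        store(new int[]{7, 8, 9}, new boolean[]{true, false, true}, dst, 1);
        System.out.println(Arrays.toString(dst));
    }
}
```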

Note that since no backend code is added and the HotSpot check for supported masked operations is currently set to false on all platforms, this patch always generates the blend pattern as before.

All the backend implementations and other compiler changes will be in a separate patch.

Thanks,
Xiaohong Gong

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/57