RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast

Tue Aug 16 02:36:13 UTC 2022

On Tue, 16 Aug 2022 01:52:52 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>>> Yes, you are right, the code would be mostly the same, which means we can reuse the existing match rules to additionally match `VectorMaskCast` for those cases. The other cases are different, in particular narrowing casts to subword types: AVX < 3 does not support truncating casts and only provides saturating pack instructions, so we need to truncate the upper bits ourselves. For example, a cast from `int` to `byte` is done as follows:
>>> 
>>> ```
>>> vpand dst, src, [external address mask]
>>> vpackusdw dst, dst
>>> vpermq dst, dst, 0x08
>>> vpackuswb dst, dst
>>> ```
>>> 
>>> For vector mask cast, we can get rid of the initial masking and use the `vpackss` instructions instead, which removes the need to reference memory. Thanks.
>> 
>> I see, thanks! So would you like to provide the missing x86 backend implementation for `VectorMaskCast`? If so, we can use
>> `VectorMaskCast` for all cases to simplify the current code. Thanks a lot!
>
> Maybe I can refactor the code in this patch and add the same backend rules as for `VectorCast`? Then you can create a follow-up patch to improve the x86 codegen if you like. WDYT?

> @XiaohongGong I have created a PR against your branch. It only contains changes in the x86 backend, to avoid any conflicts with changes you may have made. Could you have a look? Thanks.
> 
> [XiaohongGong#2](https://github.com/XiaohongGong/jdk/pull/2)

Sure, thanks a lot! I will take a look.
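
For reference, the narrowing sequence quoted above can be modelled one lane at a time in plain Java. This is only a rough scalar sketch, not the actual backend code: the class and method names are made up for illustration, the `vpermq` lane shuffle is ignored because it only rearranges data, and mask lanes are assumed to be either all zeros or all ones (0 or -1).

```java
// Hypothetical scalar model of the x86 pack instructions; names are illustrative only.
final class PackDemo {
    // One lane of vpackusdw: signed 32-bit input, unsigned 16-bit saturated output.
    public static int packusdw(int x) {
        return Math.max(0, Math.min(0xFFFF, x));
    }

    // One lane of vpackuswb: the 16-bit lane is read as signed, output is unsigned 8-bit saturated.
    public static int packuswb(int lane16) {
        short s = (short) lane16;
        return Math.max(0, Math.min(0xFF, s));
    }

    // One lane of vpackssdw: signed 32-bit input, signed 16-bit saturated output.
    public static short packssdw(int x) {
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, x));
    }

    // One lane of vpacksswb: signed 16-bit input, signed 8-bit saturated output.
    public static byte packsswb(short x) {
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, x));
    }

    // General int -> byte cast: mask first (the vpand step) so saturation never kicks in.
    public static byte truncatingCast(int x) {
        int masked = x & 0xFF;
        return (byte) packuswb(packusdw(masked));
    }

    // Mask lane cast: 0 and -1 survive signed saturation unchanged, so no masking is needed.
    public static byte maskCast(int maskLane) {
        return packsswb(packssdw(maskLane));
    }

    public static void main(String[] args) {
        int[] samples = {0x1234, -1, 300, -129};
        for (int v : samples) {
            System.out.println(v + " -> " + truncatingCast(v) + " (Java cast: " + (byte) v + ")");
        }
        System.out.println("mask -1 -> " + maskCast(-1) + ", mask 0 -> " + maskCast(0));
    }
}
```

In this model the unsigned packs on their own would turn a lane of -1 into 0, which is why the general cast needs the `vpand` step, while the signed packs keep 0 and -1 intact.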

-------------

PR: https://git.openjdk.org/jdk/pull/9737