RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v3]
Xiaohong Gong
xgong at openjdk.org
Thu Aug 25 03:52:28 UTC 2022
On Thu, 25 Aug 2022 03:40:21 GMT, Jie Fu <jiefu at openjdk.org> wrote:
>> For a vector mask unbox operation, if the input is a `VectorBox` and the boxed value type is not equal to the unbox type, we need to do a type cast. Originally, the type casting was split into two cases:
>>
>> 1. if the vector element size in bytes is equal (e.g. casting between `int` and `float`), the conversion is:
>>
>> ```
>> VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)
>> ```
>>
>> This doesn't need to emit any instructions.
>>
>> 2. otherwise (e.g. casting from `short` to `int`), the conversion is:
>>
>> ```
>> VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask)
>> ```
>>
>> This means we need to convert the short mask vector back to a boolean vector, and then cast the boolean vector to an int vector (i.e. `S->B->I`).
>>
>> With the new changes in this PR, `VectorMaskCast` supports vector mask casting for all cases and all platforms (originally it supported only the first case on most platforms), so we can implement the second case simply with `VectorMaskCast`. That means we can optimize `S->B->I` to `S->I`. This saves the unnecessary narrowing and extending instructions caused by the temporary boolean vector result.
>
> So there are at least two optimizations in this PR:
> 1) Opt for FIRST_NONZERO
> 2) Opt for vector mask unbox operation
>
> How about splitting them and doing the second one in a new PR?
Thanks for the advice! Yes, this patch first fixes the performance issue for FIRST_NONZERO and also cleans up the related code. I considered splitting the changes earlier. However, on further thought, if the patch only fixed the performance issue, the final code would look messy, and we would need to add extra, complex IR checks for the vector mask cast (please see the initial fix here: https://openjdk.github.io/cr/?repo=jdk&pr=9737&range=00). Unifying the usage on the `VectorMaskCast` IR keeps the final code clean, so I prefer to make all the changes together.
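
As a rough illustration of the `S->B->I` versus `S->I` paths from the quoted description, here is a minimal scalar sketch in Java. It assumes masks are held as lanes that are all-ones (-1) or all-zeros (0), which does not hold on platforms with predicate registers. The class and method names (`MaskCastModel`, `storeMask`, `loadMask`, `maskCast`) are hypothetical stand-ins for the C2 nodes, not HotSpot code.

```java
import java.util.Arrays;

// Scalar model only: each array element stands for one vector lane.
public class MaskCastModel {
    // Models VectorStoreMask: short mask lanes (-1 / 0) -> boolean lanes (1 / 0).
    public static byte[] storeMask(short[] mask) {
        byte[] out = new byte[mask.length];
        for (int i = 0; i < mask.length; i++) {
            out[i] = (byte) (mask[i] != 0 ? 1 : 0); // narrowing step (S->B)
        }
        return out;
    }

    // Models VectorLoadMask: boolean lanes (1 / 0) -> int mask lanes (-1 / 0).
    public static int[] loadMask(byte[] bools) {
        int[] out = new int[bools.length];
        for (int i = 0; i < bools.length; i++) {
            out[i] = -bools[i]; // extending step (B->I): 1 becomes -1, 0 stays 0
        }
        return out;
    }

    // Models VectorMaskCast for short -> int: a direct sign extension (S->I).
    public static int[] maskCast(short[] mask) {
        int[] out = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            out[i] = mask[i]; // implicit widening keeps -1 as -1 and 0 as 0
        }
        return out;
    }

    public static void main(String[] args) {
        short[] m = {-1, 0, 0, -1};
        int[] twoStep = loadMask(storeMask(m)); // S->B->I
        int[] direct = maskCast(m);             // S->I
        System.out.println(Arrays.equals(twoStep, direct));
    }
}
```

Under this model both paths give the same int mask, and the direct cast skips the temporary boolean result.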
-------------
PR: https://git.openjdk.org/jdk/pull/9737