RFR: 8294865: x86: Improve the code generation of MulVB and MulVL [v3]

Vladimir Ivanov vlivanov at openjdk.org
Thu Oct 13 18:14:02 UTC 2022


On Thu, 13 Oct 2022 16:53:45 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hi,
>> 
>> This patch simplifies and improves the code generation of `MulVB` and `MulVL` nodes:
>> 
>> - `MulVB` can be implemented by applying `vpmullw` alternately to the odd- and even-index elements and combining the results.
>> - `MulVL` can be implemented on targets without AVX512DQ by multiplying the 32-bit halves separately and adding the partial products together.
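
To make the two decompositions above concrete, here is a minimal scalar sketch in Java. It is not the HotSpot code from the PR: the class and method names are hypothetical, and each method models a single lane with plain integer arithmetic rather than emitting vector instructions. It only illustrates why a 16-bit low multiply is enough to recover both byte products in a word, and why a 64-bit product can be assembled from 32-bit halves.

```java
// Hypothetical scalar model of the two lowering ideas; names are illustrative only.
final class MulDecomposition {

    // Models one 16-bit lane holding two packed bytes:
    // the even-index byte in bits 0-7 and the odd-index byte in bits 8-15.
    static int mulBytePairViaWords(int a, int b) {
        a &= 0xFFFF;
        b &= 0xFFFF;
        // Even bytes: the low 8 bits of a word product depend only on the
        // low 8 bits of each operand, so mask the product down to one byte.
        int even = (a * b) & 0x00FF;
        // Odd bytes: shift them down to the low byte, multiply, then shift
        // the low byte of the product back into bits 8-15.
        int odd = (((a >>> 8) * (b >>> 8)) << 8) & 0xFF00;
        return even | odd;
    }

    // Models one 64-bit lane multiplied using only 32x32 -> 64 bit products.
    static long mulLongViaHalves(long a, long b) {
        long aLo = a & 0xFFFFFFFFL;
        long aHi = a >>> 32;
        long bLo = b & 0xFFFFFFFFL;
        long bHi = b >>> 32;
        // The aHi * bHi term would be shifted left by 64 bits and vanishes.
        // Only the low 32 bits of the cross terms survive the shift by 32.
        long cross = aLo * bHi + aHi * bLo;
        return aLo * bLo + (cross << 32);
    }
}
```

For any inputs, `mulLongViaHalves(a, b)` should equal `a * b`, and each byte of `mulBytePairViaWords(a, b)` should equal the low 8 bits of the corresponding byte product.
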
>> 
>> Vector API benchmark shows the results of `MUL` operations:
>> 
>>                                                   Before                After
>>     Benchmark          (size)   Mode  Cnt      Score     Error      Score     Error   Units   Change
>>     Byte64Vector.MUL     1024  thrpt   15   8948.607 ± 194.646   8860.404 ± 203.109  ops/ms   -0.99%
>>     Byte128Vector.MUL    1024  thrpt   15  12915.839 ± 291.262  13554.662 ± 488.695  ops/ms   +4.95%
>>     Byte256Vector.MUL    1024  thrpt   15  12129.959 ± 245.710  23279.276 ± 669.725  ops/ms  +91.92%
>>     Long128Vector.MUL    1024  thrpt   15   1183.663 ±  36.440   1489.892 ±  35.356  ops/ms  +25.87%
>>     Long256Vector.MUL    1024  thrpt   15   1911.802 ±  95.304   2834.088 ±  77.647  ops/ms  +48.24%
>> 
>> Please take a look and review. Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Merge branch 'master' into improveMulVB
>  - refactor conditions
>  - add vmulB for 8 bytes
>  - Merge branch 'master' into improveMulVB
>  - Merge branch 'master' into improveMulVB
>  - Merge branch 'master' into improveMulVB
>  - fix
>  - mulV

Looks good.

Marked as reviewed by vlivanov (Reviewer).

src/hotspot/cpu/x86/matcher_x86.hpp line 195:

> 193:         return 0;
> 194:       case Op_MulVB:
> 195:         return 7;

Why do you unconditionally return `7` here? Is it because AVX512 doesn't feature a vector instruction to multiply byte vectors?

-------------

PR: https://git.openjdk.org/jdk/pull/10571

