RFR: 8294865: x86: Improve the code generation of MulVB and MulVL [v2]

Quan Anh Mai qamai at openjdk.org
Thu Oct 6 12:28:28 UTC 2022


> Hi,
> 
> This patch simplifies and improves the code generation of `MulVB` and `MulVL` nodes:
> 
> - `MulVB` can be implemented by applying `vpmullw` alternately to the odd-index and even-index elements and combining the results.
> - On hardware without AVX512DQ, `MulVL` can be implemented by computing the partial products of the 32-bit halves and adding the results together.
> 
> The Vector API benchmarks show the following results for `MUL` operations:
> 
>                                                   Before                After
>     Benchmark          (size)   Mode  Cnt      Score     Error      Score     Error   Units   Change
>     Byte64Vector.MUL     1024  thrpt   15   8948.607 ± 194.646   8860.404 ± 203.109  ops/ms   -0.99%
>     Byte128Vector.MUL    1024  thrpt   15  12915.839 ± 291.262  13554.662 ± 488.695  ops/ms   +4.95%
>     Byte256Vector.MUL    1024  thrpt   15  12129.959 ± 245.710  23279.276 ± 669.725  ops/ms  +91.92%
>     Long128Vector.MUL    1024  thrpt   15   1183.663 ±  36.440   1489.892 ±  35.356  ops/ms  +25.87%
>     Long256Vector.MUL    1024  thrpt   15   1911.802 ±  95.304   2834.088 ±  77.647  ops/ms  +48.24%
> 
> Please take a look and review. Thank you very much.
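
The two lowering ideas in the list above can be modelled in scalar Java. This is only a minimal sketch of the arithmetic, not the code the patch emits: the class and method names (`VectorMulSketch`, `mulPackedBytes`, `mulBytes`, `mulLong`) are hypothetical, and each vector lane is simulated with ordinary integer operations. It assumes that the even-index byte sits in the low half of each 16-bit lane and that the byte arrays have equal, even lengths.

```java
import java.util.Arrays;

final class VectorMulSketch {
    // Models one 16-bit lane holding two packed bytes (even index in the low half).
    static short mulPackedBytes(short a, short b) {
        int ua = a & 0xFFFF;
        int ub = b & 0xFFFF;
        // Even element: the low byte of a 16-bit multiply depends only on the low bytes.
        int even = (ua * ub) & 0x00FF;
        // Odd element: shift the high bytes down, multiply, shift the result back up.
        int odd = (((ua >>> 8) * (ub >>> 8)) << 8) & 0xFF00;
        return (short) (even | odd);
    }

    // Element-wise byte multiply built on the lane helper (assumes equal, even lengths).
    static byte[] mulBytes(byte[] a, byte[] b) {
        byte[] r = new byte[a.length];
        for (int i = 0; i < a.length; i += 2) {
            short wa = (short) ((a[i] & 0xFF) | (a[i + 1] << 8));
            short wb = (short) ((b[i] & 0xFF) | (b[i + 1] << 8));
            short wr = mulPackedBytes(wa, wb);
            r[i] = (byte) wr;
            r[i + 1] = (byte) (wr >>> 8);
        }
        return r;
    }

    // 64-bit multiply from 32-bit halves:
    // a * b = aLo * bLo + ((aHi * bLo + aLo * bHi) << 32)  (mod 2^64)
    static long mulLong(long a, long b) {
        long aLo = a & 0xFFFFFFFFL;
        long bLo = b & 0xFFFFFFFFL;
        int aHi = (int) (a >>> 32);
        int bHi = (int) (b >>> 32);
        long low = aLo * bLo;                      // unsigned 32x32 -> 64 product
        int cross = aHi * (int) b + (int) a * bHi; // only the low 32 bits matter
        return low + ((long) cross << 32);
    }

    public static void main(String[] args) {
        byte[] x = {3, -7, 100, 127};
        byte[] y = {5, 9, 3, 2};
        System.out.println(Arrays.toString(mulBytes(x, y)));
        System.out.println(mulLong(-123456789012L, 987654321L) == -123456789012L * 987654321L);
    }
}
```

In the byte case the masks keep the two products from overlapping, so a single OR merges them. In the long case the `aHi * bHi` term is dropped because it only contributes to bits 64 and above.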

Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  refactor conditions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10571/files
  - new: https://git.openjdk.org/jdk/pull/10571/files/6bbbb077..51d39f78

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10571&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10571&range=00-01

  Stats: 12 lines in 1 file changed: 4 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/10571.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10571/head:pull/10571

PR: https://git.openjdk.org/jdk/pull/10571

