RFR: 8294865: x86: Improve the code generation of MulVB and MulVL [v3]
Quan Anh Mai
qamai at openjdk.org
Thu Oct 13 16:53:45 UTC 2022
> Hi,
>
> This patch simplifies and improves the code generation of `MulVB` and `MulVL` nodes:
>
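> - Since x86 has no lane-wise byte multiply instruction, MulVB can be implemented by applying `vpmullw` (a 16-bit multiply) separately to the even- and odd-indexed bytes and combining the two results.
> - On hardware without AVX-512DQ, which provides a native 64-bit `vpmullq`, MulVL can be implemented by computing the partial products of the 32-bit halves of each element and adding them together.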
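The MulVB approach can be modeled lane by lane. The sketch below is a scalar Python model, not HotSpot code: it treats each pair of adjacent bytes as one little-endian 16-bit word (so the even-indexed byte is the low half), and `vpmullw`, `pack_words`, `unpack_words` and `mul_vb` are hypothetical helpers standing in for the vector operations. It shows one possible mask-and-shift sequence; the exact instructions the patch emits may differ.

```python
def pack_words(bs):
    """Pair adjacent bytes into 16-bit words; the even-indexed byte is the low half."""
    assert len(bs) % 2 == 0, "byte vectors have an even number of lanes"
    return [bs[i] | (bs[i + 1] << 8) for i in range(0, len(bs), 2)]


def unpack_words(ws):
    """Split each 16-bit word back into its low (even) and high (odd) byte."""
    out = []
    for w in ws:
        out.extend((w & 0xFF, w >> 8))
    return out


def vpmullw(a, b):
    """Model of a packed 16-bit multiply: keep the low 16 bits of each lane product."""
    return [(x * y) & 0xFFFF for x, y in zip(a, b)]


def mul_vb(a_bytes, b_bytes):
    """Lane-wise byte multiply (mod 256) built only from 16-bit multiplies."""
    a, b = pack_words(a_bytes), pack_words(b_bytes)
    # Even bytes: the low byte of a 16-bit product depends only on the low
    # bytes of its inputs, so multiply whole words and keep the low byte.
    even = [w & 0x00FF for w in vpmullw(a, b)]
    # Odd bytes: move a's odd byte down (a >> 8) and keep b's odd byte in
    # place (b & 0xFF00); their 16-bit product is (a_odd * b_odd) << 8.
    a_odd = [w >> 8 for w in a]
    b_odd = [w & 0xFF00 for w in b]
    odd = vpmullw(a_odd, b_odd)  # low byte of every lane is zero
    # Combine: even results in the low bytes, odd results in the high bytes.
    return unpack_words([e | o for e, o in zip(even, odd)])
```

For example, `mul_vb([3, 200, 255, 16], [5, 2, 255, 16])` returns `[15, 144, 1, 0]`. The model uses unsigned bytes; Java's signed `byte` multiply wraps to the same low 8 bits, so the results agree bit for bit.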
>
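The MulVL decomposition can be checked with a small scalar model. Writing each 64-bit lane as a = a_hi * 2^32 + a_lo, the a_hi * b_hi term is a multiple of 2^64 and vanishes, leaving a * b mod 2^64 = (a_lo * b_lo + ((a_lo * b_hi + a_hi * b_lo) mod 2^32) * 2^32) mod 2^64. The Python sketch below implements that identity lane by lane; `mul_lane_64`, `mul_vl` and `to_u64` are hypothetical helpers, and the note that the lo*lo product is what `vpmuludq` computes per 64-bit lane is illustrative only, not a statement of the exact instruction sequence in the patch.

```python
M32 = 0xFFFF_FFFF
M64 = 0xFFFF_FFFF_FFFF_FFFF


def mul_lane_64(a, b):
    """Low 64 bits of a*b using only 32x32 -> 64-bit partial products."""
    a_lo, a_hi = a & M32, a >> 32
    b_lo, b_hi = b & M32, b >> 32
    # lo*lo is a full 64-bit product (what VPMULUDQ yields per 64-bit lane).
    lo_lo = a_lo * b_lo
    # The cross terms are shifted left by 32, so only their low 32 bits survive.
    cross = (a_lo * b_hi + a_hi * b_lo) & M32
    # a_hi*b_hi would be shifted left by 64 and falls entirely outside the result.
    return (lo_lo + (cross << 32)) & M64


def mul_vl(a, b):
    """Lane-wise 64-bit multiply of two vectors of unsigned 64-bit values."""
    return [mul_lane_64(x, y) for x, y in zip(a, b)]


def to_u64(x):
    """Reinterpret a Java-style signed long as unsigned 64 bits (two's complement)."""
    return x & M64
```

Because Java `long` arithmetic is two's complement, the low 64 bits of a product are the same whether the operands are read as signed or unsigned, which is why the model works on unsigned values and uses `to_u64` for negative inputs.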
> Vector API benchmark results for the `MUL` operations:
>
> Benchmark          (size)  Mode   Cnt  Before (ops/ms)      After (ops/ms)        Change
> Byte64Vector.MUL     1024  thrpt   15   8948.607 ± 194.646   8860.404 ± 203.109   -0.99%
> Byte128Vector.MUL    1024  thrpt   15  12915.839 ± 291.262  13554.662 ± 488.695   +4.95%
> Byte256Vector.MUL    1024  thrpt   15  12129.959 ± 245.710  23279.276 ± 669.725  +91.92%
> Long128Vector.MUL    1024  thrpt   15   1183.663 ±  36.440   1489.892 ±  35.356  +25.87%
> Long256Vector.MUL    1024  thrpt   15   1911.802 ±  95.304   2834.088 ±  77.647  +48.24%
>
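The Change column matches the relative difference of the mean scores, (After - Before) / Before. That formula is inferred from the numbers rather than stated in the original; the short Python sketch below recomputes the column from the table values (`SCORES`, `change_percent` and `change_column` are illustrative names).

```python
# Mean throughput scores (ops/ms) copied from the table: (before, after).
SCORES = {
    "Byte64Vector.MUL": (8948.607, 8860.404),
    "Byte128Vector.MUL": (12915.839, 13554.662),
    "Byte256Vector.MUL": (12129.959, 23279.276),
    "Long128Vector.MUL": (1183.663, 1489.892),
    "Long256Vector.MUL": (1911.802, 2834.088),
}


def change_percent(before, after):
    """Relative change of the mean score, in percent (positive = faster)."""
    return (after - before) / before * 100.0


def change_column():
    """Format the Change column the way the table shows it, e.g. '+91.92%'."""
    return {name: f"{change_percent(b, a):+.2f}%" for name, (b, a) in SCORES.items()}


if __name__ == "__main__":
    for name, change in change_column().items():
        print(f"{name:<18} {change:>8}")
```

For Byte64Vector and Byte128Vector the before and after error intervals overlap, so those two changes are small relative to the reported error; the Byte256Vector and Long*Vector gains lie well outside it.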
> Please have a look and review. Thank you very much.

Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
- Merge branch 'master' into improveMulVB
- refactor conditions
- add vmulB for 8 bytes
- Merge branch 'master' into improveMulVB
- Merge branch 'master' into improveMulVB
- Merge branch 'master' into improveMulVB
- fix
- mulV
-------------
Changes: https://git.openjdk.org/jdk/pull/10571/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10571&range=02
Stats: 170 lines in 5 files changed: 15 ins; 64 del; 91 mod
Patch: https://git.openjdk.org/jdk/pull/10571.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10571/head:pull/10571
PR: https://git.openjdk.org/jdk/pull/10571