RFR: 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE
Xiaohong Gong
xgong at openjdk.java.net
Wed Apr 7 06:01:02 UTC 2021
Since the vector bitwise "andNot" is implemented as "v1.and(v2.xor(-1))", the code generated for SVE looks like:
mov z16.b, #-1
eor z17.d, z20.d, z16.d
and z18.d, z18.d, z17.d
This could be improved with a single instruction:
bic z16.d, z16.d, z18.d
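The rewrite is safe because "a & (b ^ -1)" and "a & ~b" are the same value for every bit pattern. As a minimal scalar sketch (plain Java on single bytes, not the Vector API and not the code in this patch; the class and method names are made up for illustration), the two forms can be compared exhaustively:

```java
public class AndNotIdentity {
    // Three-step form, mirroring v1.and(v2.xor(-1)):
    // materialize all-ones, XOR to invert b, then AND with a.
    static byte andNotViaXor(byte a, byte b) {
        byte allOnes = (byte) -1;
        byte inverted = (byte) (b ^ allOnes);
        return (byte) (a & inverted);
    }

    // Single-step form, the scalar analogue of a bit-clear (bic) instruction.
    static byte andNotDirect(byte a, byte b) {
        return (byte) (a & ~b);
    }

    // Checks that both forms agree for every pair of byte values.
    static boolean agreeOnAllBytes() {
        for (int a = Byte.MIN_VALUE; a <= Byte.MAX_VALUE; a++) {
            for (int b = Byte.MIN_VALUE; b <= Byte.MAX_VALUE; b++) {
                if (andNotViaXor((byte) a, (byte) b) != andNotDirect((byte) a, (byte) b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnAllBytes());
    }
}
```

Since the identity holds bit by bit, it does not depend on the lane size, which is why the byte-wide constant can feed a doubleword-wide "eor" in the sequence above.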
Similarly, the following optimization for NEON is also needed:
not v21.16b, v21.16b
and v21.16b, v21.16b, v18.16b ==> bic v21.16b, v18.16b, v21.16b
This patch also adds the following optimization for vector "not" on SVE, which has already been added for NEON:
mov z16.b, #-1
eor z17.d, z20.d, z16.d ==> not z17.d, p7/m, z20.d
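The "mov"/"eor" pair works because broadcasting the byte -1 fills every wider lane with all-ones, and XOR with all-ones is bitwise NOT. A rough scalar sketch of that reasoning, modelling one 64-bit lane as a Java long (the helper names are hypothetical and this is not the patch's code):

```java
public class NotViaXor {
    // Repeat one byte eight times to fill a 64-bit lane,
    // loosely modelling "mov z16.b, #-1" as seen through a .d lane.
    static long broadcastByte(byte value) {
        long lane = 0L;
        for (int i = 0; i < 8; i++) {
            lane = (lane << 8) | (value & 0xFFL);
        }
        return lane;
    }

    // Two-step form: XOR against the broadcast all-ones constant.
    static long notViaXor(long x) {
        return x ^ broadcastByte((byte) -1);
    }

    // Single-step form, the scalar analogue of a "not" instruction.
    static long notDirect(long x) {
        return ~x;
    }

    public static void main(String[] args) {
        long sample = 0x0123456789ABCDEFL;
        System.out.println(notViaXor(sample) == notDirect(sample));
    }
}
```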
Performance improves by about 16% ~ 36% with NEON for the "AND_NOT" benchmark [1].
[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/ByteMaxVector.java#L343
Tested tier1 and jdk:tier3.
-------------
Commit messages:
- 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE
Changes: https://git.openjdk.java.net/jdk/pull/3370/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3370&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8264352
Stats: 219 lines in 7 files changed: 185 ins; 0 del; 34 mod
Patch: https://git.openjdk.java.net/jdk/pull/3370.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3370/head:pull/3370
PR: https://git.openjdk.java.net/jdk/pull/3370