RFR: 8282162: [vector] Optimize vector negation API
Xiaohong Gong
xgong at openjdk.java.net
Fri Mar 11 06:37:00 UTC 2022
The current vector `"NEG"` is implemented by subtracting the vector from zero when the architecture does not support a negation instruction. To fit the predicate feature on architectures that support it, the masked vector `"NEG"` is implemented with the pattern `"v.not(m).add(1, m)"`. Both can be optimized to a single negation instruction on ARM SVE, and so can the non-masked `"NEG"` on NEON.
Besides, on architectures that support neither a negation instruction nor the predicate feature, implementing the masked `"NEG"` with subtraction also takes fewer instructions than the current pattern.
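For reference, the equivalence of these patterns can be checked with a scalar sketch. The class and method names below are illustrative only and not part of the patch. Each `int[]` stands in for one vector and a `boolean[]` stands in for the mask; lanes with the mask unset are assumed to keep their original value, which is emulated here with an explicit per-lane select.

```java
import java.util.Arrays;

// Scalar emulation of the negation patterns discussed above (illustrative names).
public class NegationPatterns {
    // Non-masked negation as subtraction from zero: "zero.sub(v)".
    public static int[] negBySub(int[] v) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = 0 - v[i];
        }
        return r;
    }

    // Masked negation as "v.not(m).add(1, m)": both steps apply only to set lanes.
    public static int[] negMaskedNotAdd(int[] v, boolean[] m) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            int notV = m[i] ? ~v[i] : v[i];   // masked bitwise NOT
            r[i] = m[i] ? notV + 1 : notV;    // masked add of 1
        }
        return r;
    }

    // Masked negation by subtraction: subtract from zero, then keep v in unset lanes.
    public static int[] negMaskedSub(int[] v, boolean[] m) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = m[i] ? 0 - v[i] : v[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v = {1, -2, 0, Integer.MIN_VALUE};
        boolean[] m = {true, false, true, true};
        // Both masked patterns should agree lane by lane, including on overflow.
        System.out.println(Arrays.equals(negMaskedNotAdd(v, m), negMaskedSub(v, m)));
    }
}
```

Both masked variants rely on the two's-complement identity `-x == ~x + 1`, so they also agree for `Integer.MIN_VALUE`, where negation wraps around.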
To optimize the Vector API negation, this patch moves the implementation from the Java side to HotSpot. The compiler generates different nodes depending on the architecture:
- Generate the (predicated) negation node if the architecture supports it; otherwise, generate the `"zero.sub(v)"` pattern for the non-masked operation.
- Generate `"zero.sub(v, m)"` for the masked operation if the architecture does not have the predicate feature; otherwise, generate the original pattern `"v.xor(-1, m).add(1, m)"`.
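The two rules above can be read as a small decision function. The sketch below is only a literal restatement of those rules: the class name, the method, and the three capability flags are hypothetical and do not correspond to HotSpot's real code. It also does not model the later instruction selection, so it is not meant to reproduce the exact instruction listings (for example, the masked NEON result uses `neg` followed by `bsl`).

```java
// Hypothetical restatement of the node-selection rules; not HotSpot code.
public class NegLoweringSketch {
    /**
     * @param masked           whether the NEG operation takes a mask
     * @param hasNeg           assumed flag: a non-masked negation node is supported
     * @param hasPredicatedNeg assumed flag: a predicated negation node is supported
     * @param hasPredicate     assumed flag: the architecture has the predicate feature
     * @return the pattern the rules would pick, written in the notation of this message
     */
    public static String choose(boolean masked, boolean hasNeg,
                                boolean hasPredicatedNeg, boolean hasPredicate) {
        if (!masked) {
            // Rule 1, non-masked case: negation node if supported, else subtraction.
            return hasNeg ? "neg(v)" : "zero.sub(v)";
        }
        if (hasPredicatedNeg) {
            // Rule 1, masked case: predicated negation node.
            return "neg(v, m)";
        }
        if (!hasPredicate) {
            // Rule 2: no predicate feature, so use masked subtraction.
            return "zero.sub(v, m)";
        }
        // Rule 2: predicate feature without predicated negation keeps the original pattern.
        return "v.xor(-1, m).add(1, m)";
    }

    public static void main(String[] args) {
        System.out.println(choose(false, false, false, false));
        System.out.println(choose(true, false, false, true));
    }
}
```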
So with this patch, the following transformations are applied:
For non-masked negation with NEON:
  movi    v16.4s, #0x0
  sub     v17.4s, v16.4s, v17.4s      ==>  neg     v17.4s, v17.4s
and with SVE:
  mov     z16.s, #0
  sub     z18.s, z16.s, z17.s         ==>  neg     z16.s, p7/m, z16.s
For masked negation with NEON:
  movi    v17.4s, #0x1
  mvn     v19.16b, v18.16b
  mov     v20.16b, v16.16b            ==>  neg     v18.4s, v17.4s
  bsl     v20.16b, v19.16b, v18.16b        bsl     v19.16b, v18.16b, v17.16b
  add     v19.4s, v20.4s, v17.4s
  mov     v18.16b, v16.16b
  bsl     v18.16b, v19.16b, v20.16b
and with SVE:
  mov     z16.s, #-1
  mov     z17.s, #1                   ==>  neg     z16.s, p0/m, z16.s
  eor     z18.s, p0/m, z18.s, z16.s
  add     z18.s, p0/m, z18.s, z17.s
Here are the performance gains for the benchmarks (see [1][2]) on ARM and x86 machines (note that the non-masked negation benchmarks show no improvement on x86 since no instructions are changed):
NEON:
  Benchmark                   Gain
  Byte128Vector.NEG           1.029
  Byte128Vector.NEGMasked     1.757
  Short128Vector.NEG          1.041
  Short128Vector.NEGMasked    1.659
  Int128Vector.NEG            1.005
  Int128Vector.NEGMasked      1.513
  Long128Vector.NEG           1.003
  Long128Vector.NEGMasked     1.878
SVE with 512 bits:
  Benchmark                   Gain
  ByteMaxVector.NEG           1.10
  ByteMaxVector.NEGMasked     1.165
  ShortMaxVector.NEG          1.056
  ShortMaxVector.NEGMasked    1.195
  IntMaxVector.NEG            1.002
  IntMaxVector.NEGMasked      1.239
  LongMaxVector.NEG           1.031
  LongMaxVector.NEGMasked     1.191
x86 (non AVX-512):
  Benchmark                   Gain
  ByteMaxVector.NEGMasked     1.254
  ShortMaxVector.NEGMasked    1.359
  IntMaxVector.NEGMasked      1.431
  LongMaxVector.NEGMasked     1.989
[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Byte128Vector.java#L1881
[2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Byte128Vector.java#L1896
-------------
Commit messages:
- 8282162: [vector] Optimize vector negation API
Changes: https://git.openjdk.java.net/jdk/pull/7782/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7782&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8282162
Stats: 308 lines in 15 files changed: 267 ins; 25 del; 16 mod
Patch: https://git.openjdk.java.net/jdk/pull/7782.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/7782/head:pull/7782
PR: https://git.openjdk.java.net/jdk/pull/7782