RFR: 8290485: [vector] REVERSE_BYTES for byte type should not emit any instructions
Xiaohong Gong
xgong at openjdk.org
Wed Jul 20 07:08:44 UTC 2022
The Vector API unary operation "`REVERSE_BYTES`" should not emit any instructions for byte vectors, and the same applies to the corresponding masked operation. Currently it emits "`mov dst, src`" on aarch64 when "`dst`" and "`src`" are not the same register. For the masked "`REVERSE_BYTES`", however, the compiler always generates a "`VectorBlend`", which I think is redundant because its first and second vector inputs are the same. Here is the code generated for the masked "`REVERSE_BYTES`" on byte vectors with NEON:
ldr q16, [x15, #16]            ; load the "src" vector
mov v17.16b, v16.16b           ; reverse bytes "src"
ldr q18, [x13, #16]
neg v18.16b, v18.16b           ; load the vector mask
bsl v18.16b, v17.16b, v16.16b  ; vector blend
The elements in registers "`v17`" and "`v16`" are identical, so the result of "`bsl`" is the same as the values originally loaded into "`v16`", regardless of the values in the vector mask.
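The point about "`bsl`" can be checked with a small scalar model. The sketch below is only an illustration: the class and method names are hypothetical, and it models a bitwise select over byte lanes in plain Java rather than using the Vector API or NEON.

```java
final class BlendSameInputs {
    // Scalar model of a bitwise select over byte lanes: each result bit comes
    // from ifSet where the mask bit is 1, and from ifClear where it is 0.
    static byte[] bitwiseSelect(byte[] mask, byte[] ifSet, byte[] ifClear) {
        byte[] out = new byte[mask.length];
        for (int i = 0; i < mask.length; i++) {
            out[i] = (byte) ((mask[i] & ifSet[i]) | (~mask[i] & ifClear[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] src = {1, -2, 3, 127, -128, 0, 42, 7};
        // Reversing the bytes of a one-byte lane is a no-op, so the
        // "reversed" vector is just a copy of the source.
        byte[] reversed = src.clone();
        // An all-ones lane (-1) stands for a set mask lane, 0 for an unset one.
        byte[] mask = {-1, 0, -1, 0, 0, -1, -1, 0};
        byte[] result = bitwiseSelect(mask, reversed, src);
        System.out.println(java.util.Arrays.equals(result, src));
    }
}
```

Whatever mask is passed in, both candidate inputs hold the same bits, so the select has nothing to choose between.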
To improve this, we can add IGVN transformations for "`ReverseBytesV`" and "`VectorBlend`" in the compiler. "`ReverseBytesV`" can return its vector input if the basic element type is `T_BYTE`, and "`VectorBlend`" can return its first input if the first and second inputs are the same node.
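As a rough illustration of these two rules, here is a toy model in Java. It is not the HotSpot code: the `Node` class, the `ElemType` enum, the `identity` method and the input layout are all hypothetical stand-ins for the real C2 node classes.

```java
final class IdentitySketch {
    enum ElemType { BYTE, SHORT, INT, LONG }

    // Hypothetical toy IR node for illustration; this is not HotSpot's Node class.
    static final class Node {
        final String op;
        final ElemType type;
        final Node[] in;

        Node(String op, ElemType type, Node... in) {
            this.op = op;
            this.type = type;
            this.in = in;
        }
    }

    // Returns an existing node that computes the same value, or n itself.
    static Node identity(Node n) {
        // Reversing the bytes of a one-byte element changes nothing.
        if (n.op.equals("ReverseBytesV") && n.type == ElemType.BYTE) {
            return n.in[0];
        }
        // Blending a vector with itself yields that vector for any mask.
        // Assumed layout in this sketch: in[0], in[1] = vectors, in[2] = mask.
        if (n.op.equals("VectorBlend") && n.in[0] == n.in[1]) {
            return n.in[0];
        }
        return n;
    }

    public static void main(String[] args) {
        Node src = new Node("LoadVector", ElemType.BYTE);
        Node mask = new Node("LoadVectorMask", ElemType.BYTE);
        Node rev = identity(new Node("ReverseBytesV", ElemType.BYTE, src));
        Node blend = identity(new Node("VectorBlend", ElemType.BYTE, src, rev, mask));
        System.out.println(blend == src);
    }
}
```

In this model the masked byte case collapses in two steps: the reverse folds to the source vector, which makes both blend inputs the same node, and the blend then folds to the source vector as well.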
Here is the performance data for the JMH benchmark [1] on ARM NEON:
Benchmark                          (size)  Mode   Cnt     Before      After  Units
ByteMaxVector.REVERSE_BYTES          1024  thrpt   15  19457.641  19516.124  ops/ms
ByteMaxVector.REVERSE_BYTESMasked    1024  thrpt   15  12498.416  20528.004  ops/ms
This patch may not have any effect on the non-masked "`REVERSE_BYTES`" on ARM NEON, because the backend may already emit no instruction for it before this change.
And here is the performance data on an x86 system:
Benchmark                          (size)  Mode   Cnt     Before      After  Units
ByteMaxVector.REVERSE_BYTES          1024  thrpt   15  19358.941  20012.047  ops/ms
ByteMaxVector.REVERSE_BYTESMasked    1024  thrpt   15  15759.788  20389.996  ops/ms
[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ByteMaxVector.java#L2201
-------------
Commit messages:
- 8290485: [vector] REVERSE_BYTES for byte type should not emit any instructions
Changes: https://git.openjdk.org/jdk/pull/9565/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9565&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8290485
Stats: 123 lines in 4 files changed: 121 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/9565.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/9565/head:pull/9565
PR: https://git.openjdk.org/jdk/pull/9565