RFR: 8290485: [vector] REVERSE_BYTES for byte type should not emit any instructions
Xiaohong Gong
xgong at openjdk.org
Mon Jul 25 02:27:55 UTC 2022
On Wed, 20 Jul 2022 06:58:53 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
> The Vector API unary operation "`REVERSE_BYTES`" should not emit any instructions for byte vectors. The same applies to the corresponding masked operation. Currently, the non-masked operation emits `"mov dst, src"` on aarch64 when "`dst`" and "`src`" are not the same register. For the masked "`REVERSE_BYTES`", however, the compiler always generates a "`VectorBlend`", which I think is redundant because its first and second vector inputs are the same. Here is the code generated for the masked "`REVERSE_BYTES`" on byte vectors with NEON:
>
> ldr q16, [x15, #16] ; load the "src" vector
> mov v17.16b, v16.16b ; reverse bytes "src"
> ldr q18, [x13, #16]
> neg v18.16b, v18.16b ; load the vector mask
> bsl v18.16b, v17.16b, v16.16b ; vector blend
>
> The elements in registers "`v17`" and "`v16`" are identical, so the result of "`bsl`" equals the value originally loaded into "`v16`", regardless of the values in the vector mask.
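
To make the argument above concrete, here is a minimal scalar sketch in plain Java. It is only a model, not the Vector API or the generated code: the class `ByteLaneModel` and its methods are hypothetical names, and the blend convention (take the second input where the mask is set) is an assumption made for illustration. It shows that reversing bytes within one-byte lanes is a no-op, so the blend sees two identical inputs and the mask cannot affect the result.

```java
import java.util.Arrays;

public class ByteLaneModel {
    // Reverse the byte order inside each lane of laneBytes bytes.
    // Assumes src.length is a multiple of laneBytes.
    static byte[] reverseBytes(byte[] src, int laneBytes) {
        byte[] r = new byte[src.length];
        for (int lane = 0; lane < src.length; lane += laneBytes) {
            for (int j = 0; j < laneBytes; j++) {
                r[lane + j] = src[lane + laneBytes - 1 - j];
            }
        }
        return r;
    }

    // Scalar model of a blend: take b[i] where the mask is set, a[i] otherwise.
    static byte[] blend(byte[] a, byte[] b, boolean[] mask) {
        byte[] r = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? b[i] : a[i];
        }
        return r;
    }

    // Masked REVERSE_BYTES on byte lanes, modeled as
    // blend(src, reverseBytes(src), mask) with one-byte lanes.
    static byte[] maskedReverseBytes(byte[] src, boolean[] mask) {
        return blend(src, reverseBytes(src, 1), mask);
    }

    public static void main(String[] args) {
        byte[] src = {1, -2, 3, 127};
        boolean[] mask = {true, false, true, false};
        // With one-byte lanes both blend inputs are equal, so the mask is irrelevant.
        System.out.println(Arrays.equals(src, maskedReverseBytes(src, mask)));
    }
}
```

For wider lanes the reversal is no longer an identity (for example, `reverseBytes(new byte[]{1, 2, 3, 4}, 4)` yields `{4, 3, 2, 1}`), which is why the shortcut applies to the byte type only.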
>
> To improve this, we can add IGVN transformations for "`ReverseBytesV`" and "`VectorBlend`" in the compiler. "`ReverseBytesV`" can return its vector input if the element basic type is `T_BYTE`, and "`VectorBlend`" can return its first input if its first and second inputs are the same node.
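
The two rules described above can be sketched as a toy model in Java. This is not HotSpot's C++ code: the class `IdentitySketch`, its simplified node classes, the `identity()` method, and the `simplify` helper are all hypothetical and only borrow the node names from the description. The sketch shows how the two rules compose so that a masked byte "`REVERSE_BYTES`" collapses to the loaded source vector.

```java
public class IdentitySketch {
    enum ElemType { BYTE, SHORT, INT, LONG }

    // Hypothetical, heavily simplified IR node.
    static abstract class Node {
        // Return an existing node that computes the same value, or this node.
        Node identity() { return this; }
    }

    static final class LoadVector extends Node {
        final String name;
        LoadVector(String name) { this.name = name; }
    }

    static final class ReverseBytesV extends Node {
        final Node in;
        final ElemType type;
        ReverseBytesV(Node in, ElemType type) { this.in = in; this.type = type; }
        // Rule 1: reversing the bytes of one-byte elements changes nothing.
        @Override Node identity() { return type == ElemType.BYTE ? in : this; }
    }

    static final class VectorBlend extends Node {
        final Node in1, in2, mask;
        VectorBlend(Node in1, Node in2, Node mask) {
            this.in1 = in1; this.in2 = in2; this.mask = mask;
        }
        // Rule 2: blending a vector with itself yields that vector for any mask.
        @Override Node identity() { return in1 == in2 ? in1 : this; }
    }

    // Apply identity() until the node no longer changes.
    static Node simplify(Node n) {
        Node next = n.identity();
        while (next != n) {
            n = next;
            next = n.identity();
        }
        return n;
    }

    public static void main(String[] args) {
        Node src = new LoadVector("src");
        Node mask = new LoadVector("mask");
        // Masked REVERSE_BYTES on bytes: blend(src, reverse(src), mask).
        Node reversed = simplify(new ReverseBytesV(src, ElemType.BYTE));
        Node blended = simplify(new VectorBlend(src, reversed, mask));
        System.out.println(blended == src);
    }
}
```

In this model the inputs are simplified before the blend is built, so the second rule sees two references to the same node; for a non-byte element type the "`ReverseBytesV`" node is kept and the blend is not removed.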
>
> Here is the performance data for the JMH benchmark [1] on ARM NEON:
>
> Benchmark                          (size)  Mode   Cnt  Before     After      Units
> ByteMaxVector.REVERSE_BYTES        1024    thrpt  15   19457.641  19516.124  ops/ms
> ByteMaxVector.REVERSE_BYTESMasked  1024    thrpt  15   12498.416  20528.004  ops/ms
>
> This patch may have no effect on the non-masked "`REVERSE_BYTES`" on ARM NEON, because the backend may already have emitted no instructions for it before the patch.
>
> And here is the performance data on an x86 system:
>
> Benchmark                          (size)  Mode   Cnt  Before     After      Units
> ByteMaxVector.REVERSE_BYTES        1024    thrpt  15   19358.941  20012.047  ops/ms
> ByteMaxVector.REVERSE_BYTESMasked  1024    thrpt  15   15759.788  20389.996  ops/ms
>
> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ByteMaxVector.java#L2201
Hi, could anyone please take a look at this simple patch? Thanks a lot for your time!
-------------
PR: https://git.openjdk.org/jdk/pull/9565
More information about the hotspot-compiler-dev
mailing list