RFR: 8290485: [vector] REVERSE_BYTES for byte type should not emit any instructions

Xiaohong Gong xgong at openjdk.org
Wed Jul 20 07:08:44 UTC 2022


The Vector API unary operation "`REVERSE_BYTES`" should not emit any instructions for byte vectors, and neither should the corresponding masked operation. Currently it emits `"mov dst, src"` on aarch64 when "`dst`" and "`src`" are not the same register. For the masked "`REVERSE_BYTES`", however, the compiler always generates a "`VectorBlend`", which I think is redundant since its first and second vector inputs are the same. Here is the code generated for the masked "`REVERSE_BYTES`" on byte vectors with NEON:

  ldr q16, [x15, #16]            ; load the "src" vector
  mov v17.16b, v16.16b           ; reverse bytes "src"
  ldr q18, [x13, #16]
  neg v18.16b, v18.16b           ; load the vector mask
  bsl v18.16b, v17.16b, v16.16b  ; vector blend

The elements in registers "`v17`" and "`v16`" are identical, so the result of "`bsl`" equals the values originally loaded into "`v16`", regardless of the contents of the vector mask.
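To make the argument above concrete, here is a small scalar sketch of the bitwise select that "`bsl`" performs. It is only an illustrative model, not compiler code: the names `Bytes16`, `bitwise_select` and `reverse_bytes_in_byte_lanes` are made up for this example, and the mask is assumed to hold all-ones or all-zeros per lane (which is what the "`neg`" above produces from 0/1 values).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit vector viewed as 16 byte lanes (models "v16.16b").
using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of a bitwise select: each result bit is taken from if_set
// where the mask bit is 1 and from if_clear where the mask bit is 0.
Bytes16 bitwise_select(const Bytes16& mask,
                       const Bytes16& if_set,
                       const Bytes16& if_clear) {
    Bytes16 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint8_t>((mask[i] & if_set[i]) |
                                           (~mask[i] & if_clear[i]));
    }
    return out;
}

// Reversing the bytes inside a one-byte lane changes nothing, so the
// operation degenerates to a plain copy (the "mov" in the listing).
Bytes16 reverse_bytes_in_byte_lanes(const Bytes16& src) {
    return src;
}
```

With both data inputs equal, `(m & a) | (~m & a)` is just `a` for every mask `m`, so `bitwise_select(mask, reverse_bytes_in_byte_lanes(src), src)` always yields `src`.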

To improve this, we can add IGVN transformations for "`ReverseBytesV`" and "`VectorBlend`" to the compiler. "`ReverseBytesV`" can return its vector input if the basic element type is `T_BYTE`, and "`VectorBlend`" can return its first input if the first and second inputs are the same.
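The two rules can be sketched on a toy IR as follows. This is a simplified illustration and not the HotSpot code from the patch: the `Node` struct, the `Opcode` and `ElemType` enums, and the `identity` and `simplify` functions are all assumptions invented for this example, and the input layout of the toy "`VectorBlend`" (two vector inputs plus a mask) is likewise assumed.

```cpp
#include <cassert>

// Hypothetical, heavily simplified IR; not the C2 node classes.
enum class Opcode { LoadVector, ReverseBytesV, VectorBlend };
enum class ElemType { Byte, Short, Int, Long };

struct Node {
    Opcode op;
    ElemType elem;
    Node* in1;   // first vector input (or nullptr)
    Node* in2;   // second vector input (or nullptr)
    Node* mask;  // mask input for VectorBlend (or nullptr)
};

// Returns an existing node that computes the same value as n, or n itself
// if neither rule applies.
Node* identity(Node* n) {
    assert(n != nullptr);
    switch (n->op) {
    case Opcode::ReverseBytesV:
        // Byte lanes have nothing to reverse: forward the input.
        if (n->elem == ElemType::Byte) return n->in1;
        break;
    case Opcode::VectorBlend:
        // Both choices are the same value, so the mask is irrelevant.
        if (n->in1 == n->in2) return n->in1;
        break;
    default:
        break;
    }
    return n;
}

// Simplifies the vector inputs first, then the node itself, so that
// blend(src, reverse(src), mask) on bytes collapses to src.
Node* simplify(Node* n) {
    if (n == nullptr) return nullptr;
    n->in1 = simplify(n->in1);
    n->in2 = simplify(n->in2);
    return identity(n);
}
```

In this model the masked byte "`REVERSE_BYTES`" reduces in two steps: the reverse node is replaced by its input, which makes both blend inputs identical, and the blend is then replaced by that input. For wider element types neither rule fires and the graph is left unchanged.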

Here is the performance data for the JMH benchmark [1] on ARM NEON:

Benchmark                          (size)   Mode  Cnt     Before      After   Units
ByteMaxVector.REVERSE_BYTES          1024  thrpt   15  19457.641  19516.124  ops/ms
ByteMaxVector.REVERSE_BYTESMasked    1024  thrpt   15  12498.416  20528.004  ops/ms

This patch may not affect the non-masked "`REVERSE_BYTES`" on ARM NEON, because the backend may already have emitted no instruction for it before this change.

And here is the performance data on an x86 system:

Benchmark                          (size)   Mode  Cnt     Before      After   Units
ByteMaxVector.REVERSE_BYTES          1024  thrpt   15  19358.941  20012.047  ops/ms
ByteMaxVector.REVERSE_BYTESMasked    1024  thrpt   15  15759.788  20389.996  ops/ms

[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ByteMaxVector.java#L2201

-------------

Commit messages:
 - 8290485: [vector] REVERSE_BYTES for byte type should not emit any instructions

Changes: https://git.openjdk.org/jdk/pull/9565/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9565&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8290485
  Stats: 123 lines in 4 files changed: 121 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/9565.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9565/head:pull/9565

PR: https://git.openjdk.org/jdk/pull/9565

