RFR: 8290485: [vector] REVERSE_BYTES for byte type should not emit any instructions
Xiaohong Gong
xgong at openjdk.org
Tue Jul 26 01:11:01 UTC 2022
On Mon, 25 Jul 2022 13:22:11 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:
>> The Vector API unary operation "`REVERSE_BYTES`" should not emit any instructions for byte vectors, and the same holds for the related masked operation. Currently it emits `"mov dst, src"` on aarch64 when "`dst`" and "`src`" are different registers. For the masked "`REVERSE_BYTES`", the compiler also always generates a "`VectorBlend`", which I think is redundant because its first and second vector inputs are the same. Here is the generated code for the masked "`REVERSE_BYTES`" on byte vectors with NEON:
>>
>> ldr q16, [x15, #16] ; load the "src" vector
>> mov v17.16b, v16.16b ; "reverse bytes" of "src" (a plain copy for bytes)
>> ldr q18, [x13, #16] ; load the vector mask (0/1 per lane)
>> neg v18.16b, v18.16b ; turn each 0/1 mask lane into 0/-1
>> bsl v18.16b, v17.16b, v16.16b ; vector blend
>>
>> The elements in registers "`v17`" and "`v16`" are identical, so every element of the "`bsl`" result equals the original value loaded into "`v16`", whatever the values in the vector mask are.
>>
>> To improve this, we can add IGVN transformations for "`ReverseBytesV`" and "`VectorBlend`" in the compiler. "`ReverseBytesV`" can return its vector input when the basic element type is `T_BYTE`, and "`VectorBlend`" can return its first input when the first and second inputs are the same node.
>>
>> Here is the performance data for the JMH benchmark [1] on ARM NEON:
>>
>> Benchmark                         (size)   Mode  Cnt     Before      After   Units
>> ByteMaxVector.REVERSE_BYTES         1024  thrpt   15  19457.641  19516.124  ops/ms
>> ByteMaxVector.REVERSE_BYTESMasked   1024  thrpt   15  12498.416  20528.004  ops/ms
>>
>> This patch may have little influence on the non-masked "`REVERSE_BYTES`" on ARM NEON, because the backend may already have emitted no instruction for it before (the `mov` is only generated when "`dst`" and "`src`" are different registers).
>>
>> And here is the performance data on an x86 system:
>>
>> Benchmark                         (size)   Mode  Cnt     Before      After   Units
>> ByteMaxVector.REVERSE_BYTES         1024  thrpt   15  19358.941  20012.047  ops/ms
>> ByteMaxVector.REVERSE_BYTESMasked   1024  thrpt   15  15759.788  20389.996  ops/ms
>>
>> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ByteMaxVector.java#L2201
>
> Looks good to me. I'll run some testing and report back once it passed.
Thanks for the review and testing @TobiHartmann !
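Both observations in the description can be checked with a small Python model. The first part is a bitwise model of the NEON `bsl` shown above, assuming 16 byte lanes. The second part is a toy rewrite pass that applies the two proposed identity rules. `Node`, `identity()` and `idealize()` are hypothetical names, not HotSpot C2 code; in C2, rules of this kind would typically be expressed through a node's `Identity()` method. The blend input order `(src, reversed, mask)` is also an assumption made for illustration.

```python
from dataclasses import dataclass

# Toy model only: Node, identity() and idealize() are hypothetical names,
# not HotSpot C2 classes or methods.


def bsl(mask, a, b):
    """Bitwise model of NEON BSL over byte lanes: set mask bits pick `a`, clear bits pick `b`."""
    return [(m & x) | (~m & y) for m, x, y in zip(mask, a, b)]


@dataclass(eq=False)  # compare nodes by identity, like distinct IR nodes
class Node:
    op: str
    inputs: tuple = ()
    elem_type: str = ""


def identity(n):
    """Return an existing equivalent node, or n itself if no rule applies."""
    if n.op == "ReverseBytesV" and n.elem_type == "T_BYTE":
        # Reversing the bytes of a 1-byte lane leaves the lane unchanged.
        return n.inputs[0]
    if n.op == "VectorBlend" and n.inputs[0] is n.inputs[1]:
        # Both data inputs are the same vector, so the mask cannot matter.
        return n.inputs[0]
    return n


def idealize(n):
    """Rewrite the inputs first, then apply the rules to n (a tiny stand-in for IGVN)."""
    n.inputs = tuple(idealize(i) for i in n.inputs)
    return identity(n)


if __name__ == "__main__":
    # Registers from the NEON listing: v17 holds the same bytes as v16.
    v16 = list(range(16))                                # stand-in for the loaded "src"
    v17 = list(v16)                                      # byte "reverse" = plain copy
    v18 = [0xFF if i % 2 else 0x00 for i in range(16)]   # an arbitrary 0/-1 lane mask
    print(bsl(v18, v17, v16) == v16)                     # blend result equals "src"

    # Masked REVERSE_BYTES on bytes, modeled as blend(src, reverse_bytes(src), mask).
    src = Node("LoadVector", elem_type="T_BYTE")
    rev = Node("ReverseBytesV", (src,), "T_BYTE")
    blend = Node("VectorBlend", (src, rev, Node("VectorLoadMask")), "T_BYTE")
    print(idealize(blend) is src)                        # whole expression folds to src
```

Running it prints `True` twice. Because `idealize()` folds the `ReverseBytesV` first, the blend then sees identical data inputs. That is why the two rules together remove the whole masked operation, while a `ReverseBytesV` on wider lanes (for example `T_SHORT`) is left alone.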
-------------
PR: https://git.openjdk.org/jdk/pull/9565
More information about the hotspot-compiler-dev mailing list