RFR: 8290485: [vector] REVERSE_BYTES for byte type should not emit any instructions

Xiaohong Gong xgong at openjdk.org
Tue Jul 26 02:59:02 UTC 2022


On Tue, 26 Jul 2022 02:36:31 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> The Vector API unary operation "`REVERSE_BYTES`" should not emit any instructions for byte vectors, and the same applies to the corresponding masked operation. Currently, the non-masked operation emits `"mov dst, src"` on aarch64 when "`dst`" and "`src`" are different registers. For the masked "`REVERSE_BYTES`", however, the compiler always generates a "`VectorBlend`", which I think is redundant because its first and second vector inputs are the same. Here is the code generated for the masked "`REVERSE_BYTES`" on byte vectors with NEON:
>> 
>>   ldr q16, [x15, #16]            ; load the "src" vector
>>   mov v17.16b, v16.16b           ; reverse bytes "src"
>>   ldr q18, [x13, #16]
>>   neg v18.16b, v18.16b           ; load the vector mask
>>   bsl v18.16b, v17.16b, v16.16b  ; vector blend
>> 
>> The elements in registers "`v17`" and "`v16`" are identical, so the result of "`bsl`" equals the values originally loaded into "`v16`", no matter what the values in the vector mask are.
>> 
>> To improve this, we can add IGVN transformations for "`ReverseBytesV`" and "`VectorBlend`" in the compiler. "`ReverseBytesV`" can return its vector input if the element basic type is `T_BYTE`, and "`VectorBlend`" can return its first input if its first and second inputs are the same.
>> 
>> Here is the performance data for the JMH benchmark [1] on ARM NEON:
>> 
>> Benchmark                          (size)   Mode  Cnt     Before      After   Units
>> ByteMaxVector.REVERSE_BYTES          1024  thrpt   15  19457.641  19516.124  ops/ms
>> ByteMaxVector.REVERSE_BYTESMasked    1024  thrpt   15  12498.416  20528.004  ops/ms
>> 
>> This patch is not expected to have much effect on the non-masked "`REVERSE_BYTES`" on ARM NEON, because the backend may already emit no instruction for it.
>> 
>> And here is the performance data on an x86 system:
>> 
>> Benchmark                          (size)   Mode  Cnt     Before      After   Units
>> ByteMaxVector.REVERSE_BYTES          1024  thrpt   15  19358.941  20012.047  ops/ms
>> ByteMaxVector.REVERSE_BYTESMasked    1024  thrpt   15  15759.788  20389.996  ops/ms
>> 
>> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ByteMaxVector.java#L2201
>
> Good.

Thanks for the review, @vnkozlov!
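
For anyone following the thread, the two identities behind the patch can be checked with a small scalar model in plain Java. This is only a sketch under stated assumptions: it does not use the Vector API or any C2 code, and the class and method names (`BlendIdentitySketch`, `reverseBytesPerLane`, `blend`, `maskedReverseBytes`) are made up for illustration. It models a byte vector as a `byte[]` and a mask as a `boolean[]`.

```java
import java.util.Arrays;

// Hypothetical scalar model; not HotSpot or Vector API code.
final class BlendIdentitySketch {

    // Reversing the bytes inside a one-byte lane changes nothing,
    // so the per-lane byte reversal of a byte vector is the identity.
    static byte[] reverseBytesPerLane(byte[] src) {
        return src.clone();
    }

    // Lane-wise blend: take b[i] where the mask is set, a[i] otherwise.
    static byte[] blend(byte[] a, byte[] b, boolean[] mask) {
        byte[] result = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = mask[i] ? b[i] : a[i];
        }
        return result;
    }

    // Masked operation modeled as: blend(src, op(src), mask).
    static byte[] maskedReverseBytes(byte[] src, boolean[] mask) {
        return blend(src, reverseBytesPerLane(src), mask);
    }

    public static void main(String[] args) {
        byte[] src = {1, -2, 3, 127, -128, 0, 42, 7};
        boolean[] mask = {true, false, true, true, false, false, true, false};

        // Both blend inputs hold the same lanes, so the mask cannot matter.
        System.out.println(Arrays.equals(src, maskedReverseBytes(src, mask)));

        // For contrast, byte reversal is a real operation on wider lanes.
        System.out.println(Integer.reverseBytes(0x01020304) == 0x04030201);
    }
}
```

Both checks print `true`. Since the blend result is independent of the mask whenever its two vector inputs are equal, returning the first input is safe in that case.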

-------------

PR: https://git.openjdk.org/jdk/pull/9565

