RFR: 8282162: [vector] Optimize integral vector negation API [v3]

Paul Sandoz psandoz at openjdk.java.net
Tue Mar 29 18:08:43 UTC 2022


On Mon, 28 Mar 2022 09:56:22 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> The current vector `"NEG"` is implemented by subtracting the vector from zero when the architecture does not support a negation instruction. To fit the predicate feature on architectures that support it, the masked vector `"NEG"` is implemented with the pattern `"v.not(m).add(1, m)"`. Both can be optimized to a single negation instruction on ARM SVE,
>> and so can the non-masked "NEG" on NEON. In addition, implementing the masked "NEG" with subtraction on architectures that support neither a negation instruction nor the predicate feature saves several instructions compared with the current pattern.
>> 
>> To optimize the Vector API negation, this patch moves the implementation from the Java side to HotSpot. The compiler generates different nodes depending on the architecture:
>>   - Generate the (predicated) negation node if the architecture supports it; otherwise, generate the `"zero.sub(v)"` pattern for the non-masked operation.
>>   - Generate `"zero.sub(v, m)"` for the masked operation if the architecture does not have the predicate feature; otherwise, generate the original pattern `"v.xor(-1, m).add(1, m)"`.
>> 
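
For readers following the patterns quoted above, here is a minimal scalar sketch of why they are interchangeable. It assumes two's-complement `int` lanes and models the mask as a `boolean[]`; the class and method names (`MaskedNegSketch`, `negViaNotAdd`, `negViaSubBlend`) are hypothetical and are not part of the Vector API or of this patch. The second method models the subtract-from-zero form followed by a blend, so that lanes with the mask unset keep their original value, as a masked `NEG` requires.

```java
import java.util.Arrays;

public class MaskedNegSketch {
    // Models the masked "not, then add 1" pattern lane by lane:
    // lanes with the mask set become ~x + 1, other lanes pass through.
    static int[] negViaNotAdd(int[] v, boolean[] m) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = m[i] ? (~v[i]) + 1 : v[i];
        }
        return r;
    }

    // Models "subtract from zero, then blend by mask" (an assumption about
    // how the masked subtraction form is completed on hardware without
    // predicates): compute 0 - x for every lane, then select per lane.
    static int[] negViaSubBlend(int[] v, boolean[] m) {
        int[] negated = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            negated[i] = 0 - v[i];
        }
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = m[i] ? negated[i] : v[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v = {1, -2, 3, Integer.MIN_VALUE};
        boolean[] m = {true, false, true, true};
        System.out.println(Arrays.toString(negViaNotAdd(v, m)));
        System.out.println(Arrays.toString(negViaSubBlend(v, m)));
    }
}
```

Both methods agree on every lane, including `Integer.MIN_VALUE`, where negation wraps back to the same value under either formulation.
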
>> So with this patch, the following transformations are applied:
>> 
>> For non-masked negation with NEON:
>> 
>>   movi    v16.4s, #0x0
>>   sub v17.4s, v16.4s, v17.4s       ==> neg v17.4s, v17.4s
>> 
>> and with SVE:
>> 
>>   mov z16.s, #0
>>   sub z18.s, z16.s, z17.s          ==> neg z16.s, p7/m, z16.s
>> 
>> For masked negation with NEON:
>> 
>>   movi    v17.4s, #0x1
>>   mvn v19.16b, v18.16b
>>   mov v20.16b, v16.16b             ==>  neg v18.4s, v17.4s
>>   bsl v20.16b, v19.16b, v18.16b         bsl v19.16b, v18.16b, v17.16b
>>   add v19.4s, v20.4s, v17.4s
>>   mov v18.16b, v16.16b
>>   bsl v18.16b, v19.16b, v20.16b
>> 
>> and with SVE:
>> 
>>   mov z16.s, #-1
>>   mov z17.s, #1                    ==> neg z16.s, p0/m, z16.s
>>   eor z18.s, p0/m, z18.s, z16.s
>>   add z18.s, p0/m, z18.s, z17.s
>> 
>> Here are the performance gains for the benchmarks (see [1][2]) on ARM and x86 machines (note that the non-masked negation benchmarks show no improvement on x86 since no instructions are changed):
>> 
>> NEON:
>> Benchmark                Gain
>> Byte128Vector.NEG        1.029
>> Byte128Vector.NEGMasked  1.757
>> Short128Vector.NEG       1.041
>> Short128Vector.NEGMasked 1.659
>> Int128Vector.NEG         1.005
>> Int128Vector.NEGMasked   1.513
>> Long128Vector.NEG        1.003
>> Long128Vector.NEGMasked  1.878
>> 
>> SVE with 512-bit vectors:
>> Benchmark                Gain
>> ByteMaxVector.NEG        1.10
>> ByteMaxVector.NEGMasked  1.165
>> ShortMaxVector.NEG       1.056
>> ShortMaxVector.NEGMasked 1.195
>> IntMaxVector.NEG         1.002
>> IntMaxVector.NEGMasked   1.239
>> LongMaxVector.NEG        1.031
>> LongMaxVector.NEGMasked  1.191
>> 
>> x86 (non-AVX-512):
>> Benchmark                Gain
>> ByteMaxVector.NEGMasked  1.254
>> ShortMaxVector.NEGMasked 1.359
>> IntMaxVector.NEGMasked   1.431
>> LongMaxVector.NEGMasked  1.989
>> 
>> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Byte128Vector.java#L1881
>> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Byte128Vector.java#L1896
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Make "degenerate_vector_integral_negate" to be "NegVI" private

Java changes are good.

-------------

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7782

