RFR: 8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts [v3]
Andrew Haley
aph at openjdk.org
Tue Nov 21 14:52:09 UTC 2023
On Tue, 21 Nov 2023 13:24:34 GMT, Eric Liu <eliu at openjdk.org> wrote:
>> The Vector API defines zero-extend operations [1], which C2 will intrinsify and compile to `VectorUCastNode`. This patch adds the backend implementation of `VectorUCastNode` on AArch64.
>>
>> The microbenchmark shows a significant performance improvement. On my test machine (SVE, 256-bit), the results are as follows:
>>
>>
>>
>> Benchmark                      Before     After        Units   Gain
>> VectorZeroExtend.byte2Int      3168.251   243012.399   ops/ms  75.70
>> VectorZeroExtend.byte2Long     3212.201   216291.588   ops/ms  66.33
>> VectorZeroExtend.byte2Short    3391.968   182655.365   ops/ms  52.85
>> VectorZeroExtend.int2Long      1012.197    80448.553   ops/ms  78.48
>> VectorZeroExtend.short2Int     1812.471   153416.828   ops/ms  83.65
>> VectorZeroExtend.short2Long    1788.382   129794.814   ops/ms  71.58
>>
>>
>> On other Neon systems, we get a similar performance boost because intrinsification now succeeds.
>>
>> Since `VectorUCastNode` is currently used only for the Vector API's zero extension, this patch also adds assertions to the node definitions to clarify their usage.
>>
>> [TEST]
>> compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines.
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726
>
> Eric Liu has updated the pull request incrementally with one additional commit since the last revision:
>
> add _sve_xunpk & remove dead code
>
> Change-Id: Ic19836feb8a73ea7e65443794f2a0eb1363f6e2f
src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3872:
> 3870: void sve_sunpklo(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn) {
> 3871: _sve_xunpk(/* is_unsigned */ false, /* is_high */ false, Zd, T, Zn);
> 3872: }
This code expansion does not look right. You should be able to make this change without so much code expansion.
#define INSN(NAME, is_unsigned, is_high)                              \
  void NAME(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn) {  \
    _sve_xunpk(is_unsigned, is_high, Zd, T, Zn);                      \
  }
INSN(sve_uunpkhi, true, true) ...
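To make the suggestion concrete, here is a minimal, self-contained sketch of the macro pattern outside HotSpot. Everything in it is an assumption for illustration: `ToyAssembler`, the `Unpack` record, and the stand-in `FloatRegister` and `SIMD_RegVariant` types are hypothetical, the toy `_sve_xunpk` only records its arguments instead of emitting an instruction, and the wrapper names other than `sve_sunpklo` and `sve_uunpkhi` are guesses at what the elided part of the list might contain.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's FloatRegister and SIMD_RegVariant.
struct FloatRegister { int encoding; };
enum SIMD_RegVariant { B, H, S, D };

// Toy assembler (not the real Assembler class): it records the last
// unpack request instead of emitting machine code.
class ToyAssembler {
 public:
  struct Unpack {
    bool is_unsigned;
    bool is_high;
    int zd;
    SIMD_RegVariant t;
    int zn;
  };
  Unpack last{};

 private:
  // Shared helper that all four wrappers forward to.
  void _sve_xunpk(bool is_unsigned, bool is_high,
                  FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn) {
    last = Unpack{is_unsigned, is_high, Zd.encoding, T, Zn.encoding};
  }

 public:
  // One macro definition replaces four hand-written wrapper functions.
#define INSN(NAME, is_unsigned, is_high)                              \
  void NAME(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn) {  \
    _sve_xunpk(is_unsigned, is_high, Zd, T, Zn);                      \
  }

  INSN(sve_uunpkhi, true,  true)
  INSN(sve_uunpklo, true,  false)
  INSN(sve_sunpkhi, false, true)
  INSN(sve_sunpklo, false, false)

#undef INSN
};

// Small self-check: each wrapper passes the flags baked into its INSN line.
inline void check_unpack_wrappers() {
  ToyAssembler a;
  a.sve_uunpklo(FloatRegister{0}, H, FloatRegister{1});
  assert(a.last.is_unsigned && !a.last.is_high);
  a.sve_sunpkhi(FloatRegister{0}, H, FloatRegister{1});
  assert(!a.last.is_unsigned && a.last.is_high);
}
```

Each `INSN` line fixes the two boolean flags at the point of definition, so the wrappers differ only in their name and flag pair, and the `#undef` keeps the macro from leaking into later code.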
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16670#discussion_r1400716627