RFR: 8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts
Andrew Haley
aph at openjdk.org
Wed Nov 15 15:26:34 UTC 2023
On Wed, 15 Nov 2023 07:48:28 GMT, Eric Liu <eliu at openjdk.org> wrote:
> The Vector API defines zero-extend operations [1], which C2 will intrinsify into `VectorUCastNode`. This patch adds the backend implementation of `VectorUCastNode` on AArch64.
>
> The microbenchmark shows a significant performance improvement. On my test machine (SVE, 256-bit), the results are as follows:
>
>
>
> Benchmark                     Before     After        Units    Gain
> VectorZeroExtend.byte2Int     3168.251   243012.399   ops/ms   75.70
> VectorZeroExtend.byte2Long    3212.201   216291.588   ops/ms   66.33
> VectorZeroExtend.byte2Short   3391.968   182655.365   ops/ms   52.85
> VectorZeroExtend.int2Long     1012.197    80448.553   ops/ms   78.48
> VectorZeroExtend.short2Int    1812.471   153416.828   ops/ms   83.65
> VectorZeroExtend.short2Long   1788.382   129794.814   ops/ms   71.58
>
>
> On other NEON systems, we can get a similar performance boost as a result of successful intrinsification.
>
> Since `VectorUCastNode` is currently only used by the Vector API's zero extension, this patch also adds assertions to the nodes' definitions to clarify their usage.
>
> [TEST]
> compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines.
>
> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726
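
For readers unfamiliar with the zero-extend operations referenced in [1], here is a minimal scalar sketch of the per-lane semantics. It is not code from the patch: the class and method names are hypothetical (the method names only echo the benchmark names above), and it uses plain `java.lang` helpers rather than the Vector API, so it says nothing about what C2 generates.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the patch under review.
public final class ZeroExtendSketch {

    // byte -> int, treating each byte as unsigned: (byte) -1 becomes 255, not -1.
    static int[] byte2Int(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Byte.toUnsignedInt(src[i]);
        }
        return dst;
    }

    // short -> int, treating each short as unsigned.
    static int[] short2Int(short[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Short.toUnsignedInt(src[i]);
        }
        return dst;
    }

    // int -> long, treating each int as unsigned.
    static long[] int2Long(int[] src) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.toUnsignedLong(src[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(byte2Int(new byte[] {-1, 0, 127, -128})));
        System.out.println(Arrays.toString(short2Int(new short[] {-1, 1})));
        System.out.println(Arrays.toString(int2Long(new int[] {-1, 1})));
    }
}
```

A sign-extending cast would instead keep negative inputs negative; the upper bits of each widened lane are filled with zeros here, which is the behaviour the zero-extend operators describe.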
src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 2322:
> 2320: ins_pipe(pipe_slow);
> 2321: %}
> 2322:
The following hunk does not seem to be making good use of the macro processor.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16670#discussion_r1394363082