RFR: 8264409: AArch64: generate better code for Vector API allTrue
Andrew Dinn
adinn at openjdk.java.net
Thu Apr 1 09:05:26 UTC 2021
On Thu, 1 Apr 2021 07:58:07 GMT, Ningsheng Jian <njian at openjdk.org> wrote:
> In the Vector API NEON implementation, we use a vector register to represent a vector mask, where an element value of -1 is a true mask and an element value of 0 is a false mask. The allTrue() API is used to check whether all the elements of the current mask are set.
>
> Currently, the AArch64 NEON allTrue implementation looks like:
>
> andr $tmp, T16B, $src1, $src2    # src2 is maskAllTrue
> notr $tmp, T16B, $tmp
> addv $tmp, T16B, $tmp
> umov $dst, $tmp, B, 0
> cmp $dst, 0
> cset $dst
>
> where $src2 is a preset all-true (-1) constant. We can optimize this to the code sequence below, which checks whether all bits are set:
>
> uminv $tmp, T16B, $src1
> umov $dst, $tmp, B, 0
> cmp $dst, 0xff
> cset $dst
>
> With this codegen improvement, we see a performance uplift of about 8% to 70%, depending on the machine, on Alibaba's Vector API bigdata benchmarks [1][2].
>
> Tested with tier1 and the Vector API jtreg tests.
>
> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/bigdata/BooleanArrayCheck.java#L61
> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/bigdata/ValueRangeCheckAndCastL2I.java#L93
It's a clever trick to use uminv for this specific case.
The patch looks good to me.
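
For readers who want to see why the unsigned minimum is enough, here is a minimal scalar sketch in Java that models the two instruction sequences byte by byte. It is an illustration only, not the HotSpot code: the class name AllTrueSketch and the helpers allTrueOld, allTrueNew and maskWithFalseLane are hypothetical, and a 16-byte array stands in for one 128-bit NEON register whose lanes are 0x00 or 0xff.

```java
// Scalar model of the two NEON sequences quoted above (illustrative names only).
class AllTrueSketch {
    // Old sequence: andr with the all-true constant, notr, addv, then compare with 0.
    static boolean allTrueOld(byte[] mask) {
        int sum = 0;
        for (byte b : mask) {
            int anded = (b & 0xff) & 0xff;     // andr with an all-true (0xff) lane
            int inverted = ~anded & 0xff;      // notr: a true lane becomes 0x00
            sum = (sum + inverted) & 0xff;     // addv on T16B wraps at 8 bits
        }
        return sum == 0;                       // cmp $dst, 0 then cset
    }

    // New sequence: uminv (unsigned minimum across lanes), then compare with 0xff.
    static boolean allTrueNew(byte[] mask) {
        int min = 0xff;
        for (byte b : mask) {
            min = Math.min(min, b & 0xff);     // treat each lane as unsigned
        }
        return min == 0xff;                    // cmp $dst, 0xff then cset
    }

    // Builds a 16-lane all-true mask; a falseLane in 0..15 clears that lane.
    static byte[] maskWithFalseLane(int falseLane) {
        byte[] mask = new byte[16];
        java.util.Arrays.fill(mask, (byte) -1);
        if (falseLane >= 0 && falseLane < mask.length) {
            mask[falseLane] = 0;
        }
        return mask;
    }

    public static void main(String[] args) {
        byte[] allSet = maskWithFalseLane(-1);
        byte[] oneClear = maskWithFalseLane(5);
        System.out.println(allTrueOld(allSet) + " " + allTrueNew(allSet));     // true true
        System.out.println(allTrueOld(oneClear) + " " + allTrueNew(oneClear)); // false false
    }
}
```

The unsigned minimum over all bytes equals 0xff only when every byte is 0xff, so a single reduction replaces the and, not and add steps of the old sequence.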
-------------
Marked as reviewed by adinn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3302