RFR: 8264409: AArch64: generate better code for Vector API allTrue
Ningsheng Jian
njian at openjdk.java.net
Thu Apr 1 08:05:44 UTC 2021
In the Vector API NEON implementation, a vector mask is held in a vector register: an element value of -1 is a true lane and an element value of 0 is a false lane. The allTrue() API checks whether all elements of the current mask are set.
Currently, the AArch64 NEON allTrue implementation looks like:
andr $tmp, T16B, $src1, $src2    # src2 is maskAllTrue
notr $tmp, T16B, $tmp
addv $tmp, T16B, $tmp
umov $dst, $tmp, B, 0
cmp $dst, 0
cset $dst
where $src2 is a preset all-true (-1) constant. We can replace this with the shorter sequence below, which checks directly whether all bits are set:
uminv $tmp, T16B, $src1
umov $dst, $tmp, B, 0
cmp $dst, 0xff
cset $dst
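To illustrate why the two sequences give the same answer, here is a minimal scalar model in plain Java. It is only a sketch: the class and method names are hypothetical, the cset condition (omitted in the listings above) is assumed to be EQ, and the mask is assumed to hold only 0 or -1 in each of the 16 byte lanes. Under that assumption the byte-wide addv sum of the inverted lanes cannot wrap around to zero, so both sequences report true only for the all-true mask.

```java
public class AllTrueModel {
    // T16B: sixteen byte lanes in a 128-bit NEON register.
    static final int LANES = 16;

    // Models the original sequence: andr, notr, addv, umov, cmp 0, cset.
    // Assumption: cset uses the EQ condition.
    public static boolean allTrueOld(byte[] mask) {
        int sum = 0;
        for (int i = 0; i < LANES; i++) {
            int anded = (mask[i] & 0xff) & 0xff; // andr with the all-true constant
            int inverted = ~anded & 0xff;        // notr
            sum = (sum + inverted) & 0xff;       // addv: byte-wide sum across lanes
        }
        return sum == 0;                         // cmp $dst, 0 ; cset
    }

    // Models the new sequence: uminv, umov, cmp 0xff, cset.
    // Assumption: cset uses the EQ condition.
    public static boolean allTrueNew(byte[] mask) {
        int min = 0xff;
        for (int i = 0; i < LANES; i++) {
            min = Math.min(min, mask[i] & 0xff); // uminv: unsigned minimum across lanes
        }
        return min == 0xff;                      // cmp $dst, 0xff ; cset
    }

    // Builds a mask whose lane i is -1 if bit i of 'bits' is set, else 0.
    public static byte[] maskFromBits(int bits) {
        byte[] mask = new byte[LANES];
        for (int i = 0; i < LANES; i++) {
            mask[i] = (byte) (((bits >> i) & 1) == 1 ? -1 : 0);
        }
        return mask;
    }

    public static void main(String[] args) {
        // Exhaustively compare both models over every 0/-1 mask pattern.
        for (int bits = 0; bits < (1 << LANES); bits++) {
            byte[] mask = maskFromBits(bits);
            boolean expected = (bits == 0xffff);
            if (allTrueOld(mask) != expected || allTrueNew(mask) != expected) {
                throw new AssertionError("mismatch for mask bits 0x" + Integer.toHexString(bits));
            }
        }
        System.out.println("both sequences agree on every 0/-1 mask");
    }
}
```

The model says nothing about performance; it only checks the logic. The new sequence needs one cross-lane reduction and no all-true constant, which is where the shorter instruction count comes from.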
With this codegen improvement, we see a performance uplift of about 8% to 70%, depending on the machine, on Alibaba's Vector API bigdata benchmarks [1][2].
Tested with tier1 and the Vector API jtreg tests.
[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/bigdata/BooleanArrayCheck.java#L61
[2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/bigdata/ValueRangeCheckAndCastL2I.java#L93
-------------
Commit messages:
- 8264409: AArch64: generate better code for Vector API allTrue
Changes: https://git.openjdk.java.net/jdk/pull/3302/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3302&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8264409
Stats: 409 lines in 5 files changed: 13 ins; 12 del; 384 mod
Patch: https://git.openjdk.java.net/jdk/pull/3302.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3302/head:pull/3302
PR: https://git.openjdk.java.net/jdk/pull/3302