RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8
Chang Peng
duke at openjdk.org
Thu Jun 8 02:52:19 UTC 2023
This patch optimizes VectorMask.firstTrue() on Neon when the vector register holds only 2 or 4 elements.
VectorMask.firstTrue() should return VLENGTH when the vector mask is all false [1]. The current implementation uses rbit followed by clz [2] to count the leading zeros of the bit-reversed mask, then uses csel [3] (conditional select) to pick the smaller of VLENGTH and the number of unset lanes, which ensures correctness.
When the boolean mask has only 2 or 4 elements, this patch sets the 16th or 32nd bit to 1 before the rbit and clz. With this trick, the maximum value computed in those cases is VLENGTH (2 or 4).
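The trick can be illustrated in plain Java. This is only a sketch and not the code from the patch. It assumes the mask sits in a 64-bit general register with one byte per lane (0x00 or 0x01) and lane 0 in the lowest byte. The class and method names are hypothetical, and Long.reverse and Long.numberOfLeadingZeros stand in for rbit and clz.

```java
public class FirstTrueSketch {
    // Assumed layout: one byte per lane (0x00 or 0x01), lane 0 in the lowest byte.
    static long pack(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits |= 1L << (8 * i);
            }
        }
        return bits;
    }

    // Scheme described for the current code: rbit + clz, divide by 8 to get a
    // lane index, then clamp to VLENGTH (Math.min stands in for csel).
    static int firstTrueWithSelect(boolean[] mask) {
        long bits = pack(mask);
        int lane = Long.numberOfLeadingZeros(Long.reverse(bits)) >>> 3;
        return Math.min(lane, mask.length);
    }

    // Scheme described for the patch: set a sentinel bit just above the last
    // lane (bit 16 for 2 lanes, bit 32 for 4 lanes) so that rbit + clz can
    // never yield more than VLENGTH lanes. No clamp is needed afterwards.
    static int firstTrueWithSentinel(boolean[] mask) {
        int vlength = mask.length; // expected to be 2 or 4 in this sketch
        long bits = pack(mask) | (1L << (8 * vlength));
        return Long.numberOfLeadingZeros(Long.reverse(bits)) >>> 3;
    }

    public static void main(String[] args) {
        boolean[] allFalse = new boolean[4];
        boolean[] second = {false, true, false, false};
        System.out.println(firstTrueWithSelect(allFalse) + " " + firstTrueWithSentinel(allFalse));
        System.out.println(firstTrueWithSelect(second) + " " + firstTrueWithSentinel(second));
    }
}
```

In this sketch both methods agree for every 2- and 4-lane mask. The sentinel version simply drops the final compare-and-select step.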
Test:
All vector and vectorapi tests passed.
Performance:
The benchmark function looks like this:
    @Benchmark
    public static int testInt() {
        int res = 0;
        for (int i = 0; i < LENGTH; i += INT_SPECIES.length()) {
            VectorMask<Integer> m = VectorMask.fromArray(INT_SPECIES, ia, i);
            res += m.firstTrue();
        }
        return res;
    }
The following data was collected on a 128-bit Neon machine.
Benchmark    Before       After        Unit
testInt      22214.740    25627.833    ops/ms
testLong     11649.898    13698.535    ops/ms
[1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue()
[2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540
[3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select-
Change-Id: I4a2de805ffa4469f88d510c96617eae165f0e025
-------------
Commit messages:
- 8309583: AArch64: Optimize firstTrue() when amount of elements < 8
Changes: https://git.openjdk.org/jdk/pull/14373/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14373&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8309583
Stats: 84 lines in 2 files changed: 14 ins; 58 del; 12 mod
Patch: https://git.openjdk.org/jdk/pull/14373.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14373/head:pull/14373
PR: https://git.openjdk.org/jdk/pull/14373