RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v2]

Chang Peng duke at openjdk.org
Mon Jun 19 01:45:59 UTC 2023


> This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers.
> 
> VectorMask.firstTrue() should return VLENGTH when the vector mask is all false [1]. The current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to take the smaller of VLENGTH and the number of unset lanes to ensure correctness.
> 
> When the boolean mask has only 2 or 4 elements, this patch sets the 16th or 32nd bit to 1 before the rbit and clz. With this trick, the maximum value calculated in these cases is VLENGTH (2 or 4), which makes the csel unnecessary.
> 
> Test:
> All vector and vectorapi tests passed.
> 
> Performance:
> The benchmark function looks like this:
> 
> ```
> @Benchmark
> public static int testInt() {
>     int res = 0;
>     for (int i = 0; i < LENGTH; i += INT_SPECIES.length()) {
>         VectorMask<Integer> m = VectorMask.fromArray(INT_SPECIES, ia, i);
>         res += m.firstTrue();
>     }
> 
>     return res;
> }
> ```
> 
> The following data was collected on a 128-bit Neon machine.
> 
> Benchmark  Before     After      Unit
> testInt    22214.740  25627.833  ops/ms
> testLong   11649.898  13698.535  ops/ms
> 
> [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue()
> [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540
> [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select-
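The sentinel-bit trick in the quoted description can be modeled in plain Java. This is only a sketch under stated assumptions, not the HotSpot code: it assumes the mask sits in a 64-bit general register with one byte per lane and lane 0 in the low byte, and that the "16th or 32nd bit" means bit index 8 * VLENGTH. The class and method names are hypothetical. `Long.reverse` stands in for rbit, `Long.numberOfLeadingZeros` for clz, and `Math.min` for the csel.

```java
final class FirstTrueSketch {
    // Pack a boolean mask into a long: one byte per lane, lane 0 in the
    // low byte (assumed layout, chosen for illustration only).
    static long pack(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits |= 1L << (8 * i);
            }
        }
        return bits;
    }

    // Baseline: rbit + clz, divide by 8 to get a lane index, then clamp
    // to vlen. Math.min stands in for the csel.
    static int firstTrueWithSelect(long bits, int vlen) {
        int lane = Long.numberOfLeadingZeros(Long.reverse(bits)) >>> 3;
        return Math.min(lane, vlen);
    }

    // Sentinel variant: set the bit just past the last lane first, so the
    // count can never exceed 8 * vlen and no clamp is needed.
    // Only valid for vlen < 8, because the sentinel must fit in 64 bits.
    static int firstTrueWithSentinel(long bits, int vlen) {
        long guarded = bits | (1L << (8 * vlen));
        return Long.numberOfLeadingZeros(Long.reverse(guarded)) >>> 3;
    }
}
```

For an all-false 4-lane mask the sentinel sits at bit 32, so the count is 32 and the result is 32 / 8 = 4, which is VLENGTH. For 2 lanes the sentinel at bit 16 gives 16 / 8 = 2.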

Chang Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge branch 'openjdk:master' into optimize_firsttrue2e4e_neon
 - 8309583: AArch64: Optimize firstTrue() when amount of elements < 8
   
   This patch optimizes VectorMask.firstTrue() on Neon when there are 2
   or 4 elements in vector registers.
   
   VectorMask.firstTrue() should return VLENGTH when the vector mask is
   all false [1]. The current implementation uses rbit and then clz [2] to
   count leading zeros, then uses csel [3] (conditional select) to take
   the smaller of VLENGTH and the number of unset lanes to ensure
   correctness.
   
   When the boolean mask has only 2 or 4 elements, this patch sets the
   16th or 32nd bit to 1 before the rbit and clz. With this trick, the
   maximum value calculated in these cases is VLENGTH (2 or 4), which
   makes the csel unnecessary.
   
   Test:
   All vector and vectorapi tests passed.
   
   Performance:
   The benchmark function looks like this:
   
   ```
   @Benchmark
   public static int testInt() {
       int res = 0;
       for (int i = 0; i < LENGTH; i += INT_SPECIES.length()) {
           VectorMask<Integer> m = VectorMask.fromArray(INT_SPECIES, ia, i);
           res += m.firstTrue();
       }
   
       return res;
   }
   ```
   
   The following data was collected on a 128-bit Neon machine.
   
   Benchmark  Before     After      Unit
   testInt    22214.740  25627.833  ops/ms
   testLong   11649.898  13698.535  ops/ms
   
   [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue()
   [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540
   [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select-
   
   Change-Id: I4a2de805ffa4469f88d510c96617eae165f0e025

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14373/files
  - new: https://git.openjdk.org/jdk/pull/14373/files/24b6d738..d8507105

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14373&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14373&range=00-01

  Stats: 82117 lines in 1520 files changed: 59805 ins; 16698 del; 5614 mod
  Patch: https://git.openjdk.org/jdk/pull/14373.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14373/head:pull/14373

PR: https://git.openjdk.org/jdk/pull/14373
