RFR: 8294194: [AArch64] Create intrinsics compress and expand [v3]

Stuart Monteith smonteith at openjdk.org
Thu Dec 1 09:11:17 UTC 2022


> The java.lang.Long and java.lang.Integer classes have the methods "compress(i, mask)" and "expand(i, mask)". They compile down to 236 assembler instructions. AArch64 has no scalar instructions that perform the equivalent functions; instead, the intrinsics can be implemented with vector instructions included in SVE2: BDEP for expand and BEXT for compress.
> 
> Only the first lane of each vector is used: two MOV instructions move the inputs from GPRs into temporary vector registers, and a third moves the result back to a GPR. Autovectorization for this functionality is being, or will be, implemented separately.
> 
> Running on an SVE2 enabled system, I ran the following benchmarks:
> 
>         org.openjdk.bench.java.lang.Integers
>         org.openjdk.bench.java.lang.Longs
> 
> The time per operation dropped to between 56% and 72% of the original run time:
> 
> 
> Benchmark               Result  Error   Unit    % of non-SVE2 time
> Integers.expand         2.106   0.011   us/op
> Integers.expand-SVE     1.431   0.009   us/op   67.95%
> Longs.expand            2.606   0.006   us/op
> Longs.expand-SVE        1.46    0.003   us/op   56.02%
> Integers.compress       1.982   0.004   us/op
> Integers.compress-SVE   1.427   0.003   us/op   72.00%
> Longs.compress          2.501   0.002   us/op
> Longs.compress-SVE      1.441   0.003   us/op   57.62%
> 
> 
> These methods can be tested specifically with:
> `make test TEST="jtreg:compiler/intrinsics/TestBitShuffleOpers.java"`
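
For readers unfamiliar with the two operations, here is a minimal sketch of their semantics in plain Java. It is a bit-by-bit reference loop, not the intrinsic and not the JDK library code. The class name `BitShuffleSketch` and its methods are hypothetical and used only for illustration. The sample values are the ones given in the `Integer.compress` and `Integer.expand` Javadoc.

```java
// Hypothetical reference model of compress/expand semantics (not JDK code).
final class BitShuffleSketch {

    // compress: gather the bits of i selected by mask and pack them,
    // in order, into the low-order bits of the result.
    static long compress(long i, long mask) {
        long result = 0L;
        int outBit = 0; // next free low-order position in the result
        for (int bit = 0; bit < Long.SIZE; bit++) {
            if (((mask >>> bit) & 1L) != 0L) {
                result |= ((i >>> bit) & 1L) << outBit;
                outBit++;
            }
        }
        return result;
    }

    // expand: scatter the low-order bits of i, in order, to the
    // positions where mask has a one bit; all other bits are zero.
    static long expand(long i, long mask) {
        long result = 0L;
        int inBit = 0; // next low-order bit of i to consume
        for (int bit = 0; bit < Long.SIZE; bit++) {
            if (((mask >>> bit) & 1L) != 0L) {
                result |= ((i >>> inBit) & 1L) << bit;
                inBit++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long value = 0xCAFEBABEL;
        long mask = 0xFF00FFF0L;
        long packed = compress(value, mask);
        System.out.println(Long.toHexString(packed));               // cabab
        System.out.println(Long.toHexString(expand(packed, mask))); // ca00bab0
    }
}
```

In this model, expanding a compressed value with the same mask gives back `value & mask`. That identity is a convenient sanity check when comparing an intrinsic against a reference implementation.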

Stuart Monteith has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge branch 'openjdk:master' into JDK-8294194
 - Update src/hotspot/cpu/aarch64/aarch64.ad
   
   Correct slight formatting error.
   
   Co-authored-by: Eric Liu <eric.c.liu at arm.com>
 - 8294194: Create intrinsics compress and expand

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10537/files
  - new: https://git.openjdk.org/jdk/pull/10537/files/8b13dabb..a7484586

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10537&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10537&range=01-02

  Stats: 319140 lines in 4215 files changed: 161741 ins; 101533 del; 55866 mod
  Patch: https://git.openjdk.org/jdk/pull/10537.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10537/head:pull/10537

PR: https://git.openjdk.org/jdk/pull/10537


More information about the hotspot-compiler-dev mailing list