RFR: 8294194: [AArch64] Create intrinsics compress and expand [v8]
Stuart Monteith
smonteith at openjdk.org
Mon Jan 16 14:06:24 UTC 2023
> The java.lang.Long and java.lang.Integer classes have the methods "compress(i, mask)" and "expand(i, mask)". They compile down to 236 assembler instructions. There are no scalar instructions that perform the equivalent functions on aarch64; instead, the intrinsics can be implemented with vector instructions included in SVE2: expand with BDEP, compress with BEXT.
>
> Only the first lane of each vector is used: two MOV instructions move the inputs from GPRs into temporary vector registers, and another moves the result back into a GPR. Autovectorization for this functionality is/will be implemented separately.
>
> Running on an SVE2 enabled system, I ran the following benchmarks:
>
> org.openjdk.bench.java.lang.Integers
> org.openjdk.bench.java.lang.Longs
>
> The time for each operation was reduced to between 56% and 72% of the original run time:
>
>
> Benchmark               Result   Error   Unit    % against non-SVE2
> Integers.expand         2.106    0.011   us/op
> Integers.expand-SVE     1.431    0.009   us/op   67.95%
> Longs.expand            2.606    0.006   us/op
> Longs.expand-SVE        1.46     0.003   us/op   56.02%
> Integers.compress       1.982    0.004   us/op
> Integers.compress-SVE   1.427    0.003   us/op   72.00%
> Longs.compress          2.501    0.002   us/op
> Longs.compress-SVE      1.441    0.003   us/op   57.62%
>
>
> These methods can be specifically tested with:
> `make test TEST="jtreg:compiler/intrinsics/TestBitShuffleOpers.java"`
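For readers unfamiliar with the two operations, here is a minimal bit-by-bit sketch of what "compress(i, mask)" and "expand(i, mask)" compute for int. It is illustration only: the class name BitShuffleSketch and the loop-based methods are hypothetical, and this is neither the JDK library code nor the code generated by the intrinsic in this patch. On a JDK that provides Integer.compress and Integer.expand, the results should agree with those methods.

```java
public class BitShuffleSketch {
    // Gather the bits of i selected by mask and pack them,
    // in order, into the low-order bits of the result.
    static int compress(int i, int mask) {
        int result = 0;
        int outBit = 0; // next free low-order position in the result
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (((mask >>> bit) & 1) != 0) {
                result |= ((i >>> bit) & 1) << outBit;
                outBit++;
            }
        }
        return result;
    }

    // Inverse direction: scatter the low-order bits of i,
    // in order, to the positions where mask has a 1 bit.
    static int expand(int i, int mask) {
        int result = 0;
        int inBit = 0; // next low-order bit of i to consume
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (((mask >>> bit) & 1) != 0) {
                result |= ((i >>> inBit) & 1) << bit;
                inBit++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int value = 0xCAFEBABE;
        int mask = 0xFF00FFF0;
        int packed = compress(value, mask);
        System.out.println(Integer.toHexString(packed));               // cabab
        System.out.println(Integer.toHexString(expand(packed, mask))); // ca00bab0
    }
}
```

Note that expand(compress(x, m), m) equals x & m, which is why the second line printed is the original value with the unselected hex digits cleared.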
Stuart Monteith has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
- Merge branch 'openjdk:master' into JDK-8294194
- Merge branch 'openjdk:master' into JDK-8294194
- Merge branch 'openjdk:master' into JDK-8294194
- Merge branch 'openjdk:master' into JDK-8294194
- Merge branch 'openjdk:master' into JDK-8294194
- Merge branch 'openjdk:master' into JDK-8294194
- Update src/hotspot/cpu/aarch64/aarch64.ad
Correct slight formatting error.
Co-authored-by: Eric Liu <eric.c.liu at arm.com>
- 8294194: Create intrinsics compress and expand
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/10537/files
- new: https://git.openjdk.org/jdk/pull/10537/files/1b588958..7fb1272f
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=10537&range=07
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=10537&range=06-07
Stats: 1859 lines in 74 files changed: 969 ins; 320 del; 570 mod
Patch: https://git.openjdk.org/jdk/pull/10537.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10537/head:pull/10537
PR: https://git.openjdk.org/jdk/pull/10537