RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
Pengfei Li
pli at openjdk.java.net
Thu Nov 18 04:03:54 UTC 2021
Arraycopy partial inlining is a C2 compiler technique that avoids stub
call overhead in small-sized arraycopy operations by generating masked
vector instructions. So far it works only on x86 AVX512; this patch
enables it on AArch64 with SVE.
We add an AArch64 matching rule for VectorMaskGenNode and refactor that
node slightly. The major change is moving the element type field
into its TypeVectMask bottom type. The reason is that AArch64 vector
masks differ depending on the vector element type.
For example, an x86 AVX512 vector mask value that masks the 3 least
significant vector lanes (of any type) looks like
`0000 0000 ... 0000 0000 0000 0000 0111`
On AArch64 SVE, this mask value can only be used for masking the 3 least
significant lanes of bytes. But for 3 lanes of ints, the value should be
`0000 0000 ... 0000 0000 0001 0001 0001`
where only the least significant bit of each lane matters. So the AArch64
matcher needs to know the vector element type to generate the right masks.
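The difference between the two mask encodings can be modelled with a few lines of Java. This is only an illustrative sketch, not HotSpot code: the class and method names (`MaskModel`, `avx512Mask`, `svePredicate`) are hypothetical, and a 64-bit `long` stands in for a mask register, which assumes a vector of at most 512 bits.

```java
// Illustrative model only; not HotSpot code. All names here are hypothetical.
final class MaskModel {
    // AVX512-style mask: one bit per lane, independent of the lane size.
    static long avx512Mask(int activeLanes) {
        if (activeLanes < 0 || activeLanes > 64) {
            throw new IllegalArgumentException("activeLanes out of range");
        }
        return activeLanes == 64 ? -1L : (1L << activeLanes) - 1;
    }

    // SVE-style predicate: one bit per byte of the vector. For each active
    // lane, only the lowest of that lane's bits is set.
    static long svePredicate(int activeLanes, int elemBytes) {
        if (activeLanes < 0 || elemBytes <= 0 || (long) activeLanes * elemBytes > 64) {
            throw new IllegalArgumentException("lanes do not fit in a 64-bit model");
        }
        long bits = 0;
        for (int lane = 0; lane < activeLanes; lane++) {
            bits |= 1L << (lane * elemBytes);
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(avx512Mask(3)));      // 111
        System.out.println(Long.toBinaryString(svePredicate(3, 1))); // 111
        System.out.println(Long.toBinaryString(svePredicate(3, 4))); // 100010001
    }
}
```

For 3 lanes of bytes both encodings agree, while for 3 lanes of ints the SVE-style value spreads the set bits 4 positions apart, which is why the element type has to be available when the mask is generated.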
After this patch, the C2 generated code for copying a 50-byte array on
AArch64 SVE looks like
    mov     x12, #0x32
    whilelo p0.b, xzr, x12
    add     x11, x11, #0x10
    ld1b    {z16.b}, p0/z, [x11]
    add     x10, x10, #0x10
    st1b    {z16.b}, p0, [x10]
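To make the effect of that sequence concrete, here is a small Java model of its semantics. It is a sketch under stated assumptions rather than what C2 actually emits: the class and method names are hypothetical, the vector length is assumed to be 64 bytes (512-bit SVE), and the two `add ..., #0x10` instructions, which presumably skip the array header, are modelled simply by indexing from element 0.

```java
import java.util.Arrays;

// Illustrative model of a predicated copy; not HotSpot code. Names are hypothetical.
final class PredicatedCopyModel {
    static final int VECTOR_BYTES = 64; // assumption: 512-bit SVE vectors

    // Models `whilelo p0.b, xzr, x12`: byte lane i is active while i < length.
    static boolean[] whileLo(int length) {
        boolean[] predicate = new boolean[VECTOR_BYTES];
        for (int i = 0; i < VECTOR_BYTES; i++) {
            predicate[i] = i < length;
        }
        return predicate;
    }

    // Models a zeroing predicated load (ld1b ... p0/z) followed by a
    // predicated store (st1b ... p0). Inactive lanes touch no memory.
    static void maskedCopy(byte[] src, byte[] dst, int length) {
        if (length < 0 || length > VECTOR_BYTES) {
            throw new IllegalArgumentException("length must fit in one vector");
        }
        boolean[] p = whileLo(length);
        byte[] z = new byte[VECTOR_BYTES]; // inactive lanes stay zero
        for (int i = 0; i < VECTOR_BYTES; i++) {
            if (p[i]) z[i] = src[i];
        }
        for (int i = 0; i < VECTOR_BYTES; i++) {
            if (p[i]) dst[i] = z[i];
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[50];
        for (int i = 0; i < src.length; i++) src[i] = (byte) (i + 1);
        byte[] dst = new byte[50];
        maskedCopy(src, dst, 50);
        System.out.println(Arrays.equals(src, dst)); // true
    }
}
```

Because the predicate disables every lane at or beyond the requested length, a single vector load and store can copy any length up to the vector size without reading or writing past the end of either array.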
We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
both x86 AVX512 and AArch64 SVE machines and found no issues. We also ran
JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
size arguments on a CPU with 512-bit SVE and observed the following
performance changes.
Benchmark                  (length)  (Performance)
ArrayCopyAligned.testByte        10          -2.6%
ArrayCopyAligned.testByte        20          +4.7%
ArrayCopyAligned.testByte        30          +4.8%
ArrayCopyAligned.testByte        40         +21.7%
ArrayCopyAligned.testByte        50         +22.5%
ArrayCopyAligned.testByte        60         +28.4%
The test machine has an SVE vector size of 512 bits, so we see a performance
gain for most array sizes below 64 bytes. For very small arrays we
see a slight regression because a vector load/store may be a bit slower
than 1 or 2 scalar loads/stores.
-------------
Commit messages:
- 8277168: AArch64: Enable arraycopy partial inlining with SVE
Changes: https://git.openjdk.java.net/jdk/pull/6444/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6444&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8277168
Stats: 87 lines in 16 files changed: 57 ins; 7 del; 23 mod
Patch: https://git.openjdk.java.net/jdk/pull/6444.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6444/head:pull/6444
PR: https://git.openjdk.java.net/jdk/pull/6444