RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory
Eugene Astigeevich
github.com+42899633+eastig at openjdk.java.net
Mon Nov 23 21:07:04 UTC 2020
This patch fixes performance regressions of 27% to 48% in small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies, ldpq/stpq are now used instead of ld4/st4.
This follows the recommendation in the Arm optimization guides, including the one for Neoverse N1: use discrete, non-writeback forms of load and store instructions while interleaving them.
The patch passed jtreg tier1-2 and all gtest tests with a linux-aarch64-server-release build and UseSIMDForMemoryOps enabled.
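For readers unfamiliar with the two instruction forms, here is a rough illustration in C with NEON intrinsics. It is not the HotSpot stub code: the function names copy64_structure and copy64_discrete are made up for this sketch, the 64-byte size is just an example, and the compiler is free to choose the actual instructions (for the discrete variant it may or may not emit ldp/stp of Q registers). On non-AArch64 targets both functions fall back to memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copy 64 bytes with a structure load/store.
 * On AArch64 this typically maps to ld4/st4: the load de-interleaves
 * the bytes into four registers and the store re-interleaves them,
 * so the result is still a plain copy. */
static void copy64_structure(uint8_t *dst, const uint8_t *src) {
#if defined(__aarch64__)
    uint8x16x4_t v = vld4q_u8(src);
    vst4q_u8(dst, v);
#else
    memcpy(dst, src, 64); /* portable fallback, assumption for non-Arm hosts */
#endif
}

/* Hypothetical helper: copy 64 bytes with discrete 16-byte loads and
 * stores at fixed offsets (no writeback addressing), with the loads
 * and stores interleaved. A compiler may combine adjacent accesses
 * into ldp/stp of Q registers, but that is not guaranteed. */
static void copy64_discrete(uint8_t *dst, const uint8_t *src) {
#if defined(__aarch64__)
    uint8x16_t a = vld1q_u8(src);
    uint8x16_t b = vld1q_u8(src + 16);
    vst1q_u8(dst, a);
    vst1q_u8(dst + 16, b);
    uint8x16_t c = vld1q_u8(src + 32);
    uint8x16_t d = vld1q_u8(src + 48);
    vst1q_u8(dst + 32, c);
    vst1q_u8(dst + 48, d);
#else
    memcpy(dst, src, 64); /* portable fallback, assumption for non-Arm hosts */
#endif
}

/* Self-check: both variants must produce a byte-identical copy. */
static int copies_are_identical(void) {
    uint8_t src[64], x[64], y[64];
    for (int i = 0; i < 64; i++) {
        src[i] = (uint8_t)(i * 3 + 1);
    }
    memset(x, 0, sizeof x);
    memset(y, 0, sizeof y);
    copy64_structure(x, src);
    copy64_discrete(y, src);
    return memcmp(x, src, 64) == 0 && memcmp(y, src, 64) == 0;
}
```

Both functions are functionally equivalent. The difference the patch cares about is only which instructions are issued, so any performance comparison has to be made on the target hardware.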
-------------
Commit messages:
- 8256488: Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory
Changes: https://git.openjdk.java.net/jdk/pull/1293/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1293&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8256488
Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/1293.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1293/head:pull/1293
PR: https://git.openjdk.java.net/jdk/pull/1293