RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory
Jie He
github.com+10233373+jhe33 at openjdk.java.net
Fri Nov 27 06:26:56 UTC 2020
On Tue, 24 Nov 2020 10:08:37 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> This patch fixes 27%-48% performance regressions of small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies ldpq/stpq are used instead of ld4/st4.
>> This follows what the Arm Optimization Guide, including for Neoverse N1, recommends: Use discrete, non-writeback forms of load and store instructions while interleaving them.
>>
>> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build and UseSIMDForMemoryOps enabled.
>
> I think we need also some non-Neoverse N1 numbers. We need to keep in mind that this software runs on many implementations. I'll have a look at some others.
Hi @theRealAph,
I also have a patch to fix the unaligned copy small memory (< 16 bytes) when copy a big chunk of memory (> 96 bytes) in this function copy_memory_small(), but it couldn't impact the performance too much, I'm not sure if it is worth pushing to upstream. please refer to [1].
1. [JBS-8149448](https://bugs.openjdk.java.net/browse/JDK-8149448)
-------------
PR: https://git.openjdk.java.net/jdk/pull/1293
More information about the hotspot-compiler-dev
mailing list