RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory

Evgeny Astigeevich github.com+42899633+eastig at openjdk.java.net
Tue Nov 24 10:19:59 UTC 2020


On Tue, 24 Nov 2020 10:08:37 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch fixes performance regressions of 27%-48% in small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies, ldpq/stpq are used instead of ld4/st4.
>> This follows the recommendation in the Arm optimization guides, including the one for Neoverse N1: use discrete, non-writeback forms of load and store instructions while interleaving them.
>> 
>> The patch passed jtreg tier1-2 and all gtest tests with a linux-aarch64-server-release build and UseSIMDForMemoryOps enabled.
>
> I think we also need some non-Neoverse N1 numbers. We need to keep in mind that this software runs on many implementations. I'll have a look at some others.

> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
> 
> On 11/23/20 9:07 PM, Volker Simonis wrote:
> 
> > Thanks for the detailed performance numbers.
> > Looks good to me.
> 
> The benchmark is missing from the pull request. We can't do anything
> without that.
> 
> --
> Andrew Haley (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

The benchmarks used are the existing ArrayCopy* microbenchmarks, which are part of OpenJDK: https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/lang
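
For anyone who wants a quick sanity check on other AArch64 implementations before running the full JMH suite, the kind of small copy in question can be exercised with a standalone sketch like the one below. This is not one of the ArrayCopy* microbenchmarks: the class name, the copy sizes, and the iteration counts are assumptions chosen for illustration, and the timing loop is far cruder than JMH, so treat its numbers only as a rough indication.

```java
public class SmallArrayCopySketch {
    // Copies the first `length` bytes of src into a fresh array.
    // System.arraycopy is what the arraycopy stubs ultimately serve;
    // whether a given call reaches the stub depends on the JIT.
    static byte[] copyOf(byte[] src, int length) {
        byte[] dst = new byte[length];
        System.arraycopy(src, 0, dst, 0, length);
        return dst;
    }

    // Crude timing loop (no JMH): average nanoseconds per copy.
    static double nanosPerCopy(byte[] src, int length, int iterations) {
        byte[] dst = new byte[length];
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            System.arraycopy(src, 0, dst, 0, length);
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        byte[] src = new byte[128];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        // Hypothetical "small copy" sizes; not taken from the patch.
        int[] sizes = {16, 32, 64, 96, 128};
        for (int size : sizes) {
            nanosPerCopy(src, size, 1_000_000); // warm-up so the loop gets compiled
            double ns = nanosPerCopy(src, size, 10_000_000);
            System.out.printf("%d bytes: %.2f ns/copy%n", size, ns);
        }
    }
}
```

On an AArch64 JDK this can be run with and without -XX:+UseSIMDForMemoryOps to compare the two code paths; the JMH microbenchmarks linked above remain the reference for any numbers reported on the PR.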

-------------

PR: https://git.openjdk.java.net/jdk/pull/1293

