[aarch64-port-dev ] RFR(S): 8239549: AArch64: Backend support for MulAddVS2VI node

Andrew Haley aph at redhat.com
Mon Feb 24 13:57:21 UTC 2020


Hi,

On 2/24/20 9:43 AM, Pengfei Li wrote:

> The value of the VM flag "AlignVector" is changed in this patch for the
> following reason. As "vector memory operations could be misaligned when
> accesses to arrays of different types are vectorized in one loop" [3],
> the current C2 superword.cpp doesn't vectorize this kind of loop unless
> it's guaranteed that the unaligned loads/stores won't incur a performance
> penalty. Hence, the x86 backend sets "AlignVector" to the opposite of
> another x86 flag, "UseUnalignedLoadStores", which indicates whether the
> x86 instruction MOVDQU can be used for unaligned loads/stores. On
> AArch64, we have a flag "AvoidUnalignedAccesses" indicating whether we
> need to avoid unaligned loads/stores on the current AArch64
> micro-architecture. So we assign "AlignVector" from this.
> 
> [Tests]
> Jtreg: hotspot::hotspot_all_no_apps, jdk::jdk_core and langtools::tier1.
> No new failures found.
> 
> JMH: Derived a JMH case [4] from the jtreg in the x86 patch [1].
> Before
>   Benchmark                        Mode  Cnt    Score    Error  Units
>   TestSIMDMulAddS2I.testMulAddS2I  avgt   15  260.827 ± 13.864  us/op
> After
>   Benchmark                        Mode  Cnt   Score   Error  Units
>   TestSIMDMulAddS2I.testMulAddS2I  avgt   15  48.297 ± 0.149  us/op
> 
> [1] http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7
> [2] https://en.wikichip.org/wiki/x86/avx512vnni
> [3] https://bugs.openjdk.java.net/browse/JDK-7199010
> [4] http://cr.openjdk.java.net/~pli/rfr/8239549/TestSIMDMulAddS2I.java
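
For readers following along, here is a minimal sketch of the kind of loop under discussion: adjacent pairs of shorts are multiplied and summed into an int, so short arrays and an int array are accessed in the same loop, which is the mixed-type case the AlignVector note above is about. The class and method names are hypothetical, and this is not the benchmark in [4], only a loop of the same general shape.

```java
import java.util.Arrays;

public class MulAddS2ISketch {
    // Hypothetical helper: out[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1].
    // The short operands are promoted to int before the multiply,
    // so each product and the sum are computed in int arithmetic.
    static int[] mulAdd(short[] a, short[] b) {
        int[] out = new int[a.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1];
        }
        return out;
    }

    public static void main(String[] args) {
        short[] a = {1, 2, 3, 4};
        short[] b = {5, 6, 7, 8};
        // 1*5 + 2*6 = 17, 3*7 + 4*8 = 53
        System.out.println(Arrays.toString(mulAdd(a, b)));
    }
}
```

Whether C2 actually vectorizes a given loop of this shape depends on the platform, the flag settings discussed above, and the JIT's own heuristics, so the sketch only illustrates the scalar semantics.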

Seems to work, although I'm not seeing the dramatic speedup that you are.

Before:
Benchmark                        Mode  Cnt    Score   Error  Units
TestSIMDMulAddS2I.testMulAddS2I  avgt    8  164.148 ± 0.490  us/op

After:
Benchmark                        Mode  Cnt   Score   Error  Units
TestSIMDMulAddS2I.testMulAddS2I  avgt    8  71.981 ± 0.280  us/op

I guess my processor is both slower and faster than yours.  :-)
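
To put numbers on that, the speedup is just the ratio of the two average times. A small sketch, using only the Score values quoted in this thread (the class and method names are made up for illustration, and the error bars are ignored):

```java
public class SpeedupSketch {
    // Ratio of average time before the patch to average time after it.
    static double speedup(double beforeUsPerOp, double afterUsPerOp) {
        return beforeUsPerOp / afterUsPerOp;
    }

    public static void main(String[] args) {
        // Scores from Pengfei's run and from my run, in us/op.
        System.out.printf("Pengfei: %.2fx%n", speedup(260.827, 48.297));
        System.out.printf("Andrew:  %.2fx%n", speedup(164.148, 71.981));
    }
}
```

That works out to roughly 5.4x on Pengfei's machine and roughly 2.3x on mine.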

Approved.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


