[aarch64-port-dev ] RFR(S): 8239549: AArch64: Backend support for MulAddVS2VI node
Pengfei Li
Pengfei.Li at arm.com
Mon Feb 24 09:43:05 UTC 2020
Hi,
I'd like to have a review of this AArch64 C2 backend support.
JBS: https://bugs.openjdk.java.net/browse/JDK-8239549
Webrev: http://cr.openjdk.java.net/~pli/rfr/8239549/webrev.00/
This adds AArch64 backend support for the C2 MulAddVS2VI node. This node,
together with another node, MulAddS2I, was added in JDK-8214751 [1] to
vectorize the operation below in loops, which accelerates some neural
network algorithms.
out[i] += ((in1[2*i] * in2[2*i]) + (in1[2*i+1] * in2[2*i+1]));
where in1 and in2 are arrays of shorts and out is an array of ints.
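For reference, the scalar loop that C2 would pattern-match can be sketched as follows. This is only an illustration of the expression above; the class and method names (MulAddSketch, mulAdd) are hypothetical and are not taken from the patch or from the jtreg/JMH tests.

```java
// Minimal sketch (hypothetical names): the scalar form of the operation
// that the MulAddS2I / MulAddVS2VI nodes are meant to cover.
final class MulAddSketch {
    // in1 and in2 are arrays of shorts and must hold at least 2 * out.length
    // elements each; out is an array of ints that accumulates the result.
    static void mulAdd(short[] in1, short[] in2, int[] out) {
        for (int i = 0; i < out.length; i++) {
            // The short operands are promoted to int before multiplying,
            // so each product is a full 32-bit value.
            out[i] += ((in1[2 * i] * in2[2 * i]) + (in1[2 * i + 1] * in2[2 * i + 1]));
        }
    }
}
```

Each output element consumes two adjacent shorts from each input array, which is why the vector form needs a pairwise add after the widening multiplies.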
After this patch, the NEON instruction sequence
smull v19.4s, v16.4h, v17.4h
smull2 v16.4s, v16.8h, v17.8h
addp v16.4s, v19.4s, v16.4s
add v16.4s, v16.4s, v18.4s
will be generated for the vectorized operation. This is equivalent to
the x86 AVX512_VNNI instruction demonstrated on page [2].
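To make the lane arithmetic of the four instructions concrete, here is a small Java emulation. It is a sketch under assumptions: it reads v16 and v17 as the two 8-lane short inputs and v18 as the 4-lane int accumulator (the loaded values of out), which is my interpretation of the sequence above. The class and method names are hypothetical.

```java
// Minimal sketch (hypothetical names): lane-by-lane emulation of the
// smull / smull2 / addp / add sequence on 128-bit vectors.
final class NeonMulAddEmulation {
    // smull Vd.4s, Vn.4h, Vm.4h: widening multiply of the low four short lanes.
    static int[] smull(short[] n, short[] m) {
        int[] d = new int[4];
        for (int i = 0; i < 4; i++) d[i] = n[i] * m[i];
        return d;
    }

    // smull2 Vd.4s, Vn.8h, Vm.8h: widening multiply of the high four short lanes.
    static int[] smull2(short[] n, short[] m) {
        int[] d = new int[4];
        for (int i = 0; i < 4; i++) d[i] = n[i + 4] * m[i + 4];
        return d;
    }

    // addp Vd.4s, Vn.4s, Vm.4s: adds adjacent pairs, Vn pairs first, then Vm pairs.
    static int[] addp(int[] n, int[] m) {
        return new int[] { n[0] + n[1], n[2] + n[3], m[0] + m[1], m[2] + m[3] };
    }

    // add Vd.4s, Vn.4s, Vm.4s: lane-wise 32-bit addition.
    static int[] add(int[] n, int[] m) {
        int[] d = new int[4];
        for (int i = 0; i < 4; i++) d[i] = n[i] + m[i];
        return d;
    }

    // One vector iteration: eight shorts from each input, four int results.
    static int[] mulAddStep(short[] v16, short[] v17, int[] v18) {
        int[] v19 = smull(v16, v17);       // products of lanes 0..3
        int[] high = smull2(v16, v17);     // products of lanes 4..7
        int[] pairs = addp(v19, high);     // in1[2i]*in2[2i] + in1[2i+1]*in2[2i+1]
        return add(pairs, v18);            // accumulate into the out lanes
    }
}
```

Lane i of the result equals v18[i] + v16[2i] * v17[2i] + v16[2i+1] * v17[2i+1], which matches the scalar expression for four consecutive values of i.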
The value of the VM flag "AlignVector" is changed in this patch for the
following reason. As "vector memory operations could be misaligned when
accesses to arrays of different types are vectorized in one loop" [3],
the current C2 superword.cpp doesn't vectorize this kind of loop unless
it's guaranteed that the unaligned loads/stores won't incur a performance
penalty. Hence, the x86 backend sets "AlignVector" to the opposite of
another x86 flag, "UseUnalignedLoadStores", which indicates whether the
x86 instruction MOVDQU can be used to load/store unaligned memory. On
AArch64, we have a flag, "AvoidUnalignedAccesses", indicating whether we
need to avoid unaligned loads/stores on the current AArch64
micro-architecture. So we assign "AlignVector" from this flag.
[Tests]
Jtreg: hotspot::hotspot_all_no_apps, jdk::jdk_core and langtools::tier1.
No new failures found.
JMH: Derived a JMH case [4] from the jtreg in the x86 patch [1].
Before
Benchmark                        Mode  Cnt    Score    Error  Units
TestSIMDMulAddS2I.testMulAddS2I  avgt   15  260.827 ± 13.864  us/op
After
Benchmark                        Mode  Cnt    Score    Error  Units
TestSIMDMulAddS2I.testMulAddS2I  avgt   15   48.297 ±  0.149  us/op
[1] http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7
[2] https://en.wikichip.org/wiki/x86/avx512vnni
[3] https://bugs.openjdk.java.net/browse/JDK-7199010
[4] http://cr.openjdk.java.net/~pli/rfr/8239549/TestSIMDMulAddS2I.java
--
Thanks,
Pengfei