[aarch64-port-dev ] RFR(S): 8239549: AArch64: Backend support for MulAddVS2VI node

Pengfei Li Pengfei.Li at arm.com
Mon Feb 24 09:43:05 UTC 2020


Hi,

I'd like to request a review of this AArch64 C2 backend change.

JBS: https://bugs.openjdk.java.net/browse/JDK-8239549
Webrev: http://cr.openjdk.java.net/~pli/rfr/8239549/webrev.00/

This patch adds AArch64 backend support for the C2 MulAddVS2VI node.
This node, together with the MulAddS2I node, was added in JDK-8214751
[1] to vectorize the operation below in loops, which accelerates some
neural network algorithms.
  out[i] += ((in1[2*i] * in2[2*i]) + (in1[2*i+1] * in2[2*i+1]));
where in1 and in2 are arrays of shorts and out is an array of ints.
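
For reference, the loop shape in question can be written as a small standalone Java program. This is only a minimal sketch: the class name, method name and input values are made up for illustration and are not taken from the patch or from the jtreg test.

```java
// Scalar reference for the multiply-add pattern described above.
// Class and method names are hypothetical, chosen for illustration only.
final class MulAddS2IExample {

    // For each i, adds the sum of two adjacent short products to out[i].
    // in1 and in2 must hold at least 2 * out.length elements.
    static void mulAdd(short[] in1, short[] in2, int[] out) {
        for (int i = 0; i < out.length; i++) {
            // short operands are promoted to int before the multiplication
            out[i] += ((in1[2 * i] * in2[2 * i]) + (in1[2 * i + 1] * in2[2 * i + 1]));
        }
    }

    public static void main(String[] args) {
        short[] in1 = {1, 2, 3, 4};
        short[] in2 = {5, 6, 7, 8};
        int[] out = {1, 2};
        mulAdd(in1, in2, out);
        // 1 + (1*5 + 2*6) = 18, 2 + (3*7 + 4*8) = 55
        System.out.println(java.util.Arrays.toString(out)); // prints [18, 55]
    }
}
```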

After this patch, the NEON instruction sequence
  smull   v19.4s, v16.4h, v17.4h
  smull2  v16.4s, v16.8h, v17.8h
  addp    v16.4s, v19.4s, v16.4s
  add     v16.4s, v16.4s, v18.4s
will be generated for the vectorized operation. This is equivalent to
the x86 AVX512_VNNI instruction demonstrated on page [2].
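
To make the lane arithmetic of the four instructions easier to follow, here is a plain Java emulation of the sequence. It is a sketch under assumptions: it assumes v16 holds eight shorts from in1, v17 holds eight shorts from in2, and v18 holds four ints from out, which the listing itself does not state. The class and method names are hypothetical.

```java
// Lane-by-lane emulation of the smull/smull2/addp/add sequence.
// Register roles (v16 = in1, v17 = in2, v18 = out) are an assumption.
final class NeonMulAddEmulation {

    static int[] mulAddVector(short[] v16, short[] v17, int[] v18) {
        int[] lo = new int[4];  // smull  v19.4s, v16.4h, v17.4h
        int[] hi = new int[4];  // smull2 v16.4s, v16.8h, v17.8h
        for (int i = 0; i < 4; i++) {
            lo[i] = v16[i] * v17[i];          // widening multiply, lower 4 lanes
            hi[i] = v16[4 + i] * v17[4 + i];  // widening multiply, upper 4 lanes
        }

        // addp v16.4s, v19.4s, v16.4s
        // Pairwise add: the low half of the result comes from adjacent pairs
        // of the first source, the high half from pairs of the second source.
        int[] pairs = {lo[0] + lo[1], lo[2] + lo[3], hi[0] + hi[1], hi[2] + hi[3]};

        // add v16.4s, v16.4s, v18.4s
        int[] result = new int[4];
        for (int i = 0; i < 4; i++) {
            result[i] = pairs[i] + v18[i];    // accumulate into the out lanes
        }
        return result;
    }

    public static void main(String[] args) {
        short[] in1 = {1, 2, 3, 4, 5, 6, 7, 8};
        short[] in2 = {2, 2, 2, 2, 2, 2, 2, 2};
        int[] out = {10, 20, 30, 40};
        // prints [16, 34, 52, 70]
        System.out.println(java.util.Arrays.toString(mulAddVector(in1, in2, out)));
    }
}
```

Each result lane i equals out[i] plus the two products at indices 2*i and 2*i+1, which matches one iteration of the scalar loop.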

The value of the VM flag "AlignVector" is changed in this patch for the
following reason. As "vector memory operations could be misaligned when
accesses to arrays of different types are vectorized in one loop" [3],
the current C2 superword.cpp doesn't vectorize this kind of loop unless
it's guaranteed that unaligned loads/stores won't incur a performance
penalty. Hence, the x86 backend sets "AlignVector" to the opposite of
another x86 flag, "UseUnalignedLoadStores", which indicates whether the
x86 instruction MOVDQU can be used to load/store unaligned memory. On
AArch64, we have a flag "AvoidUnalignedAccesses" indicating whether we
need to avoid unaligned loads/stores on the current AArch64
micro-architecture. So we assign "AlignVector" from this flag.

[Tests]
Jtreg: hotspot::hotspot_all_no_apps, jdk::jdk_core and langtools::tier1.
No new failures found.

JMH: I derived a JMH case [4] from the jtreg test in the x86 patch [1].
Before
  Benchmark                        Mode  Cnt    Score    Error  Units
  TestSIMDMulAddS2I.testMulAddS2I  avgt   15  260.827 ± 13.864  us/op
After
  Benchmark                        Mode  Cnt   Score   Error  Units
  TestSIMDMulAddS2I.testMulAddS2I  avgt   15  48.297 ± 0.149  us/op

[1] http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7
[2] https://en.wikichip.org/wiki/x86/avx512vnni
[3] https://bugs.openjdk.java.net/browse/JDK-7199010
[4] http://cr.openjdk.java.net/~pli/rfr/8239549/TestSIMDMulAddS2I.java

--
Thanks,
Pengfei


