RFR: 8256318: AArch64: Add support for floating-point absolute difference

Ningsheng Jian njian at openjdk.java.net
Mon Nov 16 02:18:54 UTC 2020


On Sat, 14 Nov 2020 06:22:19 GMT, Dong Bo <dongbo at openjdk.org> wrote:

> This adds support for the floating-point absolute difference instructions, i.e. FABD scalar/vector.
> 
> Verified with linux-aarch64-server-release, tier1-3.
> 
> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance testing.
> 
> For FABD (scalar), the performance tests operate on registers directly, and the average latency drops to almost half (~57%) of the original.
> For FABD (vector), we restrict the data size (~24KB) to be less than the L1 data cache size (32KB),
> so that memory accesses hit in L1, and observe 14.2% (float) and 21.2% (double) improvements.
> 
> The JMH results on Kunpeng916:
> 
> Benchmark                                            (count)  (seed)  Mode  Cnt     Score    Error  Units
> 
> # before, fsub+fabs
> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble     1024  316731  avgt   10  6038.333 ± 3.889  ns/op
> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat      1024  316731  avgt   10  6005.125 ± 3.025  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble     1024  316731  avgt   10   950.340 ± 9.398  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat      1024  316731  avgt   10   454.350 ± 1.798  ns/op
> 
> # after, fabd
> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble     1024  316731  avgt   10  3483.801 ± 1.763  ns/op
> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat      1024  316731  avgt   10  3442.412 ± 1.866  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble     1024  316731  avgt   10   816.301 ± 4.454  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat      1024  316731  avgt   10   354.710 ± 1.001  ns/op
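
For readers who have not opened the benchmark, here is a minimal sketch of the Java shape these rules are aimed at. It is not the JMH source from the PR: the class and method names below are hypothetical, and whether C2 actually emits FABD for them depends on the build and on auto-vectorization. The loop form is the kind of code that can become `AbsVF (SubVF src1 src2)` as in the rule quoted below; the scalar form presumably matches an analogous scalar rule.

```java
// Hypothetical illustration only; names are not taken from the PR.
public class AbsDiffSketch {

    // Scalar absolute difference: a subtract followed by an abs.
    // Before the patch this would be fsub + fabs; the patch aims to
    // let the compiler select a single FABD instead.
    static double scalarAbsDiff(double a, double b) {
        return Math.abs(a - b);
    }

    // Element-wise absolute difference over arrays. A simple counted
    // loop like this is a candidate for auto-vectorization, which is
    // where a vector abs-of-subtract pattern would come from.
    static void vectorAbsDiff(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(a[i] - b[i]);
        }
    }

    public static void main(String[] args) {
        float[] a = {1.0f, -2.0f, 3.5f};
        float[] b = {4.0f, 2.0f, 3.5f};
        float[] out = new float[a.length];
        vectorAbsDiff(a, b, out);
        System.out.println(java.util.Arrays.toString(out));
        System.out.println(scalarAbsDiff(1.5, 4.0));
    }
}
```

The result is the same whichever instruction sequence is selected; only the generated code should differ.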

src/hotspot/cpu/aarch64/aarch64.ad line 18110:

> 18108: %{
> 18109:   predicate(n->as_Vector()->length() == 2);
> 18110:   match(Set dst (AbsVF (SubVF src1 src2)));

We now have aarch64_neon.ad. Do you think we should put the NEON vector rules in that file, to keep aarch64.ad smaller?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1215

