RFR: 8256318: AArch64: Add support for floating-point absolute difference
Dong Bo
dongbo at openjdk.java.net
Mon Nov 16 02:53:55 UTC 2020
On Mon, 16 Nov 2020 02:15:13 GMT, Ningsheng Jian <njian at openjdk.org> wrote:
>> This adds support for the floating-point absolute difference instructions, i.e. FABD (scalar) and FABD (vector).
>>
>> Verified with linux-aarch64-server-release, tier1-3.
>>
>> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance test.
>>
>> For FABD (scalar), where the benchmarks operate on registers directly, the average latency drops to about 57% of the original.
>> For FABD (vector), we keep the data size (~24KB) below the L1 data cache size (32KB)
>> so that memory accesses hit in L1, and observe reductions in average time per op of 14.1% (double) and 21.9% (float).
>>
>> The JMH results on Kunpeng916:
>>
>> Benchmark                                           (count)  (seed)  Mode  Cnt     Score     Error  Units
>>
>> # before, fsub+fabs
>> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble    1024  316731  avgt   10  6038.333  ±  3.889  ns/op
>> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat     1024  316731  avgt   10  6005.125  ±  3.025  ns/op
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble    1024  316731  avgt   10   950.340  ±  9.398  ns/op
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat     1024  316731  avgt   10   454.350  ±  1.798  ns/op
>>
>> # after, fabd
>> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble    1024  316731  avgt   10  3483.801  ±  1.763  ns/op
>> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat     1024  316731  avgt   10  3442.412  ±  1.866  ns/op
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble    1024  316731  avgt   10   816.301  ±  4.454  ns/op
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat     1024  316731  avgt   10   354.710  ±  1.001  ns/op
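For readers without the benchmark source at hand, here is a minimal Java sketch of the kind of kernels being compared, plus the arithmetic behind the percentages. The class and method names are hypothetical, and this is not the contents of `FloatingScalarVectorAbsDiff.java`. It assumes C2 turns `Math.abs(a - b)` into AbsD/AbsF over SubD/SubF (scalar FABD), and that SuperWord may vectorize the array loop into AbsVF(SubVF)/AbsVD(SubVD) (vector FABD). The ~24KB figure is consistent with, for example, three `double` arrays of 1024 elements (3 × 1024 × 8 bytes = 24 KiB), though the number of arrays is also an assumption.

```java
// Hypothetical sketch, not the actual JMH benchmark.
public class AbsDiffSketch {
    // Scalar shape: previously FSUB + FABS, a candidate for a single FABD Dd, Dn, Dm (assumed).
    static double absDiff(double a, double b) {
        return Math.abs(a - b);
    }

    // Loop shape: SuperWord may vectorize this into AbsVF(SubVF ...) -> FABD Vd.4S (assumed).
    static void absDiff(float[] a, float[] b, float[] res) {
        for (int i = 0; i < res.length; i++) {
            res[i] = Math.abs(a[i] - b[i]);
        }
    }

    // Bytes touched by one pass over `arrays` arrays of `count` elements each.
    static long workingSetBytes(int count, int elemBytes, int arrays) {
        return (long) count * elemBytes * arrays;
    }

    // Relative reduction in average time per op, in percent, from two JMH scores.
    static double reductionPct(double before, double after) {
        return (1.0 - after / before) * 100.0;
    }

    public static void main(String[] args) {
        float[] a = {1.0f, -2.0f};
        float[] b = {3.5f, 2.0f};
        float[] r = new float[2];
        absDiff(a, b, r);
        System.out.println(r[0] + " " + r[1]);                      // 2.5 4.0
        System.out.println(workingSetBytes(1024, Double.BYTES, 3)); // 24576

        // Scores copied from the JMH table (before, after).
        System.out.printf("scalar double: %.1f%%%n", reductionPct(6038.333, 3483.801));
        System.out.printf("vector double: %.1f%%%n", reductionPct(950.340, 816.301));
        System.out.printf("vector float:  %.1f%%%n", reductionPct(454.350, 354.710));
    }
}
```

With the table's numbers, the vector cases come out at about 14.1% (double) and 21.9% (float) less time per op, and the scalar scores drop to roughly 57% of their original values.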
>
> src/hotspot/cpu/aarch64/aarch64.ad line 18110:
>
>> 18108: %{
>> 18109: predicate(n->as_Vector()->length() == 2);
>> 18110: match(Set dst (AbsVF (SubVF src1 src2)));
>
> We now have aarch64_neon.ad. Do you think we should put the NEON vector rules into that file to keep aarch64.ad smaller?
I've considered this.
But it feels a little inconsistent to add only the new `fabd` to `aarch64_neon.ad`
while the other NEON instructions (e.g. `fabs`, `fsub`, `fdiv`, `fsqrt`) remain in aarch64.ad.
And moving all of them from aarch64.ad to aarch64_neon.ad goes well beyond the scope of this patch.
I think I can move the closely related `fabs` and `fabd` into aarch64_neon.ad in this patch.
Is that OK?
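To make the quoted rule concrete: `match(Set dst (AbsVF (SubVF src1 src2)))` with `length() == 2` covers an absolute value applied to a lane-wise subtraction of two-float vectors. Two 32-bit lanes fill a 64-bit register, so this presumably maps to the `.2S` arrangement of FABD, with `.4S` and `.2D` covering 128-bit float and double vectors. Below is a small Java model of the lane counts and the per-lane result; the class and method names are hypothetical and this is not HotSpot code.

```java
// Hypothetical model of FABD (vector) lane semantics; not HotSpot code.
public class FabdLanes {
    // Number of lanes that fit in a register of regBits for elements of elemBits.
    static int lanes(int regBits, int elemBits) {
        return regBits / elemBits;
    }

    // Two-lane float case: dst[i] = |src1[i] - src2[i]|, like AbsVF(SubVF src1 src2).
    static float[] fabd2S(float[] src1, float[] src2) {
        if (src1.length != 2 || src2.length != 2) {
            throw new IllegalArgumentException("expected 2 float lanes");
        }
        float[] dst = new float[2];
        for (int i = 0; i < 2; i++) {
            dst[i] = Math.abs(src1[i] - src2[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(lanes(64, 32));  // 2 (.2S, the length() == 2 case)
        System.out.println(lanes(128, 32)); // 4 (.4S)
        System.out.println(lanes(128, 64)); // 2 (.2D)
        float[] d = fabd2S(new float[] {1.0f, 5.0f}, new float[] {4.0f, 2.5f});
        System.out.println(d[0] + " " + d[1]); // 3.0 2.5
    }
}
```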
-------------
PR: https://git.openjdk.java.net/jdk/pull/1215
More information about the hotspot-dev mailing list