RFR: 8258932: AArch64: Enhance floating-point Min/MaxReductionV with fminp/fmaxp [v3]
Dong Bo
dongbo at openjdk.java.net
Mon Jan 11 05:16:53 UTC 2021
On Sat, 9 Jan 2021 08:13:45 GMT, Dong Bo <dongbo at openjdk.org> wrote:
>> I don't think the real problem is only the tests, it's that common cases don't get vectorized.
>> Can we fix this code so that it works with `Math.abs()`?
>> Are there any examples of plausible Java code that benefit from this optimization?
>
> According to the results of `JMH perfasm`, `Math.max(max, Math.abs(floatsA[i] - floatsB[i]))` is vectorized when `COUNT = 8` on an x86 platform.
> On aarch64, however, `floatsB[i] = Math.abs(floatsA[i])` is not vectorized when `COUNT = 10`, and we cannot match `VAbs2F` either.
> I am going to investigate the failed vectorization and see if we can have `Max2F` matched. Thanks.
Hi,
I was mistaken in saying that the code is not vectorized when `COUNT < 12`; it seems that the percentage of vectorized code is too small to be caught by `JMH perfasm`.
To observe whether `Min/MaxReductionVNode` are created, I added an explicit print in `ReductionNode::make`, like:
--- a/src/hotspot/share/opto/vectornode.cpp
+++ b/src/hotspot/share/opto/vectornode.cpp
@@ -961,7 +961,9 @@ ReductionNode* ReductionNode::make(int opc, Node *ctrl, Node* n1, Node* n2, Basi
     case Op_MinReductionV: return new MinReductionVNode(ctrl, n1, n2);
-    case Op_MaxReductionV: return new MaxReductionVNode(ctrl, n1, n2);
+    case Op_MaxReductionV:
+      warning("in ReductionNode::make, making a MaxReductionVNode, length %d", n2->bottom_type()->is_vect()->length());
+      return new MaxReductionVNode(ctrl, n1, n2);
     case Op_AndReductionV: return new AndReductionVNode(ctrl, n1, n2);
In my observation, `Max4F` is created when `COUNT >= 4`; in that case it is reasonable to create `Max4F` rather than `Max2F`.
`Max2F` is created with `COUNT == 3` and `-XX:-SuperWordLoopUnrollAnalysis`.
However, I did not find any noticeable improvement, since the vectorized code makes up such a small percentage.
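For reference, the loop shape under discussion can be sketched as below. This is only an illustrative sketch and not the JMH source from the PR: the class name `MaxReductionSketch`, the method name `maxAbsDiff` and the sample arrays are hypothetical. Whether C2 actually emits a `MaxReductionV` node for such a loop depends on the array length and flags, as described above.

```java
public class MaxReductionSketch {
    // Max reduction over |a[i] - b[i]|; a candidate loop for a
    // MaxReductionV node when SuperWord vectorizes it.
    // (Hypothetical helper, not taken from the PR.)
    static float maxAbsDiff(float[] a, float[] b) {
        float max = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        // Sample data chosen only for illustration.
        float[] a = {1.0f, 5.0f, 2.0f, -3.0f};
        float[] b = {2.0f, 1.0f, 2.0f, 4.0f};
        // Absolute differences are 1, 4, 0, 7, so this prints 7.0
        System.out.println(maxAbsDiff(a, b));
    }
}
```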
The JMH benchmark has been updated; the performance results are:
Benchmark                              (COUNT_DOUBLE)  (COUNT_FLOAT)  (seed)  Mode  Cnt    Score   Error  Units
# Kunpeng 916, default
VectorReductionFloatingMinMax.maxRedD             512              3       0  avgt   10  677.778 ± 0.694  ns/op
VectorReductionFloatingMinMax.maxRedF             512              3       0  avgt   10   21.016 ± 0.097  ns/op
VectorReductionFloatingMinMax.minRedD             512              3       0  avgt   10  677.633 ± 0.664  ns/op
VectorReductionFloatingMinMax.minRedF             512              3       0  avgt   10   21.001 ± 0.019  ns/op
# Kunpeng 916, fmaxp/fminp
VectorReductionFloatingMinMax.maxRedD             512              3       0  avgt   10  425.776 ± 0.785  ns/op
VectorReductionFloatingMinMax.maxRedF             512              3       0  avgt   10   20.883 ± 0.033  ns/op
VectorReductionFloatingMinMax.minRedD             512              3       0  avgt   10  426.177 ± 3.258  ns/op
VectorReductionFloatingMinMax.minRedF             512              3       0  avgt   10   20.871 ± 0.044  ns/op
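To put the scores above in perspective, the ratio of baseline time to patched time is roughly 1.59x for the double reductions and under 1% for the float reductions with `COUNT_FLOAT = 3`. The sketch below only redoes that arithmetic from the reported averages; the class name `SpeedupSketch` and the `speedup` helper are hypothetical, and the error bars are ignored.

```java
public class SpeedupSketch {
    // Ratio of baseline time to patched time (both in ns/op);
    // values above 1.0 mean the patched build is faster.
    static double speedup(double baselineNsPerOp, double patchedNsPerOp) {
        return baselineNsPerOp / patchedNsPerOp;
    }

    public static void main(String[] args) {
        // Average scores copied from the Kunpeng 916 results above.
        System.out.printf("maxRedD: %.3fx%n", speedup(677.778, 425.776));
        System.out.printf("minRedD: %.3fx%n", speedup(677.633, 426.177));
        System.out.printf("maxRedF: %.3fx%n", speedup(21.016, 20.883));
        System.out.printf("minRedF: %.3fx%n", speedup(21.001, 20.871));
    }
}
```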
-------------
PR: https://git.openjdk.java.net/jdk/pull/1925