RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v5]
Emanuel Peter
epeter at openjdk.org
Mon Apr 15 07:10:48 UTC 2024
On Fri, 12 Apr 2024 12:03:57 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
>> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2.
>>
>> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value.
>>
>> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones.
>>
>> [AArch64]
>> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2].
>>
>> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`.
>>
>> No effects on other platforms.
>>
>> [Performance]
>> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit).
>>
>> ADDLanes
>>
>> Benchmark Before After Unit
>> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms
>>
>>
>> Final code is as below:
>>
>> Before:
>> ` fadda z17.s, p7/m, z17.s, z16.s
>> `
>> After:
>>
>> faddp v17.4s, v21.4s, v21.4s
>> faddp s18, v17.2s
>> fadd s18, s18, s19
>>
>>
>>
>>
>> [Test]
>> Full jtreg passed on AArch64 and x86.
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529
>> [2] https://bugs.openjdk.org/browse/JDK-8275275
>> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert to previous indentation
A few more cosmetic things ;)
src/hotspot/cpu/aarch64/aarch64_vector.ad line 2856:
> 2854: %}
> 2855:
> 2856: // reduction addF
Suggestion:
I think comment could be removed, it seems redundant.
src/hotspot/cpu/aarch64/aarch64_vector.ad line 2861:
> 2859: // example, this rule can be reached from the VectorAPI (which allows for non-strictly ordered
> 2860: // add reduction).
> 2861: predicate(Matcher::vector_length(n->in(2)) == 2 && !n->as_Reduction()->requires_strict_order());
Would it make sense to change `reduce_add2F_neon` to something like `reduce_non_strict_order_add2F_neon`, just so that it is a bit clearer when one reads the opto-assembly output?
src/hotspot/share/opto/vectorIntrinsics.cpp line 1739:
> 1737: Node* init = ReductionNode::make_identity_con_scalar(gvn(), opc, elem_bt);
> 1738: Node* value = opd;
> 1739:
Suggestion:
assert(mask != nullptr || !is_masked_op, "Masked op needs the mask value never null");
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-2000147984
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565263550
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565262784
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565271555
More information about the hotspot-compiler-dev
mailing list