RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v3]
Emanuel Peter
epeter at openjdk.org
Mon Mar 18 12:59:32 UTC 2024
On Fri, 15 Mar 2024 11:03:08 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
>> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2.
>>
>> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value.
>>
>> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones.
>>
>> [AArch64]
>> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2].
>>
>> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`.
>>
>> No effects on other platforms.
>>
>> [Performance]
>> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit).
>>
>> ADDLanes
>>
>> Benchmark Before After Unit
>> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms
>>
>>
>> Final code is as below:
>>
>> Before:
>> ` fadda z17.s, p7/m, z17.s, z16.s
>> `
>> After:
>>
>> faddp v17.4s, v21.4s, v21.4s
>> faddp s18, v17.2s
>> fadd s18, s18, s19
>>
>>
>>
>>
>> [Test]
>> Full jtreg passed on AArch64 and x86.
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529
>> [2] https://bugs.openjdk.org/browse/JDK-8275275
>> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>
> Naming changes: replace strict/non-strict with more technical terms
Looks better already!
Left a few comments, questions and suggestions.
src/hotspot/cpu/aarch64/aarch64_vector.ad line 2887:
> 2885: // Such nodes can only be generated by Vector API.
> 2886: // 2. Non-associative (or strictly ordered) AddReductionVF, which can only be generated by
> 2887: // auto-vectorization on SVE machine.
I'd be careful with such strong statements. You can name them as examples, but the "only" may not stay true forever. We may for example at some point add some way to have a `Float.addAssociative`, to allow "fast-math" reductions in plain java, which can then be auto-vectorized with a non-strict ordered implementation. I'm not sure if we will ever do that, but it is possible.
src/hotspot/share/opto/loopopts.cpp line 4306:
> 4304: }
> 4305:
> 4306: static bool is_unordered_reduction(Node* n) {
I think you should rename all the "unordered" to "associative".
I implemented this, and I think it would now be nicer to make that change.
src/hotspot/share/opto/vectorIntrinsics.cpp line 2:
> 1: /*
> 2: * Copyright (c) 2020, 2024, Oracle and/or its affiliates. All rights reserved.
Sometimes you update ARM copyright, sometimes the Oracle one. Is that intended?
src/hotspot/share/opto/vectornode.cpp line 1332:
> 1330: case Op_AddReductionVL: return new AddReductionVLNode(ctrl, n1, n2);
> 1331: case Op_AddReductionVF: return new AddReductionVFNode(ctrl, n1, n2, is_associative);
> 1332: case Op_AddReductionVD: return new AddReductionVDNode(ctrl, n1, n2, is_associative);
Why do you only do it for the `F/D` `Add` instructions, but not the `Mul` instructions? Would those not equally profit from associativity?
src/hotspot/share/opto/vectornode.hpp line 270:
> 268: // when it is auto-vectorized as auto-vectorization mandates the operation to be
> 269: // non-associative (strictly ordered).
> 270: bool _is_associative;
Could this be a `const`?
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-1942840120
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1528457622
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1528461083
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1528464022
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1528468239
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1528482036
More information about the hotspot-compiler-dev
mailing list