RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v3]
Andrew Haley
aph at openjdk.org
Mon Mar 18 17:32:35 UTC 2024
On Fri, 15 Mar 2024 11:03:08 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
>> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2.
>>
>> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value.
>>
>> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones.
>>
>> [AArch64]
>> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2].
>>
>> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`.
>>
>> No effects on other platforms.
>>
>> [Performance]
>> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit).
>>
>> ADDLanes
>>
>> Benchmark Before After Unit
>> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms
>>
>>
>> Final code is as below:
>>
>> Before:
>> ` fadda z17.s, p7/m, z17.s, z16.s
>> `
>> After:
>>
>> faddp v17.4s, v21.4s, v21.4s
>> faddp s18, v17.2s
>> fadd s18, s18, s19
>>
>>
>>
>>
>> [Test]
>> Full jtreg passed on AArch64 and x86.
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529
>> [2] https://bugs.openjdk.org/browse/JDK-8275275
>> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>
> Naming changes: replace strict/non-strict with more technical terms
src/hotspot/share/opto/vectornode.hpp line 240:
> 238: //
> 239: // Other reductions are associative (do not need strict ordering).
> 240: virtual bool is_associative() const {
I think this flag may be badly named. The idea you want to express is not so much associativity, but whether such nodes should be treated as strictly ordered. It would be much less confusing to pick a name like ordered() because that describes what you want to the node to do.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1528979851
More information about the hotspot-compiler-dev
mailing list