RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction

Bhavana Kilambi bkilambi at openjdk.org
Tue Feb 27 21:43:04 UTC 2024


Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2.

To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value.

With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones.

[AArch64]
On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2].

This patch adds matching rules for non strictly-ordered `AddReductionVF/D`.

No effects on other platforms.

[Performance]
FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit).

ADDLanes

Benchmark                 Before     After      Unit
FloatMaxVector.ADDLanes   1789.513   5264.226   ops/ms


Final code is as below:

Before:
`        fadda        z17.s, p7/m, z17.s, z16.s
`
After:

        faddp        v17.4s, v21.4s, v21.4s
        faddp        s18, v17.2s
        fadd         s18, s18, s19




[Test]
Full jtreg passed on AArch64 and x86.

[1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529
[2] https://bugs.openjdk.org/browse/JDK-8275275
[3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316

-------------

Commit messages:
 - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction

Changes: https://git.openjdk.org/jdk/pull/18034/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8320725
  Stats: 183 lines in 7 files changed: 85 ins; 24 del; 74 mod
  Patch: https://git.openjdk.org/jdk/pull/18034.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18034/head:pull/18034

PR: https://git.openjdk.org/jdk/pull/18034


More information about the hotspot-compiler-dev mailing list