Integrated: 8320725: AArch64: C2: Add "requires_strict_order" flag for floating-point add and mul reduction
Bhavana Kilambi
bkilambi at openjdk.org
Tue Jun 11 07:20:37 UTC 2024
On Tue, 27 Feb 2024 21:24:46 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
> Floating-point addition is non-associative: adding floating-point elements in a different order may produce a different result. The Vector API intentionally does not define the order of reduction, which allows platforms to generate more efficient code [1]. C2 therefore needs a node that represents a non-strictly-ordered add reduction for floating-point types.
>
> To avoid introducing new nodes, this patch adds a bool field to `AddReductionVF/D` that records whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` to `ReductionNode`. For all reduction nodes other than `AddReductionVF/D`, `requires_strict_order()` returns a fixed value.
>
> With this patch, the Vector API always generates non-strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B, because non-strictly-ordered instructions are more beneficial than strictly ordered ones on such machines.
>
> [AArch64]
> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2].
>
> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`.
>
> No effects on other platforms.
>
> [Performance]
> FloatMaxVector.ADDLanes [3] measures the performance of floating-point add reduction. With this patch, its throughput improves by about 3x on my 128-bit SVE machine.
>
> Benchmark                 Before     After      Unit
> FloatMaxVector.ADDLanes   1789.513   5264.226   ops/ms
>
> The final generated code is as follows:
>
> Before:
>
>     fadda z17.s, p7/m, z17.s, z16.s
>
> After:
>
>     faddp v17.4s, v21.4s, v21.4s
>     faddp s18, v17.2s
>     fadd  s18, s18, s19
>
> [Test]
> Full jtreg passed on AArch64 and x86.
>
> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529
> [2] https://bugs.openjdk.org/browse/JDK-8275275
> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316
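
The order dependence that motivates the quoted description can be reproduced in plain Java. The following is a minimal sketch and not code from the patch: `ReductionOrderDemo`, `orderedSum`, and `pairwiseSum` are hypothetical names, and the scalar code only emulates the accumulation order of a strictly ordered reduction (as with `fadda`) and of a pairwise reduction over four float lanes (as with the `faddp`/`fadd` sequence). That the final `fadd` folds in the scalar input of the reduction is an assumption made for illustration.

```java
public class ReductionOrderDemo {

    // Strictly ordered reduction: lanes are accumulated one by one,
    // left to right, starting from the scalar input.
    static float orderedSum(float init, float[] lanes) {
        float acc = init;
        for (float lane : lanes) {
            acc += lane;
        }
        return acc;
    }

    // Non-strictly-ordered (pairwise) reduction over exactly four lanes:
    // adjacent lanes are added first, then the two partial sums, and the
    // scalar input is added last (assumed order, for illustration only).
    static float pairwiseSum(float init, float[] lanes) {
        if (lanes.length != 4) {
            throw new IllegalArgumentException("this sketch handles four lanes only");
        }
        float lo = lanes[0] + lanes[1];
        float hi = lanes[2] + lanes[3];
        return (lo + hi) + init;
    }

    public static void main(String[] args) {
        // 1f is far below the rounding granularity of 1e20f, so whether it
        // survives depends on when the large values cancel.
        float[] lanes = {1e20f, 1f, -1e20f, 1f};
        System.out.println("ordered  = " + orderedSum(0f, lanes));
        System.out.println("pairwise = " + pairwiseSum(0f, lanes));
    }
}
```

With these inputs the ordered sum is 1.0 and the pairwise sum is 0.0. Both are valid results for a reduction whose order is unspecified, which is why a node has to record whether strict order is required.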
This pull request has now been integrated.
Changeset: 0e4d4a0c
Author: Bhavana Kilambi <bkilambi at openjdk.org>
Committer: Emanuel Peter <epeter at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/0e4d4a0c3150c01d927bd69cc578cea053cf16b3
Stats: 587 lines in 10 files changed: 469 ins; 12 del; 106 mod
8320725: AArch64: C2: Add "requires_strict_order" flag for floating-point add and mul reduction
Co-authored-by: Eric Liu <eliu at openjdk.org>
Reviewed-by: gli, epeter, aph
-------------
PR: https://git.openjdk.org/jdk/pull/18034