RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction

Guoxiong Li gli at openjdk.org
Wed Feb 28 08:49:53 UTC 2024


On Tue, 27 Feb 2024 21:24:46 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2.
> 
> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value.
> 
> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones.
> 
> [AArch64]
> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2].
> 
> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`.
> 
> No effects on other platforms.
> 
> [Performance]
> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit).
> 
> ADDLanes
> 
> Benchmark                 Before     After      Unit
> FloatMaxVector.ADDLanes   1789.513   5264.226   ops/ms
> 
> 
> Final code is as below:
> 
> Before:
> `        fadda        z17.s, p7/m, z17.s, z16.s
> `
> After:
> 
>         faddp        v17.4s, v21.4s, v21.4s
>         faddp        s18, v17.2s
>         fadd         s18, s18, s19
> 
> 
> 
> 
> [Test]
> Full jtreg passed on AArch64 and x86.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529
> [2] https://bugs.openjdk.org/browse/JDK-8275275
> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316

Before this patch, the `ReductionNode` is strict-ordered and the `UnorderedReductionNode` is unordered. If we want an unordered reduction node, we just extends `UnorderedReductionNode` instead of `ReductionNode`. Now you think every concrete node object should has the right to judge whether it is strict-ordered or not by itself (using the method `requires_strict_order`). If my understanding is right, this patch is a nice refactor. Some concrete comments are shown below.

src/hotspot/cpu/aarch64/aarch64_vector.ad line 2891:

> 2889:   predicate((!VM_Version::use_neon_for_vector(Matcher::vector_length_in_bytes(n->in(2))) &&
> 2890:              !n->as_Reduction()->requires_strict_order()) ||
> 2891:             n->as_Reduction()->requires_strict_order());

This predication looks strange and complex. Can it be simplified to `!VM_Version::use_neon_for_vector(Matcher::vector_length_in_bytes(n->in(2))) || n->as_Reduction()->requires_strict_order()`?

src/hotspot/cpu/aarch64/aarch64_vector.ad line 2925:

> 2923:   predicate((!VM_Version::use_neon_for_vector(Matcher::vector_length_in_bytes(n->in(2))) &&
> 2924:              !n->as_Reduction()->requires_strict_order()) ||
> 2925:             n->as_Reduction()->requires_strict_order());

Same as above. Can it be simplified to `!VM_Version::use_neon_for_vector(Matcher::vector_length_in_bytes(n->in(2))) || n->as_Reduction()->requires_strict_order()`?

src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 1788:

> 1786:   predicate((!VM_Version::use_neon_for_vector(Matcher::vector_length_in_bytes(n->in(2))) &&
> 1787:              !n->as_Reduction()->requires_strict_order()) ||
> 1788:             n->as_Reduction()->requires_strict_order());

Same as above. Can it be simplified to `!VM_Version::use_neon_for_vector(Matcher::vector_length_in_bytes(n->in(2))) || n->as_Reduction()->requires_strict_order()`?

src/hotspot/share/opto/vectorIntrinsics.cpp line 1740:

> 1738:   Node* value = nullptr;
> 1739:   if (mask == nullptr) {
> 1740:     assert(!is_masked_op, "Masked op needs the mask value never null");

This assert may be missed after your refactor. But it seems not really matter.

src/hotspot/share/opto/vectornode.hpp line 242:

> 240:   virtual bool requires_strict_order() const {
> 241:     return false;
> 242:   };

The last semicolon is redundant.

src/hotspot/share/opto/vectornode.hpp line 265:

> 263: class AddReductionVFNode : public ReductionNode {
> 264: private:
> 265:   bool _requires_strict_order; // false in Vector API.

The comment `false in Vector API` seems not so clean. We need to state the meaning of the field instead of one of its usages?

src/hotspot/share/opto/vectornode.hpp line 276:

> 274: 
> 275:   virtual bool cmp(const Node& n) const {
> 276:     return Node::cmp(n) && _requires_strict_order== ((ReductionNode&)n).requires_strict_order();

Need a space before `==`

src/hotspot/share/opto/vectornode.hpp line 297:

> 295: 
> 296:   virtual bool cmp(const Node& n) const {
> 297:     return Node::cmp(n) && _requires_strict_order== ((ReductionNode&)n).requires_strict_order();

Need a space before `==`

-------------

Changes requested by gli (Committer).

PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-1905478301
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1505546754
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1505548153
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1505552017
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1505506096
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1505466361
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1505486346
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1505487154
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1505487480


More information about the hotspot-compiler-dev mailing list