RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v10]

Fri May 24 08:53:11 UTC 2024

On Tue, 21 May 2024 13:11:22 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

>> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2.
>> 
>> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value.
>> 
>> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones.
>> 
>> [AArch64]
>> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2].
>> 
>> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`.
>> 
>> No effects on other platforms.
>> 
>> [Performance]
>> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit).
>> 
>> ADDLanes
>> 
>> Benchmark                 Before     After      Unit
>> FloatMaxVector.ADDLanes   1789.513   5264.226   ops/ms
>> 
>> 
>> Final code is as below:
>> 
>> Before:
>> `        fadda        z17.s, p7/m, z17.s, z16.s
>> `
>> After:
>> 
>>         faddp        v17.4s, v21.4s, v21.4s
>>         faddp        s18, v17.2s
>>         fadd         s18, s18, s19
>> 
>> 
>> 
>> 
>> [Test]
>> Full jtreg passed on AArch64 and x86.
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529
>> [2] https://bugs.openjdk.org/browse/JDK-8275275
>> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Modify JTREG IR rules and some style/format changes

Thanks for adding the tests! 😊  I have a few more comments/requests.

test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 24:

> 22:  */
> 23: 
> 24: package compiler.c2.irTests;

Would you mind moving the tests away from `c2/irTests`, and into the `loopopts/superword` and `vectorapi` directories, please?  `c2/irTests` is kind of a disorganized "catch-all". We only had this directory at the beginning of the IR-framework before it was widely adapted. Now it makes more sense to sort the tests by "what" they test, rather than "how" ;)

test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 32:

> 30:  * @bug 8320725
> 31:  * @summary Ensure strictly ordered AddReductionVF/VD nodes are generated on SVE machines
> 32:  * while being disabled on Neon

Suggestion:

 *          while being disabled on Neon

test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 55:

> 53:     @IR(applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}, failOn = {IRNode.ADD_REDUCTION_VF})
> 54:     @IR(applyIfCPUFeature = {"sve", "true"}, counts = {"requires_strict_order", ">=1", IRNode.ADD_REDUCTION_VF, ">=1"},
> 55:         failOn = {"no_strict_order"}, phase = CompilePhase.PRINT_IDEAL)

Can you please re-format the rules? We usually have this order, each on a new line:

counts
failOn
applyIf...
phase

Having a consistent format just makes it easier to read quickly ;)

test/hotspot/jtreg/compiler/c2/irTests/TestVectorFPReduction.java line 67:

> 65:     @IR(applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}, failOn = {IRNode.ADD_REDUCTION_VD})
> 66:     @IR(applyIfCPUFeature = {"sve", "true"}, counts = {"requires_strict_order", ">=1", IRNode.ADD_REDUCTION_VD, ">=1"},
> 67:         failOn = {"no_strict_order"}, phase = CompilePhase.PRINT_IDEAL)

Also: I realize that you only check for `asimd / sve` features. Can you also apply it for avx features?

test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java line 42:

> 40:  * @bug 8320725
> 41:  * @library /test/lib /
> 42:  * @requires os.arch == "aarch64"

I think there is no reason to only run the test on aarch64. We can run the test anywhere, but the applyIf specifies on what platforms the IR rules are executed.

test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java line 44:

> 42:  * @requires os.arch == "aarch64"
> 43:  * @summary Verify non-strictly ordered AddReductionVF/VD and MulReductionVF/VD
> 44:  * nodes are generated for float and double types in VectorAPI

indentation

test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java line 181:

> 179: 
> 180:     public static void main(String[] args) {
> 181:         TestFramework.runWithFlags("-XX:-TieredCompilation",

Why `-XX:-TieredCompilation`?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-2076223588
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613095032
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613084279
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613097871
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613103237
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613104822
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613106458
PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1613107304