RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8]

Wed May 8 11:25:04 UTC 2024

On Wed, 8 May 2024 11:20:50 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
>> 
>>  - Merge master
>>  - Adjust format for the backend rules changed in previous commit
>>  - Address some more review comments
>>  - Revert to previous indentation
>>  - Add comments, revert to requires_strict_order and other minor changes
>>  - Naming changes: replace strict/non-strict with more technical terms
>>  - Addressed review comments for changes in backend rules and code style
>>  - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction
>>    
>>    Floating-point addition is non-associative: adding floating-point
>>    elements in a different order may produce a different value.
>>    In particular, the Vector API intentionally does not define the order
>>    of reduction, which allows platforms to generate more efficient code
>>    [1]. C2 therefore needs a node that represents a non-strictly ordered
>>    add-reduction for floating-point types.
>>    
>>    To avoid introducing new nodes, this patch adds a bool field in
>>    `AddReductionVF/D` to distinguish whether they require strict order. It
>>    also removes `UnorderedReductionNode` and adds a virtual function
>>    `bool requires_strict_order()` in `ReductionNode`. For reduction nodes
>>    other than `AddReductionVF/D`, `requires_strict_order()` returns a
>>    fixed value.
>>    
>>    With this patch, Vector API would always generate non strictly-ordered
>>    `AddReductionVF/D` on SVE machines with vector length <= 16B as it is
>>    more beneficial to generate non-strictly ordered instructions on such
>>    machines compared to strictly ordered ones.
>>    
>>    [AArch64]
>>    On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated.
>>    Auto-vectorization has already banned these nodes in JDK-8275275 [2].
>>    
>>    This patch adds matching rules for non strictly-ordered
>>    `AddReductionVF/D`.
>>    
>>    No effects on other platforms.
>>    
>>    [Performance]
>>    FloatMaxVector.ADDLanes [3] measures the performance of add reduction
>>    for floating-point type. With this patch, it improves ~3x on my SVE
>>    machine (128-bit).
>>    
>>    ADDLanes
>>    Benchmark                 Before     After      Unit
>>    FloatMaxVector.ADDLanes   1789.513   5264.226   ops/ms
>>    
>>    Final code is as below:
>>    
>>    ```
>>    Before:...
>
> src/hotspot/cpu/aarch64/aarch64_vector.ad line 2865:
> 
>> 2863:   // Non-strictly ordered floating-point add reduction for vector length of 64-bit. As an
>> 2864:   // example, this rule can be reached from the VectorAPI (which allows for non-strictly ordered
>> 2865:   // add reduction).
> 
> Suggestion:
> 
>   // Non-strictly ordered floating-point add reduction for a 64-bits-long vector. This rule
>   // is intended for the VectorAPI (which allows for non-strictly ordered add reduction).

Please repeat this change everywhere.
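
For context, the non-associativity that the quoted commit message relies on can be seen in a minimal Java sketch. The class and method names here are hypothetical and not part of the patch, and the pairwise order is only one possible non-strict order, assumed for illustration; it is not necessarily the order that C2 or an SVE instruction would use.

```java
// Hypothetical illustration only; none of these names appear in the patch.
class ReductionOrder {
    // Strictly ordered reduction: accumulate left to right.
    static float strictSum(float[] a) {
        float sum = 0.0f;
        for (float x : a) {
            sum += x;
        }
        return sum;
    }

    // One possible non-strict order: add the upper half onto the lower half
    // repeatedly. Assumes the array length is a power of two.
    static float pairwiseSum(float[] a) {
        float[] buf = a.clone();
        for (int n = buf.length; n > 1; n /= 2) {
            for (int i = 0; i < n / 2; i++) {
                buf[i] = buf[i] + buf[i + n / 2];
            }
        }
        return buf[0];
    }

    public static void main(String[] args) {
        // 1e8f + 1.0f rounds back to 1e8f in float, so the order matters.
        float[] lanes = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println("strict:   " + strictSum(lanes));
        System.out.println("pairwise: " + pairwiseSum(lanes));
    }
}
```

With these four lanes the two orders give different sums, which is why a node has to record whether strict ordering is required.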

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1593867651