RFR: 8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64 [v2]

Fei Gao fgao at openjdk.org
Tue Nov 28 02:16:08 UTC 2023


On Mon, 27 Nov 2023 22:08:58 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Fix comments from aph
>>  - Merge branch 'master' into fg8320379
>>  - 8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64
>>    
>>    The macro-assembler on AArch64 can merge adjacent loads or stores
>>    into ldp/stp[1]. For example, it can merge:
>>    ```
>>    str     w20, [sp, #16]
>>    str     w10, [sp, #20]
>>    ```
>>    into
>>    ```
>>    stp     w20, w10, [sp, #16]
>>    ```
>>    
>>    But C2 may generate a sequence like:
>>    ```
>>    str     x21, [sp, #8]
>>    str     w20, [sp, #16]
>>    str     x19, [sp, #24] <---
>>    str     w10, [sp, #20] <--- Before sorting
>>    str     x11, [sp, #40]
>>    str     w13, [sp, #48]
>>    str     x16, [sp, #56]
>>    ```
>>    Loads or stores that are not adjacent cannot be merged.
>>    
>>    The patch sorts the spilling or unspilling sequence by
>>    offset during the instruction scheduling and bundling
>>    phase. After that, we get a new sequence:
>>    ```
>>    str     x21, [sp, #8]
>>    str     w20, [sp, #16]
>>    str     w10, [sp, #20] <---
>>    str     x19, [sp, #24] <--- After sorting
>>    str     x11, [sp, #40]
>>    str     w13, [sp, #48]
>>    str     x16, [sp, #56]
>>    ```
>>    
>>    Then the macro-assembler can merge the adjacent stores:
>>    ```
>>    str     x21, [sp, #8]
>>    stp     w20, w10, [sp, #16] <--- Merged
>>    str     x19, [sp, #24]
>>    str     x11, [sp, #40]
>>    str     w13, [sp, #48]
>>    str     x16, [sp, #56]
>>    ```
>>    
>>    To justify the patch, we run `HelloWorld.java`
>>    ```
>>    public class HelloWorld {
>>        public static void main(String [] args) {
>>            System.out.println("Hello World!");
>>        }
>>    }
>>    ```
>>    with `java -Xcomp -XX:-TieredCompilation HelloWorld`.
>>    
>>    Before the patch, the macro-assembler performs ld/st merging
>>    3688 times. After the patch, the number of merges
>>    increases to 3871, by ~5%.
>>    
>>    Tested tier1~3 on x86 and AArch64.
>>    
>>    [1] https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079
>
> My tier1-4,xcomp,stress testing passed.

Thanks for all your review and testing work, @vnkozlov @theRealAph!

I'll integrate it if there are no other comments.
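
For anyone skimming the thread, the sort-then-merge idea from the quoted description can be sketched as a toy Java model. This is only an illustration, not the C2 or macro-assembler code: the `Spill` class and the `sortByOffset` and `emit` helpers are hypothetical names, and the merge rule used here (same access size, ascending and exactly adjacent offsets) ignores the encoding-range and alignment checks a real assembler would apply.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SpillSortSketch {
    // Hypothetical model of one spill store: register, sp-relative offset, size in bytes.
    static final class Spill {
        final String reg;
        final int offset;
        final int size;

        Spill(String reg, int offset, int size) {
            this.reg = reg;
            this.offset = offset;
            this.size = size;
        }
    }

    // Return a copy of the spill sequence ordered by stack offset.
    static List<Spill> sortByOffset(List<Spill> spills) {
        List<Spill> sorted = new ArrayList<>(spills);
        sorted.sort(Comparator.comparingInt(s -> s.offset));
        return sorted;
    }

    // Emit stores in order, pairing two consecutive stores into one stp when
    // they have the same size and the second starts where the first ends.
    static List<String> emit(List<Spill> spills) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < spills.size()) {
            Spill a = spills.get(i);
            if (i + 1 < spills.size()) {
                Spill b = spills.get(i + 1);
                if (a.size == b.size && b.offset == a.offset + a.size) {
                    out.add("stp " + a.reg + ", " + b.reg + ", [sp, #" + a.offset + "]");
                    i += 2;
                    continue;
                }
            }
            out.add("str " + a.reg + ", [sp, #" + a.offset + "]");
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        // The unsorted sequence from the description (w registers: 4 bytes, x registers: 8 bytes).
        List<Spill> unsorted = List.of(
            new Spill("x21", 8, 8), new Spill("w20", 16, 4),
            new Spill("x19", 24, 8), new Spill("w10", 20, 4),
            new Spill("x11", 40, 8), new Spill("w13", 48, 4),
            new Spill("x16", 56, 8));

        System.out.println("Before sorting: " + emit(unsorted).size() + " instructions");
        List<String> merged = emit(sortByOffset(unsorted));
        System.out.println("After sorting:  " + merged.size() + " instructions");
        merged.forEach(System.out::println);
    }
}
```

In this model the unsorted sequence yields seven separate `str` instructions, while the sorted one yields six, with `w20` and `w10` paired into a single `stp` at offset 16.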

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16754#issuecomment-1828950038

