RFR: 8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64 [v2]

Fei Gao fgao at openjdk.org
Thu Nov 23 06:43:33 UTC 2023


> The macro-assembler on AArch64 can merge adjacent loads or stores into ldp/stp.[[1]](https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079)
> 
> For example, it can merge:
> 
> str     w20, [sp, #16]
> str     w10, [sp, #20]
> 
> into
> 
> stp     w20, w10, [sp, #16]
> 
> 
> But C2 may generate a sequence like:
> 
> str     x21, [sp, #8]
> str     w20, [sp, #16]
> str     x19, [sp, #24] <---
> str     w10, [sp, #20] <--- Before sorting
> str     x11, [sp, #40]
> str     w13, [sp, #48]
> str     x16, [sp, #56]
> 
> Here the stores to [sp, #16] and [sp, #20] are separated by the store to [sp, #24], so they are not adjacent in the instruction stream and cannot be merged.
> 
> The patch sorts the spilling or unspilling sequence by offset during the instruction scheduling and bundling phase. This produces a new sequence:
> 
> str     x21, [sp, #8]
> str     w20, [sp, #16]
> str     w10, [sp, #20] <---
> str     x19, [sp, #24] <--- After sorting
> str     x11, [sp, #40]
> str     w13, [sp, #48]
> str     x16, [sp, #56]
> 
> 
> The macro-assembler can then merge the adjacent pair:
> 
> str     x21, [sp, #8]
> stp     w20, w10, [sp, #16] <--- Merged
> str     x19, [sp, #24]
> str     x11, [sp, #40]
> str     w13, [sp, #48]
> str     x16, [sp, #56]
> 
> 
> To evaluate the patch, we ran `HelloWorld.java`
> 
> public class HelloWorld {
>     public static void main(String [] args) {
>         System.out.println("Hello World!");
>     }
> }
> 
> with `java -Xcomp -XX:-TieredCompilation HelloWorld`.
> 
> Before the patch, the macro-assembler performed 3688 ld/st merges. After the patch, it performed 3871, an increase of 183 merges (~5%).
> 
> Tested tier1~3 on x86 and AArch64.
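
The effect described in the quoted text can be reproduced with a small toy model. The sketch below is not HotSpot code: the `Store` type, `canMerge`, `sortByOffset` and `countMerges` are hypothetical names made up for illustration, and the merge condition (same access size, offsets exactly one access apart, instructions next to each other) is a simplifying assumption rather than the full set of checks in `macroAssembler_aarch64.cpp`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of the spill sequence from the description; not HotSpot code.
final class SpillSortSketch {
    // One store of the form: str <reg>, [sp, #offset]. Hypothetical type.
    static final class Store {
        final String reg;
        final int size;    // access size in bytes: 4 for wN, 8 for xN
        final int offset;  // byte offset from sp

        Store(String reg, int size, int offset) {
            this.reg = reg;
            this.size = size;
            this.offset = offset;
        }
    }

    // Simplified merge condition (an assumption for this sketch):
    // same access size and offsets exactly one access apart.
    static boolean canMerge(Store a, Store b) {
        return a.size == b.size && Math.abs(b.offset - a.offset) == a.size;
    }

    // Returns a copy ordered by ascending offset; the input is not modified.
    static List<Store> sortByOffset(List<Store> run) {
        List<Store> sorted = new ArrayList<>(run);
        sorted.sort(Comparator.comparingInt((Store s) -> s.offset));
        return sorted;
    }

    // Counts the pairs a single forward pass would merge, looking only at
    // instructions that sit next to each other in the sequence.
    static int countMerges(List<Store> seq) {
        int merges = 0;
        int i = 0;
        while (i + 1 < seq.size()) {
            if (canMerge(seq.get(i), seq.get(i + 1))) {
                merges++;
                i += 2;  // both stores are consumed by one stp
            } else {
                i++;
            }
        }
        return merges;
    }

    // The seven stores from the "Before sorting" listing.
    static List<Store> example() {
        return List.of(
            new Store("x21", 8, 8),
            new Store("w20", 4, 16),
            new Store("x19", 8, 24),
            new Store("w10", 4, 20),
            new Store("x11", 8, 40),
            new Store("w13", 4, 48),
            new Store("x16", 8, 56));
    }

    public static void main(String[] args) {
        List<Store> before = example();
        List<Store> after = sortByOffset(before);
        System.out.println("merges before sorting: " + countMerges(before));
        System.out.println("merges after sorting:  " + countMerges(after));
    }
}
```

On the seven stores from the description, this model finds no mergeable pair before sorting and one after (w20 with w10 at [sp, #16]), which matches the single stp in the merged listing.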

Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Fix comments from aph
 - Merge branch 'master' into fg8320379
 - 8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64
   

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16754/files
  - new: https://git.openjdk.org/jdk/pull/16754/files/96646f70..4637dd8b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16754&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16754&range=00-01

  Stats: 10064 lines in 364 files changed: 6535 ins; 1227 del; 2302 mod
  Patch: https://git.openjdk.org/jdk/pull/16754.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16754/head:pull/16754

PR: https://git.openjdk.org/jdk/pull/16754


More information about the hotspot-compiler-dev mailing list