RFR: 8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64
Andrew Haley
aph at openjdk.org
Tue Nov 21 10:10:00 UTC 2023
On Tue, 21 Nov 2023 07:15:15 GMT, Fei Gao <fgao at openjdk.org> wrote:
> The macro-assembler on AArch64 can merge adjacent loads or stores into ldp/stp.[[1]](https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079)
>
> For example, it can merge:
>
> str w20, [sp, #16]
> str w10, [sp, #20]
>
> into
>
> stp w20, w10, [sp, #16]
>
>
> But C2 may generate a sequence like:
>
> str x21, [sp, #8]
> str w20, [sp, #16]
> str x19, [sp, #24] <---
> str w10, [sp, #20] <--- Before sorting
> str x11, [sp, #40]
> str w13, [sp, #48]
> str x16, [sp, #56]
>
> We can't do any merging for non-adjacent loads or stores.
>
> The patch sorts the spilling or unspilling sequence by stack offset during the instruction scheduling and bundling phase. After sorting, we get a new sequence:
>
> str x21, [sp, #8]
> str w20, [sp, #16]
> str w10, [sp, #20] <---
> str x19, [sp, #24] <--- After sorting
> str x11, [sp, #40]
> str w13, [sp, #48]
> str x16, [sp, #56]
>
>
> Then the macro-assembler can do ld/st merging:
>
> str x21, [sp, #8]
> stp w20, w10, [sp, #16] <--- Merged
> str x19, [sp, #24]
> str x11, [sp, #40]
> str w13, [sp, #48]
> str x16, [sp, #56]
>
>
> To evaluate the patch, we ran `HelloWorld.java`
>
>     public class HelloWorld {
>         public static void main(String[] args) {
>             System.out.println("Hello World!");
>         }
>     }
>
> with `java -Xcomp -XX:-TieredCompilation HelloWorld`.
>
> Before the patch, the macro-assembler merges ld/st pairs 3688 times. After the patch, the count rises to 3871, an increase of ~5%.
>
> Tested tier1~3 on x86 and AArch64.
src/hotspot/share/opto/output.cpp line 2280:
> 2278: }
> 2279:
> 2280: bool Scheduling::compare_two_spill_nodes(Node* first, Node* second) {
Suggestion:
int Scheduling::compare_two_spill_nodes(Node* first, Node* second) {
src/hotspot/share/opto/output.cpp line 2297:
> 2295: if (OptoReg::is_stack(first_dst_lo) && OptoReg::is_stack(second_dst_lo) &&
> 2296: OptoReg::is_reg(first_src_lo) && OptoReg::is_reg(second_src_lo)) {
> 2297: return _regalloc->reg2offset(first_dst_lo) > _regalloc->reg2offset(second_dst_lo);
Suggestion:
return _regalloc->reg2offset(first_dst_lo) - _regalloc->reg2offset(second_dst_lo);
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16754#discussion_r1400328363
PR Review Comment: https://git.openjdk.org/jdk/pull/16754#discussion_r1400327834