RFR: 8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64

Fei Gao fgao at openjdk.org
Thu Nov 23 07:01:05 UTC 2023


On Tue, 21 Nov 2023 10:25:04 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Macro-assembler on aarch64 can merge adjacent loads or stores into ldp/stp.[[1]](https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079)
>> 
>> For example, it can merge:
>> 
>> str     w20, [sp, #16]
>> str     w10, [sp, #20]
>> 
>> into
>> 
>> stp     w20, w10, [sp, #16]
>> 
>> 
>> But C2 may generate a sequence like:
>> 
>> str     x21, [sp, #8]
>> str     w20, [sp, #16]
>> str     x19, [sp, #24] <---
>> str     w10, [sp, #20] <--- Before sorting
>> str     x11, [sp, #40]
>> str     w13, [sp, #48]
>> str     x16, [sp, #56]
>> 
>> We can't do any merging for non-adjacent loads or stores.
>> 
>> The patch sorts the spilling or unspilling sequence by offset during the instruction scheduling and bundling phase. After that, we get a new sequence:
>> 
>> str     x21, [sp, #8]
>> str     w20, [sp, #16]
>> str     w10, [sp, #20] <---
>> str     x19, [sp, #24] <--- After sorting
>> str     x11, [sp, #40]
>> str     w13, [sp, #48]
>> str     x16, [sp, #56]
>> 
>> 
>> Then macro-assembler can do ld/st merging:
>> 
>> str     x21, [sp, #8]
>> stp     w20, w10, [sp, #16] <--- Merged
>> str     x19, [sp, #24]
>> str     x11, [sp, #40]
>> str     w13, [sp, #48]
>> str     x16, [sp, #56]
>> 
>> 
>> To justify the patch, we run `HelloWorld.java`
>> 
>> public class HelloWorld {
>>     public static void main(String [] args) {
>>         System.out.println("Hello World!");
>>     }
>> }
>> 
>> with `java -Xcomp -XX:-TieredCompilation HelloWorld`.
>> 
>> Before the patch, the macro-assembler performs ld/st merging 3688 times. After the patch, the number of merges increases to 3871, i.e. by ~5%.
>> 
>> Tested tier1~3 on x86 and AArch64.
>
> This is a good idea, although the real-world gains are small. I'd wonder if this was worth doing for non-AArch64 ports, although even on others sorting the accesses into order might help.

Hi @theRealAph, thanks a lot for your review! All comments have been addressed in the new commit.

> This is a good idea, although the real-world gains are small. I'd wonder if this was worth doing for non-AArch64 ports, although even on others sorting the accesses into order might help.

Yeah, that's bothering me too. I'm not sure whether it benefits other ports. Do you think we need to make the change AArch64-only? Thanks.
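
For readers who want to see the idea from the quoted description in isolation, here is a minimal sketch in Java. It is a toy model, not the HotSpot C++ code: the `Store` record, `sortByOffset`, and `merge` are hypothetical names, the pairing rule (same access size, second store starting exactly where the first ends) is a simplifying assumption, and any further conditions the real macro-assembler applies before emitting ldp/stp are ignored. It also assumes the stores are independent, so reordering them is safe.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SpillSortSketch {
    // Hypothetical model of one spill store: source register, access size in bytes, sp offset.
    record Store(String reg, int size, int offset) {}

    // Return a copy of the stores sorted by stack offset (assumes they may be freely reordered).
    static List<Store> sortByOffset(List<Store> stores) {
        List<Store> sorted = new ArrayList<>(stores);
        sorted.sort(Comparator.comparingInt(Store::offset));
        return sorted;
    }

    // Greedily pair consecutive stores of equal size whose slots are adjacent in memory.
    static List<String> merge(List<Store> stores) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < stores.size()) {
            Store a = stores.get(i);
            if (i + 1 < stores.size()) {
                Store b = stores.get(i + 1);
                if (a.size() == b.size() && a.offset() + a.size() == b.offset()) {
                    out.add("stp " + a.reg() + ", " + b.reg() + ", [sp, #" + a.offset() + "]");
                    i += 2; // both stores were consumed by the pair
                    continue;
                }
            }
            out.add("str " + a.reg() + ", [sp, #" + a.offset() + "]");
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        // The unsorted sequence from the quoted example.
        List<Store> stores = List.of(
            new Store("x21", 8, 8),  new Store("w20", 4, 16),
            new Store("x19", 8, 24), new Store("w10", 4, 20),
            new Store("x11", 8, 40), new Store("w13", 4, 48),
            new Store("x16", 8, 56));
        System.out.println("unsorted: " + merge(stores).size() + " instructions");
        List<String> merged = merge(sortByOffset(stores));
        System.out.println("sorted:   " + merged.size() + " instructions");
        merged.forEach(System.out::println);
    }
}
```

On the sequence from the example, the unsorted input yields no pairs, while the sorted input pairs `w20` and `w10` into a single stp.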

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16754#issuecomment-1823892055

