RFR: 8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64
Fei Gao
fgao at openjdk.org
Thu Nov 23 07:01:05 UTC 2023
On Tue, 21 Nov 2023 10:25:04 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> The macro-assembler on AArch64 can merge adjacent loads or stores into ldp/stp.[[1]](https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079)
>>
>> For example, it can merge:
>>
>> str w20, [sp, #16]
>> str w10, [sp, #20]
>>
>> into
>>
>> stp w20, w10, [sp, #16]
>>
>>
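As a rough illustration of when a pair like the one above qualifies, here is a hedged Java sketch. The `SpillStore` record and `canMerge` method are hypothetical names, and the rule is a simplifying assumption rather than HotSpot's exact check. It only considers `sp`-relative stores of the same width (4 bytes for `w` registers, 8 for `x` registers) whose offsets are exactly one access apart, and whose lower offset fits the signed, size-scaled 7-bit immediate of `stp`. The real logic in the linked macro-assembler code checks more conditions than this.

```java
public class StpMergeCheck {
    // Hypothetical model of an sp-relative spill store: register, byte offset, access size (4 or 8).
    record SpillStore(String reg, int offset, int size) {}

    // Simplified rule (an assumption, not HotSpot's exact check): can two stores that are
    // consecutive in the instruction stream be emitted as a single stp?
    static boolean canMerge(SpillStore prev, SpillStore cur) {
        if (prev.size() != cur.size()) {
            return false; // stp pairs two registers of the same width
        }
        int size = prev.size();
        if (Math.abs(cur.offset() - prev.offset()) != size) {
            return false; // the two stack slots must be exactly adjacent
        }
        int low = Math.min(prev.offset(), cur.offset());
        // stp takes a signed 7-bit immediate scaled by the access size.
        return low % size == 0 && low >= -64 * size && low <= 63 * size;
    }

    public static void main(String[] args) {
        SpillStore w20 = new SpillStore("w20", 16, 4);
        SpillStore w10 = new SpillStore("w10", 20, 4);
        SpillStore x21 = new SpillStore("x21", 8, 8);
        System.out.println(canMerge(w20, w10)); // true:  stp w20, w10, [sp, #16]
        System.out.println(canMerge(x21, w20)); // false: different widths
    }
}
```

Using explicit width and range checks keeps the sketch close to the encoding constraints of `stp` while leaving out everything C2-specific.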
>> But C2 may generate a sequence like:
>>
>> str x21, [sp, #8]
>> str w20, [sp, #16]
>> str x19, [sp, #24] <---
>> str w10, [sp, #20] <--- Before sorting
>> str x11, [sp, #40]
>> str w13, [sp, #48]
>> str x16, [sp, #56]
>>
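>> We can't merge anything here: w20 ([sp, #16]) and w10 ([sp, #20]) target adjacent stack slots, but they are not adjacent in the instruction stream, because x19 sits between them.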
>>
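To make the point concrete, the sketch below counts how many pairs a pass that only compares consecutive instructions could merge in the "before sorting" sequence. The record, method names, and the simplified adjacency rule (same width, offsets exactly one access apart, range and alignment checks omitted) are assumptions for illustration only.

```java
import java.util.List;

public class UnsortedSpills {
    // Hypothetical model of an sp-relative spill store: register, byte offset, access size.
    record SpillStore(String reg, int offset, int size) {}

    // Simplified adjacency rule; stp range/alignment checks are omitted here.
    static boolean adjacent(SpillStore prev, SpillStore cur) {
        return prev.size() == cur.size()
                && Math.abs(cur.offset() - prev.offset()) == prev.size();
    }

    // Counts mergeable pairs when only consecutive instructions are compared.
    static int countConsecutiveMerges(List<SpillStore> seq) {
        int merges = 0;
        for (int i = 0; i + 1 < seq.size(); i++) {
            if (adjacent(seq.get(i), seq.get(i + 1))) {
                merges++;
                i++; // both stores are consumed by one stp
            }
        }
        return merges;
    }

    // The "before sorting" sequence from the example above.
    static final List<SpillStore> BEFORE_SORTING = List.of(
            new SpillStore("x21", 8, 8),
            new SpillStore("w20", 16, 4),
            new SpillStore("x19", 24, 8),
            new SpillStore("w10", 20, 4),
            new SpillStore("x11", 40, 8),
            new SpillStore("w13", 48, 4),
            new SpillStore("x16", 56, 8));

    public static void main(String[] args) {
        System.out.println(countConsecutiveMerges(BEFORE_SORTING)); // 0
    }
}
```

Every consecutive pair in this sequence mixes a 4-byte and an 8-byte store, so nothing can be paired even though w20 and w10 occupy neighbouring slots.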
>> This patch sorts the spilling or unspilling sequence by offset during the instruction scheduling and bundling phase. After that, we get a new sequence:
>>
>> str x21, [sp, #8]
>> str w20, [sp, #16]
>> str w10, [sp, #20] <---
>> str x19, [sp, #24] <--- After sorting
>> str x11, [sp, #40]
>> str w13, [sp, #48]
>> str x16, [sp, #56]
>>
>>
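The reordering step itself can be sketched as a plain sort by stack offset. This is only a model: the patch does the reordering inside C2's scheduling and bundling phase, while the hypothetical `sortByOffset` below assumes every store in the list is independent and can be freely reordered, which a real scheduler would have to establish first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortSpills {
    // Hypothetical model of an sp-relative spill store: register, byte offset, access size.
    record SpillStore(String reg, int offset, int size) {}

    // The "before sorting" sequence from the example above.
    static final List<SpillStore> BEFORE_SORTING = List.of(
            new SpillStore("x21", 8, 8),
            new SpillStore("w20", 16, 4),
            new SpillStore("x19", 24, 8),
            new SpillStore("w10", 20, 4),
            new SpillStore("x11", 40, 8),
            new SpillStore("w13", 48, 4),
            new SpillStore("x16", 56, 8));

    // Returns a copy of the sequence ordered by ascending stack offset.
    // Assumes all stores are independent, so any order is legal.
    static List<SpillStore> sortByOffset(List<SpillStore> seq) {
        List<SpillStore> sorted = new ArrayList<>(seq);
        sorted.sort(Comparator.comparingInt(SpillStore::offset));
        return sorted;
    }

    public static void main(String[] args) {
        for (SpillStore s : sortByOffset(BEFORE_SORTING)) {
            System.out.println("str " + s.reg() + ", [sp, #" + s.offset() + "]");
        }
    }
}
```

Running `main` prints the seven stores in the same order as the "after sorting" listing.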
>> Then the macro-assembler can merge the adjacent stores:
>>
>> str x21, [sp, #8]
>> stp w20, w10, [sp, #16] <--- Merged
>> str x19, [sp, #24]
>> str x11, [sp, #40]
>> str w13, [sp, #48]
>> str x16, [sp, #56]
>>
>>
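The merged listing can be reproduced by a single greedy pass over the sorted sequence that pairs each store with the next one when the simplified rule allows it. The names and the rule (same width, ascending offsets exactly one access apart, offset encodable in `stp`) are assumptions for illustration; the actual macro-assembler logic differs in detail and also checks conditions omitted here, such as the base register.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSpills {
    // Hypothetical model of an sp-relative spill store: register, byte offset, access size.
    record SpillStore(String reg, int offset, int size) {}

    // The "after sorting" sequence from the example above.
    static final List<SpillStore> AFTER_SORTING = List.of(
            new SpillStore("x21", 8, 8),
            new SpillStore("w20", 16, 4),
            new SpillStore("w10", 20, 4),
            new SpillStore("x19", 24, 8),
            new SpillStore("x11", 40, 8),
            new SpillStore("w13", 48, 4),
            new SpillStore("x16", 56, 8));

    // Simplified pairing rule: same width, ascending adjacent offsets, offset encodable in stp.
    static boolean canMerge(SpillStore prev, SpillStore cur) {
        int size = prev.size();
        if (cur.size() != size || cur.offset() - prev.offset() != size) {
            return false;
        }
        int low = prev.offset();
        return low % size == 0 && low >= -64 * size && low <= 63 * size;
    }

    // One greedy pass: pair a store with the one right after it when possible,
    // otherwise emit it as a plain str.
    static List<String> emit(List<SpillStore> seq) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < seq.size()) {
            SpillStore cur = seq.get(i);
            if (i + 1 < seq.size() && canMerge(cur, seq.get(i + 1))) {
                SpillStore next = seq.get(i + 1);
                out.add("stp " + cur.reg() + ", " + next.reg() + ", [sp, #" + cur.offset() + "]");
                i += 2;
            } else {
                out.add("str " + cur.reg() + ", [sp, #" + cur.offset() + "]");
                i += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        emit(AFTER_SORTING).forEach(System.out::println);
    }
}
```

Only w20/w10 qualify: x21 and w20 differ in width, and x19 ([sp, #24]) and x11 ([sp, #40]) are 16 bytes apart, so the output matches the six-instruction listing above.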
>> To justify the patch, we ran `HelloWorld.java`:
>>
>> public class HelloWorld {
>>     public static void main(String[] args) {
>>         System.out.println("Hello World!");
>>     }
>> }
>>
>> with `java -Xcomp -XX:-TieredCompilation HelloWorld`.
>>
>> Before the patch, the macro-assembler merged loads/stores 3688 times. After the patch, the number of merges increases to 3871, i.e. by ~5%.
>>
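For reference, the relative increase from the two merge counts works out as:

```latex
\frac{3871 - 3688}{3688} = \frac{183}{3688} \approx 0.0496 \approx 5\%
```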
>> Tested tier1~3 on x86 and AArch64.
>
> This is a good idea, although the real-world gains are small. I'd wonder if this was worth doing for non-AArch64 ports, although even on others sorting the accesses into order might help.
Hi @theRealAph, thanks a lot for your review! All comments have been resolved in the new commit.
> This is a good idea, although the real-world gains are small. I'd wonder if this was worth doing for non-AArch64 ports, although even on others sorting the accesses into order might help.
Yeah, that's also bothering me. I'm not sure whether it benefits other ports. Do you think we should make the change AArch64-only? Thanks.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16754#issuecomment-1823892055