RFR: 8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64
Andrew Haley
aph at openjdk.org
Thu Nov 23 15:37:12 UTC 2023
On Tue, 21 Nov 2023 10:25:04 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> The macro-assembler on AArch64 can merge adjacent loads or stores into ldp/stp.[[1]](https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079)
>>
>> For example, it can merge:
>>
>> str w20, [sp, #16]
>> str w10, [sp, #20]
>>
>> into
>>
>> stp w20, w10, [sp, #16]
>>
>>
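The merging condition can be illustrated with a simplified Java model of when two stack stores combine into one `stp`. This is a sketch under stated assumptions, not HotSpot's actual check. The class `StoreModel`, the `Store` record and `canMerge` are hypothetical. The model assumes a pair is mergeable when both stores share a base register and width, their offsets are exactly one width apart (in either instruction order), and the lower offset fits the signed 7-bit, size-scaled immediate of the `stp` encoding.

```java
// Hypothetical, simplified model of the ldp/stp pairing condition.
// It is NOT the HotSpot implementation; names and rules are assumptions.
final class StoreModel {
    // One spill store: "str <reg>, [<base>, #<offset>]".
    // width is the access size in bytes: 4 for a w register, 8 for an x register.
    record Store(String reg, String base, int offset, int width) {}

    // Assumed rule: same base and width, offsets exactly one width apart,
    // and the lower offset encodable as stp's signed 7-bit scaled immediate.
    static boolean canMerge(Store a, Store b) {
        if (!a.base().equals(b.base()) || a.width() != b.width()) {
            return false;
        }
        Store lo = a.offset() <= b.offset() ? a : b;
        Store hi = (lo == a) ? b : a;
        if (hi.offset() - lo.offset() != lo.width()) {
            return false; // the two slots are not contiguous in memory
        }
        if (lo.offset() % lo.width() != 0) {
            return false; // immediate must be a multiple of the access size
        }
        int scaled = lo.offset() / lo.width();
        return scaled >= -64 && scaled <= 63;
    }

    public static void main(String[] args) {
        Store w20 = new Store("w20", "sp", 16, 4);
        Store w10 = new Store("w10", "sp", 20, 4);
        Store x19 = new Store("x19", "sp", 24, 8);
        System.out.println(canMerge(w20, w10)); // true
        System.out.println(canMerge(w20, x19)); // false: widths differ
    }
}
```

With these inputs, `str w20, [sp, #16]` and `str w10, [sp, #20]` qualify and become `stp w20, w10, [sp, #16]`, with the register at the lower offset listed first.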
>> But C2 may generate a sequence like:
>>
>> str x21, [sp, #8]
>> str w20, [sp, #16]
>> str x19, [sp, #24] <---
>> str w10, [sp, #20] <--- Before sorting
>> str x11, [sp, #40]
>> str w13, [sp, #48]
>> str x16, [sp, #56]
>>
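>> Here `str w20, [sp, #16]` and `str w10, [sp, #20]` cannot be merged, because they are not adjacent in the instruction stream. The macro-assembler can only merge neighboring loads or stores.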
>>
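Why the C2 order above yields nothing can be shown with a hypothetical peephole model. As an assumption, it compares each store only with the instruction immediately before it, merges greedily, and never reuses a store that is already part of a pair. The names `PeepholeModel`, `Store` and `countMerges` are illustrative, and every store is assumed to use `[sp, #offset]`.

```java
import java.util.List;

// Hypothetical peephole model: each store is compared only with the
// instruction emitted just before it, and a merged pair is not reused.
final class PeepholeModel {
    record Store(String reg, int offset, int width) {} // all stores use [sp, #offset]

    // Simplified assumption: same width and offsets exactly one width apart.
    static boolean canMerge(Store a, Store b) {
        return a.width() == b.width()
                && Math.abs(a.offset() - b.offset()) == a.width();
    }

    // Counts the pairs a greedy left-to-right scan over neighbors would merge.
    static int countMerges(List<Store> seq) {
        int merges = 0;
        int i = 0;
        while (i + 1 < seq.size()) {
            if (canMerge(seq.get(i), seq.get(i + 1))) {
                merges++;
                i += 2; // both stores are consumed by one stp
            } else {
                i++;
            }
        }
        return merges;
    }

    // The unsorted spill sequence C2 produced in the example.
    static final List<Store> C2_ORDER = List.of(
            new Store("x21", 8, 8), new Store("w20", 16, 4),
            new Store("x19", 24, 8), new Store("w10", 20, 4),
            new Store("x11", 40, 8), new Store("w13", 48, 4),
            new Store("x16", 56, 8));

    public static void main(String[] args) {
        System.out.println(countMerges(C2_ORDER)); // 0
    }
}
```

Every neighboring pair in that sequence mixes a 4-byte `w` store with an 8-byte `x` store, so the model finds no candidate even though two of the `w` stores occupy contiguous slots.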
>> This patch sorts the spilling and unspilling sequences by stack offset during the instruction scheduling and bundling phase. After sorting, we get the following sequence:
>>
>> str x21, [sp, #8]
>> str w20, [sp, #16]
>> str w10, [sp, #20] <---
>> str x19, [sp, #24] <--- After sorting
>> str x11, [sp, #40]
>> str w13, [sp, #48]
>> str x16, [sp, #56]
>>
>>
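The reordering step on its own amounts to sorting the spill stores by stack offset. The sketch below is illustrative only. C2 reorders machine nodes during scheduling and bundling, whereas this code sorts a plain list of hypothetical `Store` records with `List.sort` and `Comparator.comparingInt`. The class `SortModel` and its method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: C2 reorders machine nodes during scheduling/bundling;
// this sketch just sorts a plain list of hypothetical Store records.
final class SortModel {
    record Store(String reg, int offset) {} // "str <reg>, [sp, #<offset>]"

    // Returns a copy ordered by ascending stack offset (List.sort is stable).
    static List<Store> sortByOffset(List<Store> seq) {
        List<Store> sorted = new ArrayList<>(seq);
        sorted.sort(Comparator.comparingInt(Store::offset));
        return sorted;
    }

    public static void main(String[] args) {
        List<Store> c2Order = List.of(
                new Store("x21", 8), new Store("w20", 16), new Store("x19", 24),
                new Store("w10", 20), new Store("x11", 40), new Store("w13", 48),
                new Store("x16", 56));
        for (Store s : sortByOffset(c2Order)) {
            System.out.println("str " + s.reg() + ", [sp, #" + s.offset() + "]");
        }
    }
}
```

Sorting by offset alone is enough in this toy setting because each spill slot has a distinct offset. A real scheduler must also respect data dependences between instructions, which this sketch ignores.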
>> The macro-assembler can then merge the now-adjacent stores:
>>
>> str x21, [sp, #8]
>> stp w20, w10, [sp, #16] <--- Merged
>> str x19, [sp, #24]
>> str x11, [sp, #40]
>> str w13, [sp, #48]
>> str x16, [sp, #56]
>>
>>
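The merged listing can be reproduced from the sorted sequence with a hypothetical emitter. It walks the list once and fuses a store with the next one when, as a simplifying assumption, both have the same width and the next slot sits exactly one width above. The class `EmitModel` and method `emit` are illustrative names, not HotSpot code, and immediate-range checks are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turns an already-sorted spill list into assembly text,
// fusing neighboring same-width stores into stp under a simplified rule.
final class EmitModel {
    record Store(String reg, int offset, int width) {} // "str <reg>, [sp, #<offset>]"

    static List<String> emit(List<Store> seq) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < seq.size()) {
            Store cur = seq.get(i);
            if (i + 1 < seq.size()) {
                Store next = seq.get(i + 1);
                // Assumed rule: same width and next slot directly above this one.
                if (next.width() == cur.width() && next.offset() - cur.offset() == cur.width()) {
                    out.add("stp " + cur.reg() + ", " + next.reg() + ", [sp, #" + cur.offset() + "]");
                    i += 2;
                    continue;
                }
            }
            out.add("str " + cur.reg() + ", [sp, #" + cur.offset() + "]");
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Store> sorted = List.of(
                new Store("x21", 8, 8), new Store("w20", 16, 4), new Store("w10", 20, 4),
                new Store("x19", 24, 8), new Store("x11", 40, 8), new Store("w13", 48, 4),
                new Store("x16", 56, 8));
        emit(sorted).forEach(System.out::println);
    }
}
```

On the sorted input this prints six lines, with `stp w20, w10, [sp, #16]` in second position and the remaining five stores left as single `str` instructions.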
>> To evaluate the patch, we ran `HelloWorld.java`:
>>
>>     public class HelloWorld {
>>         public static void main(String[] args) {
>>             System.out.println("Hello World!");
>>         }
>>     }
>>
>> with `java -Xcomp -XX:-TieredCompilation HelloWorld`.
>>
>> Before the patch, the macro-assembler performed ld/st merging 3688 times. After the patch, that count increases to 3871, about 5% more.
>>
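The "about 5%" figure follows directly from the two counts reported above. This tiny helper (the class name `MergeGain` is hypothetical) just recomputes the relative increase.

```java
// Recomputes the relative increase in ld/st merges from the reported counts.
final class MergeGain {
    // Percentage change from 'before' to 'after'.
    static double percentIncrease(int before, int after) {
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%%%n", percentIncrease(3688, 3871)); // 4.96%
    }
}
```

That is 183 additional merges over a baseline of 3688, or roughly 4.96%.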
>> Tested tier1 through tier3 on x86 and AArch64.
>
> This is a good idea, although the real-world gains are small. I wonder whether this is worth doing for non-AArch64 ports, although even on other ports, sorting the accesses into order might help.

> Hi @theRealAph, thanks a lot for your review! All comments have been resolved in the new commit.
>
> > This is a good idea, although the real-world gains are small. I wonder whether this is worth doing for non-AArch64 ports, although even on other ports, sorting the accesses into order might help.
>
> Yeah, that's also bothering me. I'm not sure whether it benefits other ports. Do you think we need to make the change AArch64-only? Thanks.
#ifdefs are probably worse, so I'd leave it as it is. We need another reviewer.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16754#issuecomment-1824624806
More information about the hotspot-compiler-dev mailing list