RFR: 8326541: [AArch64] ZGC C2 load barrier stub considers the length of live registers when spilling registers [v3]

Stuart Monteith smonteith at openjdk.org
Mon Mar 18 14:40:27 UTC 2024


On Fri, 15 Mar 2024 03:11:05 GMT, Joshua Zhu <jzhu at openjdk.org> wrote:

>> Currently the ZGC C2 load barrier stub on aarch64 saves each live register in full, regardless of how much of the register is actually live.
>> Since the size of an SVE register is an implementation-defined multiple of 128 bits, up to 2048 bits,
>> even a single live floating-point value may occupy the maximum 2048 bits of stack.
>> Hence I would like to introduce this change on aarch64: take the length of live registers into consideration in the ZGC C2 load barrier stub.
>> 
>> In a floating-point case on a 2048-bit SVE machine, the following ZLoadBarrierStubC2
>> 
>> 
>>   ......
>>   0x0000ffff684cfad8:   stp     x15, x18, [sp, #80]
>>   0x0000ffff684cfadc:   sub     sp, sp, #0x100
>>   0x0000ffff684cfae0:   str     z16, [sp]
>>   0x0000ffff684cfae4:   add     x1, x13, #0x10
>>   0x0000ffff684cfae8:   mov     x0, x16
>>  ;; 0xFFFF803F5414
>>   0x0000ffff684cfaec:   mov     x8, #0x5414                     // #21524
>>   0x0000ffff684cfaf0:   movk    x8, #0x803f, lsl #16
>>   0x0000ffff684cfaf4:   movk    x8, #0xffff, lsl #32
>>   0x0000ffff684cfaf8:   blr     x8
>>   0x0000ffff684cfafc:   mov     x16, x0
>>   0x0000ffff684cfb00:   ldr     z16, [sp]
>>   0x0000ffff684cfb04:   add     sp, sp, #0x100
>>   0x0000ffff684cfb08:   ptrue   p7.b
>>   0x0000ffff684cfb0c:   ldp     x4, x5, [sp, #16]
>>   ......
>> 
>> 
>> could be optimized into:
>> 
>> 
>>   ......  
>>   0x0000ffff684cfa50:   stp     x15, x18, [sp, #80]
>>   0x0000ffff684cfa54:   str     d16, [sp, #-16]!                   // extra 8 bytes to align 16 bytes in push_fp()
>>   0x0000ffff684cfa58:   add     x1, x13, #0x10
>>   0x0000ffff684cfa5c:   mov     x0, x16
>>  ;; 0xFFFF7FA942A8
>>   0x0000ffff684cfa60:   mov     x8, #0x42a8                     // #17064
>>   0x0000ffff684cfa64:   movk    x8, #0x7fa9, lsl #16
>>   0x0000ffff684cfa68:   movk    x8, #0xffff, lsl #32
>>   0x0000ffff684cfa6c:   blr     x8
>>   0x0000ffff684cfa70:   mov     x16, x0
>>   0x0000ffff684cfa74:   ldr     d16, [sp], #16
>>   0x0000ffff684cfa78:   ptrue   p7.b
>>   0x0000ffff684cfa7c:   ldp     x4, x5, [sp, #16]
>>   ......
>> 
>> 
>> Besides the above benefit, when we know what size of register is live,
>> we can remove the unnecessary caller save in the ZGC C2 load barrier stub when we meet C-ABI SOE (save-on-entry) fp registers.
>> 
>> Passed jtreg with option "-XX:+UseZGC -XX:+ZGenerational" with no failures introduced.
>
> Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   change jtreg test case name

test/hotspot/jtreg/gc/z/TestRegistersPushPopAtZGCLoadBarrierStub.java line 291:

> 289:         String keyString = keyword + expected_number_of_push_pop_at_load_barrier_fregs + " " + expected_freg_type + " registers";
> 290:         if (!containOnlyOneOccuranceOfKeyword(stdout, keyString)) {
> 291:             throw new RuntimeException("Stdout is expected to contain only one occurance of keyString: " + "'" + keyString + "'");

In the event of failure, would it be possible to print the erroneous output? The output from the subprocesses, being directly piped in, doesn't lend itself to easy debugging. At first I thought there might be an option that could alter OutputAnalyzer's output, but sadly not.
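
To illustrate the kind of change I have in mind, here is a minimal sketch in plain Java. The class and method names (KeywordCheck, countOccurrences, expectExactlyOne) are hypothetical and not taken from the test; the sketch only assumes the captured stdout is available as a String, and it appends that text to the exception message so a failing run carries its own evidence.

```java
// Hypothetical helper for illustration only; not part of the actual jtreg test.
final class KeywordCheck {

    // Counts non-overlapping occurrences of keyString in output.
    static int countOccurrences(String output, String keyString) {
        if (keyString.isEmpty()) {
            throw new IllegalArgumentException("keyString must not be empty");
        }
        int count = 0;
        int from = 0;
        while ((from = output.indexOf(keyString, from)) != -1) {
            count++;
            from += keyString.length();
        }
        return count;
    }

    // Throws when keyString does not occur exactly once, and attaches the
    // captured output to the message so the failure can be diagnosed from the log.
    static void expectExactlyOne(String output, String keyString) {
        int found = countOccurrences(output, keyString);
        if (found != 1) {
            throw new RuntimeException(
                "Stdout is expected to contain only one occurrence of keyString: '"
                + keyString + "' but found " + found
                + ". Captured stdout follows:\n" + output);
        }
    }
}
```

Reporting the actual count alongside the output should make it clear whether the keyword was missing or duplicated, which the current message does not distinguish.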

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17977#discussion_r1528691697

