RFR: 8326541: [AArch64] ZGC C2 load barrier stub considers the length of live registers when spilling registers [v4]
Joshua Zhu
jzhu at openjdk.org
Wed Mar 20 03:55:33 UTC 2024
> Currently the ZGC C2 load barrier stub on AArch64 spills the full width of every live vector register, regardless of how much of that register is actually live.
> Because the size of an SVE register is an implementation-defined multiple of 128 bits, up to 2048 bits,
> even a single live floating-point value can occupy the maximum 2048 bits (256 bytes) of stack.
> Hence I would like to introduce this change on AArch64: take the live width of each register into account when spilling in the ZGC C2 load barrier stub.
>
> In a floating-point case on a 2048-bit SVE machine, the following ZLoadBarrierStubC2 code
>
>
> ......
> 0x0000ffff684cfad8: stp x15, x18, [sp, #80]
> 0x0000ffff684cfadc: sub sp, sp, #0x100
> 0x0000ffff684cfae0: str z16, [sp]
> 0x0000ffff684cfae4: add x1, x13, #0x10
> 0x0000ffff684cfae8: mov x0, x16
> ;; 0xFFFF803F5414
> 0x0000ffff684cfaec: mov x8, #0x5414 // #21524
> 0x0000ffff684cfaf0: movk x8, #0x803f, lsl #16
> 0x0000ffff684cfaf4: movk x8, #0xffff, lsl #32
> 0x0000ffff684cfaf8: blr x8
> 0x0000ffff684cfafc: mov x16, x0
> 0x0000ffff684cfb00: ldr z16, [sp]
> 0x0000ffff684cfb04: add sp, sp, #0x100
> 0x0000ffff684cfb08: ptrue p7.b
> 0x0000ffff684cfb0c: ldp x4, x5, [sp, #16]
> ......
>
>
> could be optimized into:
>
>
> ......
> 0x0000ffff684cfa50: stp x15, x18, [sp, #80]
> 0x0000ffff684cfa54: str d16, [sp, #-16]! // extra 8 bytes to align 16 bytes in push_fp()
> 0x0000ffff684cfa58: add x1, x13, #0x10
> 0x0000ffff684cfa5c: mov x0, x16
> ;; 0xFFFF7FA942A8
> 0x0000ffff684cfa60: mov x8, #0x42a8 // #17064
> 0x0000ffff684cfa64: movk x8, #0x7fa9, lsl #16
> 0x0000ffff684cfa68: movk x8, #0xffff, lsl #32
> 0x0000ffff684cfa6c: blr x8
> 0x0000ffff684cfa70: mov x16, x0
> 0x0000ffff684cfa74: ldr d16, [sp], #16
> 0x0000ffff684cfa78: ptrue p7.b
> 0x0000ffff684cfa7c: ldp x4, x5, [sp, #16]
> ......
>
>
> Besides the above benefit, once we know how much of each register is live,
> we can also remove unnecessary caller saves in the ZGC C2 load barrier stub for floating-point registers that are save-on-entry (SOE, i.e. callee-saved) under the C ABI.
>
> Passed jtreg with option "-XX:+UseZGC -XX:+ZGenerational" with no failures introduced.
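
The sizing logic described above can be illustrated with a minimal C++ sketch. It is not the code from the pull request: the names `LiveWidth`, `live_bytes`, `spill_bytes` and `needs_caller_save` are hypothetical, and the callee-saved rule assumes the standard AArch64 procedure call standard, under which only the low 64 bits of v8-v15 are preserved across a call.

```cpp
#include <cstddef>

// Hypothetical classification of how much of a vector register is live.
enum class LiveWidth { D64, Q128, SVE };

// Number of bytes the live part of the register occupies.
// sve_vector_bytes is the machine's SVE vector length in bytes
// (256 on the 2048-bit machine used in the example above).
inline std::size_t live_bytes(LiveWidth w, std::size_t sve_vector_bytes) {
    switch (w) {
        case LiveWidth::D64:  return 8;
        case LiveWidth::Q128: return 16;
        case LiveWidth::SVE:  return sve_vector_bytes;
    }
    return 0;
}

// Stack space needed for the spill, rounded up so that sp stays
// 16-byte aligned.
inline std::size_t spill_bytes(LiveWidth w, std::size_t sve_vector_bytes) {
    return (live_bytes(w, sve_vector_bytes) + 15) & ~static_cast<std::size_t>(15);
}

// Assumption: the callee preserves only the low 64 bits of v8-v15,
// so a caller save is redundant exactly when one of those registers
// is live as a 64-bit value.
inline bool needs_caller_save(unsigned vreg, LiveWidth w) {
    const bool callee_saves_low64 = (vreg >= 8 && vreg <= 15);
    return !(callee_saves_low64 && w == LiveWidth::D64);
}
```

With a 256-byte vector length, `spill_bytes(LiveWidth::SVE, 256)` gives 256, which corresponds to the `sub sp, sp, #0x100` in the first listing, while `spill_bytes(LiveWidth::D64, 256)` gives 16, which corresponds to the `str d16, [sp, #-16]!` in the second.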
Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision:
Add more output for easy debugging once the jtreg test case fails
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/17977/files
- new: https://git.openjdk.org/jdk/pull/17977/files/382866f7..f2960eb1
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=17977&range=03
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=17977&range=02-03
Stats: 21 lines in 1 file changed: 17 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/17977.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17977/head:pull/17977
PR: https://git.openjdk.org/jdk/pull/17977