RFR: JDK-8274795: AArch64: avoid spilling and restoring r18 in macro assembler [v2]
Andrew Haley
aph at openjdk.java.net
Wed Oct 6 09:29:08 UTC 2021
On Wed, 6 Oct 2021 09:03:29 GMT, Bernhard Urban-Forster <burban at openjdk.org> wrote:
>> r18 should not be used as it is reserved as platform register. Linux is fine with userspace using it, but Windows and also recently macOS (https://github.com/openjdk/jdk11u-dev/pull/301#issuecomment-911998917 ) are actually using it on the kernel side.
>>
>> The macro assembler uses the bit pattern `0x7fff_ffff` (== `r0-r30`) to specify which registers to spill; fortunately this helper is only used here:
>>
>> https://github.com/openjdk/jdk/blob/c05dc268acaf87236f30cf700ea3ac778e3b20e5/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp#L1400-L1404
>>
>> I haven't seen causing this particular instance any issues in practice _yet_, presumably because it looks hard to align the stars in order to trigger a problem (between `stp` and `ldp` of `r18` a transition to kernel space must happen *and* the kernel needs to do something with `r18`). But jdk11u-dev has more usages of the `::pusha`/`::popa` macro and that causes troubles as explained in the link above.
>>
>> Output of `-XX:+PrintInterpreter` before this change:
>>
>> ----------------------------------------------------------------------
>> method entry point (kind = native) [0x0000000138809b00, 0x000000013880a280] 1920 bytes
>> --------------------------------------------------------------------------------
>> 0x0000000138809b00: ldr x2, [x12, #16]
>> 0x0000000138809b04: ldrh w2, [x2, #44]
>> 0x0000000138809b08: add x24, x20, x2, uxtx #3
>> 0x0000000138809b0c: sub x24, x24, #0x8
>> [...]
>> 0x0000000138809fa4: stp x16, x17, [sp, #128]
>> 0x0000000138809fa8: stp x18, x19, [sp, #144]
>> 0x0000000138809fac: stp x20, x21, [sp, #160]
>> [...]
>> 0x0000000138809fc0: stp x30, xzr, [sp, #240]
>> 0x0000000138809fc4: mov x0, x28
>> ;; 0x10864ACCC
>> 0x0000000138809fc8: mov x9, #0xaccc // #44236
>> 0x0000000138809fcc: movk x9, #0x864, lsl #16
>> 0x0000000138809fd0: movk x9, #0x1, lsl #32
>> 0x0000000138809fd4: blr x9
>> 0x0000000138809fd8: ldp x2, x3, [sp, #16]
>> [...]
>> 0x0000000138809ff4: ldp x16, x17, [sp, #128]
>> 0x0000000138809ff8: ldp x18, x19, [sp, #144]
>> 0x0000000138809ffc: ldp x20, x21, [sp, #160]
>>
>>
>> After:
>>
>> ----------------------------------------------------------------------
>> method entry point (kind = native) [0x0000000108e4db00, 0x0000000108e4e280] 1920 bytes
>>
>> --------------------------------------------------------------------------------
>> 0x0000000108e4db00: ldr x2, [x12, #16]
>> 0x0000000108e4db04: ldrh w2, [x2, #44]
>> 0x0000000108e4db08: add x24, x20, x2, uxtx #3
>> 0x0000000108e4db0c: sub x24, x24, #0x8
>> [...]
>> 0x0000000108e4dfa4: stp x16, x17, [sp, #128]
>> 0x0000000108e4dfa8: stp x19, x20, [sp, #144]
>> 0x0000000108e4dfac: stp x21, x22, [sp, #160]
>> [...]
>> 0x0000000108e4dfbc: stp x29, x30, [sp, #224]
>> 0x0000000108e4dfc0: mov x0, x28
>> ;; 0x107E4A06C
>> 0x0000000108e4dfc4: mov x9, #0xa06c // #41068
>> 0x0000000108e4dfc8: movk x9, #0x7e4, lsl #16
>> 0x0000000108e4dfcc: movk x9, #0x1, lsl #32
>> 0x0000000108e4dfd0: blr x9
>> 0x0000000108e4dfd4: ldp x2, x3, [sp, #16]
>> [...]
>> 0x0000000108e4dff0: ldp x16, x17, [sp, #128]
>> 0x0000000108e4dff4: ldp x19, x20, [sp, #144]
>> 0x0000000108e4dff8: ldp x21, x22, [sp, #160]
>> [...]
>
> Bernhard Urban-Forster has updated the pull request incrementally with one additional commit since the last revision:
>
> remove pusha/popa, be explicit in the template generator
>
> Restore looks like this now:
> ```
> 0x0000000106e4dfcc: movk x9, #0x5e4, lsl #16
> 0x0000000106e4dfd0: movk x9, #0x1, lsl #32
> 0x0000000106e4dfd4: blr x9
> 0x0000000106e4dfd8: ldp x2, x3, [sp, #16]
> 0x0000000106e4dfdc: ldp x4, x5, [sp, #32]
> 0x0000000106e4dfe0: ldp x6, x7, [sp, #48]
> 0x0000000106e4dfe4: ldp x8, x9, [sp, #64]
> 0x0000000106e4dfe8: ldp x10, x11, [sp, #80]
> 0x0000000106e4dfec: ldp x12, x13, [sp, #96]
> 0x0000000106e4dff0: ldp x14, x15, [sp, #112]
> 0x0000000106e4dff4: ldp x16, x17, [sp, #128]
> 0x0000000106e4dff8: ldp x0, x1, [sp], #144
> 0x0000000106e4dffc: ldp xzr, x19, [sp], #16
> 0x0000000106e4e000: ldp x22, x23, [sp, #16]
> 0x0000000106e4e004: ldp x24, x25, [sp, #32]
> 0x0000000106e4e008: ldp x26, x27, [sp, #48]
> 0x0000000106e4e00c: ldp x28, x29, [sp, #64]
> 0x0000000106e4e010: ldp x30, xzr, [sp, #80]
> 0x0000000106e4e014: ldp x20, x21, [sp], #96
> 0x0000000106e4e018: ldur x12, [x29, #-24]
> 0x0000000106e4e01c: ldr x22, [x12, #16]
> 0x0000000106e4e020: add x22, x22, #0x30
> 0x0000000106e4e024: ldr x8, [x28, #8]
> ```
Sorry, stupid mistake: the pop is done in reverse, of course!
Please try this (still untested) code.
#ifdef(R18_RESERVED)
pop(RegSet::range(r20, r30));
ldp(zr, r19, Address(__ post(sp, 2 * wordSize)));
#else
pop(RegSet::range(r18, r30));
#endif
pop(RegSet::range(r0, r17));
-------------
PR: https://git.openjdk.java.net/jdk/pull/5828
More information about the hotspot-dev
mailing list