RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
Dean Long
dlong at openjdk.org
Wed Nov 6 17:40:02 UTC 2024
On Tue, 29 Oct 2024 19:01:03 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:
>>> One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable.
>>
>> If that would solve the problem, then that must mean we save/freeze last_Java_pc as part of the virtual thread's state. So why can't we just call make_walkable() before we freeze, to fix things up as if C2 had stored last_Java_pc to the anchor? Then freeze could assert that the thread is already walkable. I'm surprised it doesn't already.
>
> The issue is not when we make the frame walkable but how. The way it currently works is by pushing the last_Java_pc to the stack in the runtime stub before making the call to the VM (plus an alignment word). So to make the frame walkable we do last_Java_sp[-1] in the VM. But this approach creates a mismatch between the recorded cb->frame_size() (which starts from last_Java_sp) vs the physical size of the frame which starts with rsp right before the call. This is what the c2 runtime stub code for aarch64 looks like:
>
>
> 0xffffdfdba584: sub sp, sp, #0x10
> 0xffffdfdba588: stp x29, x30, [sp]
> 0xffffdfdba58c: ldrb w8, [x28, #1192]
> 0xffffdfdba590: cbz x8, 0xffffdfdba5a8
> 0xffffdfdba594: mov x8, #0x4ba0
> 0xffffdfdba598: movk x8, #0xf6a8, lsl #16
> 0xffffdfdba59c: movk x8, #0xffff, lsl #32
> 0xffffdfdba5a0: mov x0, x28
> 0xffffdfdba5a4: blr x8
> 0xffffdfdba5a8: mov x9, sp
> 0xffffdfdba5ac: str x9, [x28, #1000] <------- store last_Java_sp
> 0xffffdfdba5b0: mov x0, x1
> 0xffffdfdba5b4: mov x1, x2
> 0xffffdfdba5b8: mov x2, x28
> 0xffffdfdba5bc: adr x9, 0xffffdfdba5d4
> 0xffffdfdba5c0: mov x8, #0xe6a4
> 0xffffdfdba5c4: movk x8, #0xf717, lsl #16
> 0xffffdfdba5c8: movk x8, #0xffff, lsl #32
> 0xffffdfdba5cc: stp xzr, x9, [sp, #-16]! <------- Push two extra words
> 0xffffdfdba5d0: blr x8
> 0xffffdfdba5d4: nop
> 0xffffdfdba5d8: movk xzr, #0x0
> 0xffffdfdba5dc: movk xzr, #0x0
> 0xffffdfdba5e0: add sp, sp, #0x10 <------- Remove two extra words
> 0xffffdfdba5e4: str xzr, [x28, #1000]
> 0xffffdfdba5e8: str xzr, [x28, #1008]
> 0xffffdfdba5ec: ldr x10, [x28, #8]
> 0xffffdfdba5f0: cbnz x10, 0xffffdfdba600
> 0xffffdfdba5f4: ldp x29, x30, [sp]
> 0xffffdfdba5f8: add sp, sp, #0x10
> 0xffffdfdba5fc: ret
> 0xffffdfdba600: ldp x29, x30, [sp]
> 0xffffdfdba604: add sp, sp, #0x10
> 0xffffdfdba608: adrp x8, 0xffffdfc30000
> 0xffffdfdba60c: add x8, x8, #0x80
> 0xffffdfdba610: br x8
OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? I also think we should fix the aarch64 c2 stub to just store last_Java_pc like you suggest. Adjusting the stack like this has in the past caused other problems, in particular making it hard to obtain safe stack traces during asynchronous profiling.
It's still unclear to me exactly how we resume after preemption. It looks like we resume at last_Java_pc with rsp set based on last_Java_sp, which is why it needs to be adjusted. If that's the case, an alternative simplification for aarch64 is to set a different last_Java_pc that is preemption-friendly that skips the stack adjustment. In your example, last_Java_pc would be set to 0xffffdfdba5e4. I think it is a reasonable requirement that preemption can return to last_Java_pc/last_Java_sp without adjustments.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823701666
More information about the core-libs-dev
mailing list