RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning

Wed Nov 6 17:40:02 UTC 2024

On Tue, 29 Oct 2024 19:01:03 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>>> One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable.
>> 
>> If that would solve the problem, then that must mean we save/freeze last_Java_pc as part of the virtual thread's state.  So why can't we just call make_walkable() before we freeze, to fix things up as if C2 had stored last_Java_pc to the anchor?  Then freeze could assert that the thread is already walkable.  I'm surprised it doesn't already.
>
> The issue is not when we make the frame walkable but how. The way it currently works is by pushing the last_Java_pc to the stack in the runtime stub before making the call to the VM (plus an alignment word). So to make the frame walkable we do last_Java_sp[-1] in the VM. But this approach creates a mismatch between the recorded cb->frame_size() (which starts from last_Java_sp) vs the physical size of the frame which starts with rsp right before the call. This is what the c2 runtime stub code for aarch64 looks like:
> 
> 
>    0xffffdfdba584:	sub	sp, sp, #0x10
>    0xffffdfdba588:	stp	x29, x30, [sp]
>    0xffffdfdba58c:	ldrb	w8, [x28, #1192]
>    0xffffdfdba590:	cbz	x8, 0xffffdfdba5a8
>    0xffffdfdba594:	mov	x8, #0x4ba0                	
>    0xffffdfdba598:	movk	x8, #0xf6a8, lsl #16
>    0xffffdfdba59c:	movk	x8, #0xffff, lsl #32
>    0xffffdfdba5a0:	mov	x0, x28
>    0xffffdfdba5a4:	blr	x8
>    0xffffdfdba5a8:	mov	x9, sp
>    0xffffdfdba5ac:	str	x9, [x28, #1000]       <------- store last_Java_sp
>    0xffffdfdba5b0:	mov	x0, x1
>    0xffffdfdba5b4:	mov	x1, x2
>    0xffffdfdba5b8:	mov	x2, x28
>    0xffffdfdba5bc:	adr	x9, 0xffffdfdba5d4
>    0xffffdfdba5c0:	mov	x8, #0xe6a4                	
>    0xffffdfdba5c4:	movk	x8, #0xf717, lsl #16
>    0xffffdfdba5c8:	movk	x8, #0xffff, lsl #32
>    0xffffdfdba5cc:	stp	xzr, x9, [sp, #-16]!   <------- Push two extra words
>    0xffffdfdba5d0:	blr	x8
>    0xffffdfdba5d4:	nop
>    0xffffdfdba5d8:	movk	xzr, #0x0
>    0xffffdfdba5dc:	movk	xzr, #0x0
>    0xffffdfdba5e0:	add	sp, sp, #0x10      <------- Remove two extra words
>    0xffffdfdba5e4:	str	xzr, [x28, #1000]
>    0xffffdfdba5e8:	str	xzr, [x28, #1008]
>    0xffffdfdba5ec:	ldr	x10, [x28, #8]
>    0xffffdfdba5f0:	cbnz	x10, 0xffffdfdba600
>    0xffffdfdba5f4:	ldp	x29, x30, [sp]
>    0xffffdfdba5f8:	add	sp, sp, #0x10
>    0xffffdfdba5fc:	ret
>    0xffffdfdba600:	ldp	x29, x30, [sp]
>    0xffffdfdba604:	add	sp, sp, #0x10
>    0xffffdfdba608:	adrp	x8, 0xffffdfc30000
>    0xffffdfdba60c:	add	x8, x8, #0x80
>    0xffffdfdba610:	br	x8

OK, so you're saying it's the stack adjustment that's the problem.  It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary.  Isn't that a bug that should be fixed?  I also think we should fix the aarch64 c2 stub to just store last_Java_pc like you suggest.  Adjusting the stack like this has in the past caused other problems, in particular making it hard to obtain safe stack traces during asynchronous profiling.

It's still unclear to me exactly how we resume after preemption.  It looks like we resume at last_Java_pc with rsp set based on last_Java_sp, which is why it needs to be adjusted.  If that's the case, an alternative simplification for aarch64 is to set a different last_Java_pc that is preemption-friendly that skips the stack adjustment.  In your example, last_Java_pc would be set to 0xffffdfdba5e4.  I think it is a reasonable requirement that preemption can return to last_Java_pc/last_Java_sp without adjustments.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823701666