RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning

Wed Nov 6 17:40:10 UTC 2024

On Fri, 25 Oct 2024 18:46:52 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> src/hotspot/share/runtime/objectMonitor.cpp line 2028:
>> 
>>> 2026:   // First time we run after being preempted on Object.wait().
>>> 2027:   // Check if we were interrupted or the wait timed-out, and in
>>> 2028:   // that case remove ourselves from the _WaitSet queue.
>> 
>> I'm not sure how to interpret this comment block - is this really two sentences because the first is not actually a sentence. Also unclear what "run" and "First time" relate to.
>
> This vthread was unmounted on the call to `Object.wait`. Now it is mounted and "running" again, and we need to check which case it is in: notified, interrupted or timed-out. "First time" means it is the first time it's running after the original unmount on `Object.wait`. This is because once we are on the monitor reentry phase, the virtual thread can be potentially unmounted and mounted many times until it successfully acquires the monitor. Not sure how to rewrite the comment to make it clearer.

The first sentence is not a sentence. Is it supposed to be saying:

// The first time we run after being preempted on Object.wait()
// we check if we were interrupted or the wait timed-out ...

?

>> src/hotspot/share/runtime/objectMonitor.hpp line 292:
>> 
>>> 290: 
>>> 291:   static int64_t owner_for(JavaThread* thread);
>>> 292:   static int64_t owner_for_oop(oop vthread);
>> 
>> Some comments describing this API would be good. I'm struggling a bit with the "owner for" terminology. I think `owner_from` would be better. And can't these just overload rather than using different names?
>
> I changed them to `owner_from`. I added a comment referring to the return value as tid, and then I used this tid name in some other comments. Maybe this methods should be called `tid_from()`? Alternatively we could use the term owner id instead, and these would be `owner_id_from()`. In theory, this tid term or owner id (or whatever other name) does not need to be related to `j.l.Thread.tid`, it just happens that that's what we are using as the actual value for this id.

I like the idea of using `owner_id_from` but it then suggests to me that `JavaThread::_lock_id` should be something like `JavaThread::_monitor_owner_id`. 

The use of `tid` in comments can be confusing when applied to a `JavaThread` as the "tid" there would normally be a reference of its `osThread()->thread_id()" not it's `threadObj()->thread_id()`. I don't have an obviously better suggestion though.

>> src/hotspot/share/runtime/objectMonitor.hpp line 302:
>> 
>>> 300:   // Simply set _owner field to new_value; current value must match old_value.
>>> 301:   void      set_owner_from_raw(int64_t old_value, int64_t new_value);
>>> 302:   void      set_owner_from(int64_t old_value, JavaThread* current);
>> 
>> Again some comments describing API would good. The old API had vague names like old_value and new_value because of the different forms the owner value could take. Now it is always a thread-id we can do better I think. The distinction between the raw and non-raw forms is unclear and the latter is not covered by the initial comment.
>
> I added a comment. How about s/old_value/old_tid and s/new_value/new_tid?

old_tid/new_tid works for me.

>> src/hotspot/share/runtime/objectMonitor.hpp line 302:
>> 
>>> 300:   void      set_owner_from(int64_t old_value, JavaThread* current);
>>> 301:   // Set _owner field to tid of current thread; current value must be ANONYMOUS_OWNER.
>>> 302:   void      set_owner_from_BasicLock(JavaThread* current);
>> 
>> Shouldn't tid there be the basicLock?
>
> So the value stored in _owner has to be ANONYMOUS_OWNER. We cannot store the BasicLock* in there as before since it can clash with some other thread's tid. We store it in the new field _stack_locker instead.

Right I understand we can't store the BasicLock* directly in owner, but the naming of this method has me confused as to what it actually does. With the old version we have:

Before: owner = BasicLock* belonging to current
After: owner = JavaThread* of current

with the new version we have:

Before: owner = ANONYMOUS_OWNER
After: owner = tid of current

so "BasicLock" doesn't mean anything here any more. Isn't this just `set_owner_from_anonymous` ?

>> src/hotspot/share/runtime/objectMonitor.hpp line 349:
>> 
>>> 347:   ObjectWaiter* first_waiter()                                         { return _WaitSet; }
>>> 348:   ObjectWaiter* next_waiter(ObjectWaiter* o)                           { return o->_next; }
>>> 349:   JavaThread* thread_of_waiter(ObjectWaiter* o)                        { return o->_thread; }
>> 
>> This no longer looks correct if the waiter is a vthread. ??
>
> It is, we still increment _waiters for the vthread case.

Sorry the target of my comment was not clear.  `thread_of_waiter` looks suspicious - will JVMTI find the vthread from the JavaThread?

>> src/hotspot/share/runtime/synchronizer.cpp line 670:
>> 
>>> 668:   // Top native frames in the stack will not be seen if we attempt
>>> 669:   // preemption, since we start walking from the last Java anchor.
>>> 670:   NoPreemptMark npm(current);
>> 
>> Don't we still pin for JNI monitor usage?
>
> Only when facing contention on this call. But once we have the monitor we don't.

But if this is from JNI then we have at least one native frame on the stack making the JNI call, so we have to be pinned if we were to block on the monitor. ???

>> src/java.base/share/classes/java/lang/VirtualThread.java line 111:
>> 
>>> 109:      *  BLOCKING -> BLOCKED        // blocked on monitor enter
>>> 110:      *   BLOCKED -> UNBLOCKED      // unblocked, may be scheduled to continue
>>> 111:      * UNBLOCKED -> RUNNING        // continue execution after blocked on monitor enter
>> 
>> Presumably this one means it acquired the monitor?
>
> Not really, it is the state we set when the virtual thread is mounted and runs again. In this case it will just run to re-contest for the monitor.

So really UNBLOCKED is UNBLOCKING and mirrors BLOCKING , so we have:

RUNNING -> BLOCKING -> BLOCKED
BLOCKED -> UNBLOCKING -> RUNNABLE

I'm just trying to get a better sense of what we can infer if we see these "transition" states.

>> src/java.base/share/classes/java/lang/VirtualThread.java line 952:
>> 
>>> 950:         for (;;) {
>>> 951:             boolean unblocked = false;
>>> 952:             synchronized (timedWaitLock()) {
>> 
>> Where is the overall design of the timed-wait protocol and it use of synchronization described?
>
> When we unmount on a timed-wait call we schedule a wakeup task at the end of `afterYield`. There are two mechanisms that avoid the scheduled task to run and wake up the virtual thread on a future timed-wait call, since in this call the virtual thread could have been already notified before the scheduled task runs. The first one is to cancel the scheduled task once we return from the wait call (see `Object.wait(long timeoutMillis)`). Since the task could have been already started though, we also use `timedWaitSeqNo`, which the wake up task checks here to make sure it is not an old one. Since we synchronize on `timedWaitLock` to increment `timedWaitSeqNo` and change state to `TIMED_WAIT` before scheduling the wake up task in `afterYield`, here either a wrong `timedWaitSeqNo` or a state different than `TIMED_WAIT` means there is nothing to do. The only exception is checking for `SUSPENDED` state, in which case we just loop to retry.

Thanks for the explanation but that needs to be documented somewhere.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818239594
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811933408
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811935087
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814330162
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818236368
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818240013
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814163283
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818228510