RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning

Wed Nov 6 17:39:18 UTC 2024

This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details.

In order to make the code review easier the changes have been split into the following initial 4 commits:

- Changes to allow unmounting a virtual thread that is currently holding monitors.
- Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor.
- Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants.
- Changes to tests, JFR pinned event, and other changes in the JDK libraries.

The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: the ppc changes were added recently and stand in their own commit after the initial ones.

The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode (and `LM_MONITOR`, which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases.

## Summary of changes

### Unmount virtual thread while holding monitors

As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things:

- We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.

- For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.
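
The first point can be pictured with a toy model. The sketch below is not HotSpot code: `ToyCarrier`, `ToyChunk`, `freeze` and `thaw` are hypothetical names that only mimic the idea of moving the lock entries off the carrier when freezing and onto the next carrier when thawing.

```java
import java.util.ArrayList;
import java.util.List;

class LockStackTransferSketch {
    /** Hypothetical carrier: holds the lock stack of whatever is mounted on it. */
    static final class ToyCarrier {
        final List<Object> lockStack = new ArrayList<>();
    }

    /** Hypothetical stand-in for a stackChunk: heap storage for a frozen continuation. */
    static final class ToyChunk {
        final List<Object> savedLocks = new ArrayList<>();
    }

    /** Freeze: copy the lock entries into the chunk in order, then clear the carrier. */
    static void freeze(ToyCarrier carrier, ToyChunk chunk) {
        chunk.savedLocks.addAll(carrier.lockStack);
        carrier.lockStack.clear();
    }

    /** Thaw: copy the saved entries onto the (possibly different) carrier, then clear the chunk. */
    static void thaw(ToyChunk chunk, ToyCarrier carrier) {
        carrier.lockStack.addAll(chunk.savedLocks);
        chunk.savedLocks.clear();
    }

    /** Returns true if the lock entries follow the virtual thread across carriers. */
    static boolean locksFollowTheVirtualThread() {
        Object a = new Object();
        Object b = new Object();
        ToyCarrier first = new ToyCarrier();
        ToyCarrier second = new ToyCarrier();
        ToyChunk chunk = new ToyChunk();

        // The virtual thread locks a, then b, while mounted on 'first'.
        first.lockStack.add(a);
        first.lockStack.add(b);

        freeze(first, chunk);                       // unmount
        boolean carrierIsClean = first.lockStack.isEmpty();

        thaw(chunk, second);                        // mount on another carrier
        boolean sameOrder = second.lockStack.equals(List.of(a, b));

        return carrierIsClean && sameOrder && chunk.savedLocks.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(locksFollowTheVirtualThread());
    }
}
```

The model relies on the same assumption stated above: the carrier's own lock stack is empty while a virtual thread is mounted, so clearing it on freeze loses nothing.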

#### General notes about this part:

- Since virtual threads don't need to worry about holding monitors anymore, we don't need to count them, except for `LM_LEGACY`. So the majority of the platform dependent changes in this commit have to do with correcting this.
- Zero and x86 (32 bits) were counting monitors even though they don't implement continuations, so I fixed that to stop counting. The idea is to remove all the counting code once we remove `LM_LEGACY`.
- Macro `LOOM_MONITOR_SUPPORT` was added at the time to exclude ports that implement continuations but don't yet implement monitor support. It is removed later with the ppc commit changes.
- Since now a virtual thread can be unmounted while holding monitors, JVMTI methods `GetOwnedMonitorInfo` and `GetOwnedMonitorStackDepthInfo` had to be adapted.

#### Notes specific to the tid changes:

- The tid is cached in the JavaThread object under `_lock_id`. It is set on JavaThread creation and changed on mount/unmount.
- Changes in the ObjectMonitor class in this commit are pretty much exclusively related to changing `_owner` and `_succ` from `void*` and `JavaThread*` respectively to `int64_t`.
- Although we are not trying to fix `LM_LEGACY` the tid changes apply to it as well since the inflated path is shared. Thus, in case of inflation by a contending thread, the `BasicLock*` cannot be stored in the `_owner` field as before. The `_owner` is instead set to anonymous as we do in `LM_LIGHTWEIGHT`, and the `BasicLock*` is stored in the new field `_stack_locker`.
- We already assume 32 bit platforms can handle 64 bit atomics, including `cmpxchg` ([JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776)) so the shared code can stay the same. The assembly code for the c2 fast paths has to be adapted though. On arm (32bits) we already jump directly to the slow path in the inflated monitor case so there is nothing to do. For x86 (32bits), since the port is moving towards deprecation ([JDK-8338285](https://bugs.openjdk.org/browse/JDK-8338285)) there is no point in trying to optimize, so the code was changed to do the same thing we do for arm (32bits).
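
To illustrate why keying ownership on a thread id rather than on the carrier matters, here is a minimal Java model. It is only a sketch: `ToyMonitor` and `ToyCarrier` are hypothetical, the `owner` and `lockId` fields merely imitate the roles of `_owner` and `_lock_id`, and initializing `lockId` to the carrier's own id is an assumption of the model.

```java
import java.util.concurrent.atomic.AtomicLong;

class TidOwnerSketch {
    /** Assumed sentinel for "no owner" in this toy model. */
    static final long NO_OWNER = 0L;

    /** Hypothetical monitor whose owner is a 64-bit thread id. */
    static final class ToyMonitor {
        private final AtomicLong owner = new AtomicLong(NO_OWNER);

        boolean tryEnter(long tid) {
            return owner.compareAndSet(NO_OWNER, tid);
        }

        boolean isOwnedBy(long tid) {
            return owner.get() == tid;
        }

        void exit(long tid) {
            if (!owner.compareAndSet(tid, NO_OWNER)) {
                throw new IllegalMonitorStateException("not the owner: " + tid);
            }
        }
    }

    /** Hypothetical carrier that caches the id of the thread currently running on it. */
    static final class ToyCarrier {
        final long carrierTid;
        long lockId;

        ToyCarrier(long carrierTid) {
            this.carrierTid = carrierTid;
            this.lockId = carrierTid;   // set on creation
        }

        void mount(long virtualTid) { lockId = virtualTid; }

        void unmount() { lockId = carrierTid; }
    }

    /** Returns true if the virtual thread still owns the monitor after moving carriers. */
    static boolean ownershipSurvivesRemount() {
        ToyMonitor monitor = new ToyMonitor();
        ToyCarrier first = new ToyCarrier(1);
        ToyCarrier second = new ToyCarrier(2);
        long virtualTid = 100;

        first.mount(virtualTid);
        boolean entered = monitor.tryEnter(first.lockId);
        first.unmount();                                    // unmount while holding the monitor

        second.mount(virtualTid);                           // resume on a different carrier
        boolean stillOwner = monitor.isOwnedBy(second.lockId);
        boolean oldCarrierIsNotOwner = !monitor.isOwnedBy(first.lockId);

        monitor.exit(second.lockId);
        return entered && stillOwner && oldCarrierIsNotOwner;
    }

    public static void main(String[] args) {
        System.out.println(ownershipSurvivesRemount());
    }
}
```

Had the model stored the carrier's identity instead, the first carrier would appear to own the monitor after the unmount, which is exactly the ownership confusion described above.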

### Unmounting a virtual thread blocked on synchronized

Currently virtual thread unmounting is always started from Java, either because of a voluntary call to `Thread.yield()` or because of performing some blocking operation such as I/O. Now we also allow unmounting from inside the VM, specifically when facing contention trying to acquire a Java monitor.

On failure to acquire a monitor inside `ObjectMonitor::enter` a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. Instead of continuing execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to `Continuation.run()` to proceed with the unmount logic. Once the owner releases the monitor and selects the virtual thread as the next successor, the virtual thread will be added back to the scheduler queue to run again. The virtual thread will run and attempt to acquire the monitor again. If it succeeds then it will thaw frames as usual to continue execution back where it left off. If it fails it will unmount and wait again to be unblocked.
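
From the Java side, the scenario looks like the following sketch, in which several virtual threads contend for one monitor while the owner blocks inside the `synchronized` block. The class and method names are made up for illustration, and the code assumes a JDK with virtual threads (21 or later). It runs on older builds too; the difference with these changes is that the blocked threads can unmount instead of pinning their carriers.

```java
import java.util.ArrayList;
import java.util.List;

class ContendedMonitorSketch {
    private static final Object LOCK = new Object();
    private static int counter;

    /** Starts the given number of virtual threads that all increment under one monitor. */
    static int runContended(int threads, int iterations) {
        synchronized (LOCK) {
            counter = 0;
        }
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            started.add(Thread.ofVirtual().start(() -> {
                for (int j = 0; j < iterations; j++) {
                    synchronized (LOCK) {
                        counter++;
                        // Block while holding the monitor so the other threads contend on it.
                        sleepQuietly(1);
                    }
                }
            }));
        }
        for (Thread t : started) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        synchronized (LOCK) {
            return counter;
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runContended(4, 5));
    }
}
```

The result is the same with or without pinning; what changes is how many carriers stay blocked while the threads wait for the monitor.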

#### General notes about this part:

- The easiest way to review these changes is to start from the monitorenter call in the interpreter and follow all the flow of the virtual thread, from unmounting to running again.
- Currently we use a dedicated unblocker thread to submit the virtual threads back to the scheduler queue. This avoids calls to Java from monitorexit. We are experimenting with removing this limitation, but that will be left as an enhancement for a future change.
- We cannot unmount the virtual thread when the monitor enter call is coming from `jni_enter()` or `ObjectLocker` since we would need to freeze native frames.
- If freezing fails, which will almost always be due to having native frames on the stack, the virtual thread will follow the normal platform thread logic but will do a timed-park instead. This is to alleviate some deadlock cases where the successor picked is an unmounted virtual thread that cannot run, which can happen during class loading or class initialization.
- After freezing all frames, and while adding itself to the `_cxq`, the virtual thread could have successfully acquired the monitor. In that case we mark the preemption as cancelled. The virtual thread will still need to go back to the preempt stub to clean up the physical stack but instead of unmounting it will call thaw to continue execution.
- The way we jump to the preempt stub is slightly different in the compiler and interpreter. For the compiled case we just patch a return address, so no new code is added. For the interpreter we cannot do this on all platforms so we just check a flag back in the interpreter. For the latter we also need to manually restore some state after we finally acquire the monitor and resume execution. All that logic is contained in the new assembler method `call_VM_preemptable()`.

#### Notes specific to JVMTI changes:
- Since we are not unmounting from Java, there is no call to `VirtualThread.yieldContinuation()`. This means that we have to execute the equivalent of `notifyJvmtiUnmount(/*hide*/true)` for unmount, and of `notifyJvmtiMount(/*hide*/false)` for mount in the VM. The former is implemented with `JvmtiUnmountBeginMark` in `Continuation::try_preempt()`. The latter is implemented in method `jvmti_mount_end()` in `ContinuationFreezeThaw` at the end of thaw.
- When unmounting from Java the vthread unmount event is posted before we try to freeze the continuation. If that fails then we post the mount event. This all happens in `VirtualThread.yieldContinuation()`. When unmounting from the VM we only post the event once we know the freeze succeeded. Since at that point we are in the middle of the VTMS transition, posting the event is done in `JvmtiVTMSTransitionDisabler::VTMS_unmount_end()` after the transition finishes. Maybe the same thing should be done when unmounting from Java.

### Unmounting a virtual thread blocked on `Object.wait()`

This commit just extends the previous mechanism to be able to unmount inside the VM on `ObjectMonitor::wait`.

#### General notes about this part:
- The mechanism works as before with the difference that now the call will come from the native wrapper. This requires adding support in the continuation code to handle native wrapper frames, which is a main part of the changes in this commit.
- Both the compiled and interpreted native wrapper code will check for preemption on return from the wait call, after we have transitioned back to `_thread_in_Java`.

#### Note specific to JVMTI changes:
- If the monitor waited event is enabled we need to post it after the wait is done but before re-acquiring the monitor. Since the virtual thread is inside the VTMS transition at that point, we cannot do that directly. Currently in the code we end the transition, post the event and start the transition again. This is not ideal, and maybe we should unmount, post the event and then run again to try to reacquire the monitor.
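
For reference, the Java-level pattern this part targets is a plain `Object.wait()`/`notifyAll()` handoff between virtual threads, as in the sketch below. The names are invented for illustration and a JDK with virtual threads (21 or later) is assumed; the program behaves the same on older builds, except that there the waiter stays pinned to its carrier.

```java
class WaitNotifySketch {
    private static final Object LOCK = new Object();
    private static boolean ready;
    private static boolean woken;

    /** Returns true once the waiting virtual thread has observed the notification. */
    static boolean handOff() {
        synchronized (LOCK) {
            ready = false;
            woken = false;
        }
        Thread waiter = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {
                // Guard against spurious wakeups and against the notifier running first.
                while (!ready) {
                    try {
                        LOCK.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                woken = true;
            }
        });
        Thread notifier = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {
                ready = true;
                LOCK.notifyAll();
            }
        });
        try {
            waiter.join();
            notifier.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        synchronized (LOCK) {
            return woken;
        }
    }

    public static void main(String[] args) {
        System.out.println(handOff());
    }
}
```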

### Test changes + JFR Updates + Library code changes

#### Tests 

- The tests in `java/lang/Thread/virtual` are updated to add more tests for monitor enter/exit and Object.wait/notify. New tests are added for JFR events, synchronized native methods, and stress testing for several scenarios.
- `test/hotspot/gtest/nmt/test_vmatree.cpp` is changed due to an alias that conflicts. 
- A small number of tests, e.g. `test/hotspot/jtreg/serviceability/sa/ClhsdbInspect.java` and `test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002`, are updated so they are in sync with the JDK code.
- A number of JVMTI tests are updated to fix various issues, e.g. some tests saved a JNIEnv in a static. 

#### Diagnosing remaining pinning issues

- The diagnostic option `jdk.tracePinnedThreads` is removed. 
- The JFR `jdk.VirtualThreadPinned` event is changed so that it's now recorded in the VM, and for the following cases: parking when pinned, blocking in monitor enter when pinned, Object.wait when pinned, and waiting for a class to be initialized by another thread. The changes to object monitors should mean that only a few events are recorded. Future work may change this to a sampling approach.
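
With `jdk.tracePinnedThreads` gone, one way to look for remaining pinning is to record `jdk.VirtualThreadPinned` with the standard `jdk.jfr` API. The sketch below is an illustration only: the class name and the `SlowInit` helper are made up, and blocking inside a class initializer is used on the assumption that it is one of the cases that still pins. How many events are actually recorded depends on the JDK build, so no particular count is claimed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

class PinnedEventSketch {
    /** Hypothetical class whose static initializer blocks, to try to provoke pinning. */
    static final class SlowInit {
        static final int VALUE;
        static {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            VALUE = 42;
        }
    }

    /** Records while a virtual thread initializes SlowInit and counts the pinned events. */
    static long countPinnedEvents() {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.VirtualThreadPinned")
                     .withThreshold(Duration.ZERO)   // keep even very short events
                     .withStackTrace();
            recording.start();

            Thread thread = Thread.ofVirtual().start(() -> {
                // The first access triggers class initialization on the virtual thread.
                if (SlowInit.VALUE != 42) {
                    throw new AssertionError("unexpected value");
                }
            });
            thread.join();

            recording.stop();
            Path file = Files.createTempFile("pinned", ".jfr");
            recording.dump(file);

            long count = 0;
            for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                if (event.getEventType().getName().equals("jdk.VirtualThreadPinned")) {
                    count++;
                }
            }
            Files.deleteIfExists(file);
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("jdk.VirtualThreadPinned events: " + countPinnedEvents());
    }
}
```

The same event can also be enabled from the command line or a `.jfc` settings file; the programmatic form is used here only to keep the example self-contained.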

#### Other changes to VirtualThread class

The VirtualThread implementation includes a few robustness changes. The `park/parkNanos` methods now park on the carrier if the freeze throws OOME. Moreover, the use of transitions is reduced so that the call out to the scheduler no longer requires a temporary transition. 

#### Other changes to libraries: 

- `ReferenceQueue` is reverted to use `synchronized`, and the subclass based on `ReentrantLock` is removed. This change is done now because the changes for object monitors impact this area when there is preemption polling a reference queue.
- `java.io` is reverted to use `synchronized`. This change has been important for testing virtual threads. There will be follow-up cleanup in main-line after the JEP is integrated to remove `InternalLock` and its uses in `java.io`. 
- The epoll and kqueue based Selectors are changed to preempt when doing blocking selects. This has been useful for testing virtual threads with some libraries, e.g. JDBC drivers. We could potentially separate this update if needed but it has been included in all testing and EA builds. 
- `sun.security.ssl.X509TrustManagerImpl` is changed to eagerly initialize AnchorCertificates, a forced change due to deadlocks in this code when testing. 

## Testing 

The changes have been running in the Loom pipeline for several months now. They have also been included in EA builds throughout the year at different stages (EA builds from earlier this year did not have `Object.wait()` support yet but more recent ones did) so there has been some external exposure too.

The current patch has been run through mach5 tiers 1-8. I'll keep running tests periodically until integration time.

-------------

Commit messages:
 - Use is_top_frame boolean in FreezeBase::check_valid_fast_path()
 - Move load of _lock_id in C2_MacroAssembler::fast_lock
 - Add --enable-native-access=ALL-UNNAMED to SynchronizedNative.java
 - Update comment for _cont_fastpath
 - Add ReflectionCallerCacheTest.java to test/jdk/ProblemList-Xcomp.txt
 - Use ThreadIdentifier::initial() in ObjectMonitor::owner_from()
 - Fixes to JFR metadata.xml
 - Fix return miss prediction in generate_native_entry for riscv
 - Fix s390x failures
 - Add oopDesc::has_klass_gap() check
 - ... and 70 more: https://git.openjdk.org/jdk/compare/751a914b...211c6c81

Changes: https://git.openjdk.org/jdk/pull/21565/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8338383
  Stats: 9914 lines in 246 files changed: 7105 ins; 1629 del; 1180 mod
  Patch: https://git.openjdk.org/jdk/pull/21565.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565

PR: https://git.openjdk.org/jdk/pull/21565