RFR: 8327743: JVM crash in hotspot/share/runtime/javaThread.cpp - failed: held monitor count should be equal to jni: 0 != 1
Robbin Ehn
rehn at openjdk.org
Mon Apr 8 12:35:44 UTC 2024
On Fri, 22 Mar 2024 06:26:03 GMT, David Holmes <dholmes at openjdk.org> wrote:
> The crux of the problem here is that the virtual thread code was not keeping the held-monitor-count and jni-monitor-count in sync under all conditions. So if a vthread acquired a monitor via JNI but failed to unlock it before terminating, the underlying platform thread's counts were out of sync and if it terminated we would trigger the assertion that checks for such things.
>
> The actual fix is very simple: we zero the platform thread's jni-monitor-count in `continuation_enter_cleanup` the same way we zero the held-monitor-count. In addition we apply the same `CheckJNICalls` check for this unbalanced locking and issue a warning in the virtual thread case. That fact this happens in asm code complicates matters.
>
> The existing `JNIMonitor.java` test is greatly expanded to test these scenarios and check the unified logging output.
>
> Other minor changes involve expanding some of the other assertions relating to the two counts so we can detected a mismatch earlier without a need for the thread to terminate. And the test that original uncovered the problem (`GetOwnedMonitorInfoTest.java`) has some minor adjustments to enhance diagnostics.
>
> I've provided the fix for all architectures that support continuations: x64, aarch64, riscv and ppc. The latter both build okay in GHA but I can't actually test them with the updated test. So some assistance from RISCV folk (@robehn ?) and PPC folk (??) would be appreciated (otherwise any issues will have to be handled as follow up fixes
>
> The changes are structured so that there is no extra code executed in product builds unless `CheckJNICalls` is set. This means that product builds will not keep the JNI count in sync with the held count, unless `CheckJNICalls` is set. This could trip up a future logging entry or explicit check of the JNI count, but it is expected that these counts will be removed once ObjectMonitor usage will not force virtual thread pinning.
>
> Testing:
> - regression test 10x on all x64 and aarch64 platforms
> - tiers 1-4
> - GHA
>
>
> Thanks to @pchilano for help working out the best form of the fix and the initial asm for x64.
>
> Thanks to @fbredber for the Aarch64 and RISCV asm code.
>
> Thanks
A question, as I have interpret this comment:
// 6282335 JNI DetachCurrentThread spec states that all Java monitors
// held by this thread must be released. The spec does not distinguish
// between JNI-acquired and regular Java monitors. We can only see
// regular Java monitors here if monitor enter-exit matching is broken.
Before monitor _count_ we did not know if we held any JNI locks.
With the count it is possible to iterate the JNI locks if the count is non-zero and unlock.
Thus implement what this comment says we are missing, and we always exit with 0 count.
A: Should we add that ? (normal thread exit/detach)
B: Should virtual threads also do that?
We could always call a "SharedRuntime::log_AND_UNLOCK_jni_monitor_still_held" here.
Ignoring above, looks good, and passed initial testing. (I'll do some more, no need to wait for that)
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18445#issuecomment-2042611144
More information about the hotspot-dev
mailing list