RFR: 8327743: JVM crash in hotspot/share/runtime/javaThread.cpp - failed: held monitor count should be equal to jni: 0 != 1 [v6]
Fredrik Bredberg
fbredberg at openjdk.org
Thu Apr 11 05:50:43 UTC 2024
On Thu, 11 Apr 2024 01:27:15 GMT, David Holmes <dholmes at openjdk.org> wrote:
>> The crux of the problem here is that the virtual thread code was not keeping the held-monitor-count and jni-monitor-count in sync under all conditions. So if a vthread acquired a monitor via JNI but failed to unlock it before terminating, the underlying platform thread's counts were out of sync and if it terminated we would trigger the assertion that checks for such things.
>>
>> The actual fix is very simple: we zero the platform thread's jni-monitor-count in `continuation_enter_cleanup` the same way we zero the held-monitor-count. In addition we apply the same `CheckJNICalls` check for this unbalanced locking and issue a warning in the virtual thread case. That fact this happens in asm code complicates matters.
>>
>> The existing `JNIMonitor.java` test is greatly expanded to test these scenarios and check the unified logging output.
>>
>> Other minor changes involve expanding some of the other assertions relating to the two counts so we can detected a mismatch earlier without a need for the thread to terminate. And the test that original uncovered the problem (`GetOwnedMonitorInfoTest.java`) has some minor adjustments to enhance diagnostics.
>>
>> I've provided the fix for all architectures that support continuations: x64, aarch64, riscv and ppc. The latter both build okay in GHA but I can't actually test them with the updated test. So some assistance from RISCV folk (@robehn ?) and PPC folk (??) would be appreciated (otherwise any issues will have to be handled as follow up fixes
>>
>> The changes are structured so that there is no extra code executed in product builds unless `CheckJNICalls` is set. This means that product builds will not keep the JNI count in sync with the held count, unless `CheckJNICalls` is set. This could trip up a future logging entry or explicit check of the JNI count, but it is expected that these counts will be removed once ObjectMonitor usage will not force virtual thread pinning.
>>
>> Testing:
>> - regression test 10x on all x64 and aarch64 platforms
>> - tiers 1-4
>> - GHA
>>
>>
>> Thanks to @pchilano for help working out the best form of the fix and the initial asm for x64.
>>
>> Thanks to @fbredber for the Aarch64 and RISCV asm code.
>>
>> Thanks
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix typos, copyrights and add comment
Looks good to me.
-------------
Marked as reviewed by fbredberg (Committer).
PR Review: https://git.openjdk.org/jdk/pull/18445#pullrequestreview-1993325386
More information about the hotspot-dev
mailing list