RFR: 8375209: Xcheck:jni should check when GC is about to deadlock in JNI critical section [v2]
Jorn Vernee
jvernee at openjdk.org
Tue Jan 13 21:42:19 UTC 2026
On Tue, 13 Jan 2026 19:44:23 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> Related to [JDK-8375188](https://bugs.openjdk.org/browse/JDK-8375188), and regardless of what happens with the implementations, I think we really want `-Xcheck:jni` to tell us when we are about to deadlock. This is useful for diagnosing the issue in the field.
>>
>> We used to have this capability in Serial/Parallel prior to [JDK-8192647](https://bugs.openjdk.org/browse/JDK-8192647), AFAICS: https://github.com/openjdk/jdk/commit/a9c9f7f0cbb2f2395fef08348bf867ffa8875d73#diff-d27fc793db1bf9314b322d494cd1c3269629fe27a605b4441de08d543d020fc3L341-L344
>>
>> ZGC never had this check, AFAICS. I am not sure if I put the check in the right place. I believe it is the right one, as we want to check that a Java thread is not blocked waiting for the GC driver to respond while it is itself in a JNI critical section. The current placement works well with the test.
>>
>> I opted to add the check at the paths that are really affected by the issue, because it is really about what implementations are doing in this case. But we can also summarily check this in all `CollectedHeap::collect` overrides -- similar to the ZGC case -- so that testing with `-Xcheck:jni` with Epsilon/G1/Shenandoah would also cover every other GC that might run into trouble.
>>
>> Additional testing:
>> - [x] New test, 100x repetitions
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> Terminology
Given the restriction that no other JNI functions should be called inside a critical region, I was wondering how a thread running plain native code could ever get blocked by the GC.
I see in the test that you are returning to Java during a critical region. I don't think we should be leaving the native thread state at all during a critical region. The spec of `GetPrimitiveArrayCritical` also implies this by specifically talking about _native code_:
> Inside a critical region, **native code** must not call other JNI functions, or any system call that may cause the current thread to block and wait for another Java thread.
(Of course, once we return to Java, the user no longer has control over whether the thread calls JNI functions or blocking system calls either.)
Have you considered adding this check in the thread state transition code instead?
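To make the suggestion concrete, here is a toy model of what a check at the state transition could look like. It is only a sketch and not HotSpot code: `ToyThread`, `ThreadState`, `enter_critical`, `exit_critical`, and `transition` are all hypothetical names invented for illustration, and the `check_jni` parameter merely stands in for `-Xcheck:jni` being enabled. The idea is that a thread holding an open critical region is flagged as soon as it leaves the native state, rather than later when it blocks on the GC.

```cpp
#include <cstdio>

// Toy model only: these names are hypothetical and do not mirror HotSpot's types.
enum class ThreadState { InNative, InJava, InVM };

struct ToyThread {
    ThreadState state = ThreadState::InNative;
    int critical_depth = 0;  // nesting depth of open critical regions
    int warnings = 0;        // number of diagnostics reported so far
};

// Model of entering a critical region (e.g. via GetPrimitiveArrayCritical).
void enter_critical(ToyThread& t) { ++t.critical_depth; }

// Model of leaving a critical region (e.g. via ReleasePrimitiveArrayCritical).
void exit_critical(ToyThread& t) {
    if (t.critical_depth > 0) --t.critical_depth;
}

// Performs the state change. Returns false (and reports a warning) when
// checking is enabled and the thread leaves the native state while a
// critical region is still open; returns true otherwise.
bool transition(ToyThread& t, ThreadState to, bool check_jni) {
    const bool leaving_native =
        t.state == ThreadState::InNative && to != ThreadState::InNative;
    const bool violation = check_jni && leaving_native && t.critical_depth > 0;
    if (violation) {
        ++t.warnings;
        std::fprintf(stderr,
                     "warning: leaving native state inside a critical region "
                     "(depth %d)\n",
                     t.critical_depth);
    }
    t.state = to;
    return !violation;
}
```

In this model, a thread that calls `enter_critical` and then transitions to `ThreadState::InJava` gets a warning at the transition itself, which would cover the test's scenario of returning to Java during a critical region.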
-------------
PR Comment: https://git.openjdk.org/jdk/pull/29206#issuecomment-3746681978