Question about JDK-8205878 (tolerating missing JNI Detach)

Obermeier, Thomas thomas.obermeier at sap.com
Tue Nov 21 16:49:11 UTC 2023


Hi everyone,

Indeed, thanks to all of you for your input. It seems we lack the expertise and resources here to research this further, so we will presumably exclude the test on AIX.

… Thomas

From: Thomas Stüfe <thomas.stuefe at gmail.com>
Sent: Tuesday, 21 November 2023 14:12
To: David Holmes <david.holmes at oracle.com>
Cc: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>; Kern, Joachim <joachim.kern at sap.com>; Obermeier, Thomas <thomas.obermeier at sap.com>; Doerr, Martin <martin.doerr at sap.com>
Subject: Re: Question about JDK-8205878 (tolerating missing JNI Detach)

Thank you, David, for all the clarifications. I think the AIX devs have enough information now to go on searching, or to decide whether to exclude the test on AIX.

Cheers, Thomas

On Mon, Nov 20, 2023 at 8:44 AM David Holmes <david.holmes at oracle.com> wrote:
On 20/11/2023 4:15 pm, Thomas Stüfe wrote:
> Thank you, David, for the explanation and confirmation!
>
> I am trying to understand what that means for safepoints. A thread can
> only have exited on its own in third-party native code. So it was in
> native, which makes it safepoint-safe, and the VM would not wait for it,
> right?

Right.

> Other than that, I wonder whether we keep pointers to thread stack in
> global state somewhere. That seems to be the most obvious vulnerability.

Well the obvious place would be if the thread exited with locked
monitors and then we'd have a BasicObjectLock* in the object's markword.
That could be a crash waiting to happen.

> If this really were an issue, I think one could add a facility that
> checks threads for existence periodically, possibly as part of the JNI
> check. Maybe similar to what we do in java.process, where we ascertain
> identity via a (pid, start time) tuple. But as you wrote, there have
> been almost no observed issues on the other *nixes.
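
For readers unfamiliar with the (pid, start time) idea mentioned above, here is a minimal Java sketch using the public ProcessHandle API. It illustrates the identity check for processes only; it is not the JDK-internal code and not a proposal for how HotSpot would check threads. The class ProcessIdentity, the record Identity and the methods capture and stillSameProcess are hypothetical names introduced for this example.

```java
import java.time.Instant;
import java.util.Optional;

public class ProcessIdentity {
    // Hypothetical identity record. A pid alone can be recycled by the OS;
    // pairing it with the start time makes an accidental match unlikely.
    record Identity(long pid, Optional<Instant> startTime) {}

    // Remember who the process is right now.
    static Identity capture(ProcessHandle handle) {
        return new Identity(handle.pid(), handle.info().startInstant());
    }

    // True only if a live process with this pid exists AND it reports the
    // same start time as the one captured earlier.
    static boolean stillSameProcess(Identity id) {
        return ProcessHandle.of(id.pid())
                .filter(ProcessHandle::isAlive)
                .map(h -> h.info().startInstant().equals(id.startTime()))
                .orElse(false);
    }

    public static void main(String[] args) {
        Identity self = capture(ProcessHandle.current());
        System.out.println("same process: " + stillSameProcess(self));

        // Same pid, deliberately wrong start time: simulates a recycled pid.
        Identity stale = new Identity(self.pid(), Optional.of(Instant.EPOCH));
        System.out.println("recycled pid accepted: " + stillSameProcess(stale));
    }
}
```

Note that startInstant() may be empty on platforms that do not expose the start time, in which case this sketch degrades to a plain pid comparison.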

The aim here is not to try and make things safe for such errant threads
- you do this and you're on your own. We just stumbled across this with
a badly written test and so wanted to check that we didn't crash with
the obvious cases if we operated on the java.lang.Thread.

Cheers,
David
-----

> ..Thomas
>
> On Mon, Nov 20, 2023 at 2:21 AM David Holmes <david.holmes at oracle.com> wrote:
>
>     Hi Thomas,
>
>     On 18/11/2023 1:42 am, Thomas Stüfe wrote:
>      > Hi,
>      >
>      > the AIX folks have problems with
>      > runtime/jni/terminatedThread/TestTerminatedThread.java. I am trying
>      > to understand some details and would be happy for pointers.
>      >
>      > The way I understand TestTerminatedThread.java and the RFR
>      > discussion for 8205878 [1], the test seems to deliberately omit
>      > the JNI DetachCurrentThread call to simulate a JNI coding error,
>      > right?
>
>     Right.
>
>      > It joins the thread, causing the OS to clean out all associated
>      > resources. The pthread_t, kernel thread id, stack, etc. all become
>      > invalid. The test then nudges the VM in various ways to shake out
>      > problems relating to the continued use of these resources.
>      >
>      > Is my understanding correct, or am I missing something?
>
>     That is correct.
>
>      > If I got this right so far, is this not inherently unstable?
>
>     Not sure if "unstable" is the right word but yes it can have issues.
>
>      > What happens if the associated resources get reused by the libc?
>      > pthread_t could be a pointer to a struct or a slot index into a
>      > table, and get reused by a different thread. The kernel thread id
>      > could be reused too.
>
>     It is an interesting question, but beyond this test what happens with
>     real code if that were the case? We can't detect it. We will just have
>     an "orphan" Thread that we can query in various ways hence ...
>
>     ... the test is just a "canary" to see if the VM encounters any
>     problematic scenarios when the various APIs are applied to a thread
>     that terminated without detaching, and which the VM can handle more
>     robustly.
>
>     It turned out that other than the original CPU time issue, nothing bad
>     is observed on Linux, BSD/macOS in general. We did have one case on
>     Linux PPC [1] where we saw something unexpected and had to adjust the
>     test. It may be that we need something for AIX too? Or we can skip it
>     on AIX if necessary.
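
As a rough illustration of the kind of queries discussed above, the following Java sketch asks the management API about a thread that has already terminated. It uses an ordinary Java thread that exits normally, so it only shows the well-defined baseline (the documented "not alive" results); it does not reproduce the missing-detach scenario, which needs native code. The class name TerminatedThreadProbe and its methods are hypothetical and not taken from the actual test.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class TerminatedThreadProbe {
    // Start a trivial thread and wait until it has fully terminated.
    static Thread finishedThread() {
        Thread t = new Thread(() -> { /* nothing to do */ });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return t;
    }

    // CPU time of a dead thread: documented to be -1 when the thread is
    // not alive. Also returns -1 here if the VM cannot measure CPU time.
    static long cpuTimeAfterExit() {
        Thread t = finishedThread();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return -1;
        }
        return mx.getThreadCpuTime(t.getId());
    }

    // ThreadInfo of a dead thread: documented to be null.
    static boolean threadInfoIsNull() {
        Thread t = finishedThread();
        return ManagementFactory.getThreadMXBean().getThreadInfo(t.getId()) == null;
    }

    public static void main(String[] args) {
        Thread t = finishedThread();
        System.out.println("alive: " + t.isAlive() + ", state: " + t.getState());
        System.out.println("cpu time: " + cpuTimeAfterExit());
        System.out.println("thread info is null: " + threadInfoIsNull());
    }
}
```

The interesting case in this thread is the one the sketch cannot show: a natively attached thread whose OS thread is gone while the VM still considers it live.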
>
>     Cheers,
>     David
>
>     [1] https://bugs.openjdk.org/browse/JDK-8211931
>
>
>
>      > Thanks, Thomas
>      >
>      > [1] https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2018-July/029022.html
>