<div dir="ltr"><div>Thank you, David, for the explanation and confirmation! <br></div><div><br></div><div>I am trying to understand what that means for safepoints. A thread can only have exited on its own while in third-party native code. A thread in native is considered safepoint-safe, so the VM would not wait for it, right?</div><div><br></div><div>Other than that, I wonder whether we keep pointers into the thread stack in global state somewhere. That seems to be the most obvious vulnerability.<br></div><div><br></div><div>If this really were an issue, I think one could add a facility that periodically checks whether threads still exist, possibly as part of the JNI check. Maybe similar to what we do in java.process, where we ascertain identity via a (pid, start time) tuple. But as you wrote, there have been almost no observed issues on the other *nixes.<br></div><div><br></div><div>..Thomas<br></div><div><br></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 20, 2023 at 2:21 AM David Holmes <<a href="mailto:david.holmes@oracle.com">david.holmes@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Thomas,<br>
<br>
On 18/11/2023 1:42 am, Thomas Stüfe wrote:<br>
> Hi,<br>
> <br>
> the AIX folks have problems with <br>
> runtime/jni/terminatedThread/TestTerminatedThread.java. I am trying to <br>
> understand some details and would be happy for pointers.<br>
> <br>
> The way I understand TestTerminatedThread.java and the RFR discussion <br>
> for 8205878 [1], the test seems to deliberately omit <br>
> JNI_DetachCurrentThread to simulate a JNI coding error, right?<br>
<br>
Right.<br>
<br>
> It joins <br>
> the thread, causing the OS to clean out all associated resources. The <br>
> pthread_t, kernel thread id, stack, etc all become invalid. The test <br>
> then nudges the VM in various ways to shake out problems relating to the <br>
> continued use of these resources.<br>
> <br>
> Is my understanding correct, or am I missing something?<br>
<br>
That is correct.<br>
<br>
> If I got this right so far, is this not inherently unstable?<br>
<br>
Not sure if "unstable" is the right word but yes it can have issues.<br>
<br>
> What <br>
> happens if the associated resources get reused by the libc? pthread_t <br>
> could be a pointer to a struct or a slot index into a table, and get <br>
> reused by a different thread. The kernel thread id could be reused too.<br>
<br>
It is an interesting question, but beyond this test what happens with <br>
real code if that were the case? We can't detect it. We will just have <br>
an "orphan" Thread that we can query in various ways hence ...<br>
<br>
... the test is just a "canary" to see if the VM encounters any <br>
problematic scenarios when the various APIs are applied to a thread <br>
that terminated without detaching, and which the VM can handle more <br>
robustly.<br>
<br>
It turned out that other than the original CPU time issue, nothing bad <br>
is observed on Linux, BSD/macOS in general. We did have one case on <br>
Linux PPC [1] where we saw something unexpected and had to adjust the <br>
test. It may be that we need something for AIX too? Or we can skip it on <br>
AIX if necessary.<br>
<br>
Cheers,<br>
David<br>
<br>
[1] <a href="https://bugs.openjdk.org/browse/JDK-8211931" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8211931</a><br>
<br>
<br>
<br>
> Thanks, Thomas<br>
> <br>
> [1] <br>
> <a href="https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2018-July/029022.html" rel="noreferrer" target="_blank">https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2018-July/029022.html</a><br>
</blockquote></div></div>