<div dir="ltr"><div>Thank you, David, for all the clarifications. I think the AIX devs have enough information now to go on searching, or to decide whether to exclude the test on AIX.</div><div><br></div><div>Cheers, Thomas<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 20, 2023 at 8:44 AM David Holmes <<a href="mailto:david.holmes@oracle.com">david.holmes@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 20/11/2023 4:15 pm, Thomas Stüfe wrote:<br>
> Thank you, David, for the explanation and confirmation!<br>
> <br>
> I am trying to understand what that means for safepoints. A thread can only <br>
> have exited on its own in third-party native code. So it would be in native, which <br>
> makes it safepoint-safe, and the VM would not wait for it, right?<br>
<br>
Right.<br>
<br>
> Other than that, I wonder whether we keep pointers into the thread stack in <br>
> global state somewhere. That seems to be the most obvious vulnerability.<br>
<br>
Well, the obvious place would be if the thread exited with locked <br>
monitors and then we'd have a BasicObjectLock* in the object's markword. <br>
That could be a crash waiting to happen.<br>
<br>
> If this were really an issue, I think one could add a facility that <br>
> checks threads for existence periodically, possibly as part of the JNI <br>
> check. Maybe similar to what we do in java.process, where we ascertain <br>
> identity via a (pid, start time) tuple. But as you wrote, there have <br>
> been almost no observed issues on the other *nixes.<br>
<br>
The aim here is not to try and make things safe for such errant threads <br>
- you do this and you're on your own. We just stumbled across this with <br>
a badly written test and so wanted to check that we didn't crash with <br>
the obvious cases if we operated on the java.lang.Thread.<br>
<br>
Cheers,<br>
David<br>
-----<br>
<br>
> ..Thomas<br>
> <br>
> On Mon, Nov 20, 2023 at 2:21 AM David Holmes <<a href="mailto:david.holmes@oracle.com" target="_blank">david.holmes@oracle.com</a>> wrote:<br>
> <br>
> Hi Thomas,<br>
> <br>
> On 18/11/2023 1:42 am, Thomas Stüfe wrote:<br>
> > Hi,<br>
> ><br>
> > the AIX folks have problems with<br>
> > runtime/jni/terminatedThread/TestTerminatedThread.java. I am trying to<br>
> > understand some details and would be happy for pointers.<br>
> ><br>
> > The way I understand TestTerminatedThread.java and the RFR discussion<br>
> > for 8205878 [1], the test seems to deliberately omit<br>
> > JNI_DetachCurrentThread to simulate a JNI coding error, right?<br>
> <br>
> Right.<br>
> <br>
> > It joins<br>
> > the thread, causing the OS to clean out all associated resources. The<br>
> > pthread_t, kernel thread id, stack, etc. all become invalid. The test<br>
> > then nudges the VM in various ways to shake out problems relating to the<br>
> > continued use of these resources.<br>
> ><br>
> > Is my understanding correct, or am I missing something?<br>
> <br>
> That is correct.<br>
> <br>
> > If I got this right so far, is this not inherently unstable?<br>
> <br>
> Not sure if "unstable" is the right word but yes it can have issues.<br>
> <br>
> > What<br>
> > happens if the associated resources get reused by the libc? pthread_t<br>
> > could be a pointer to a struct or a slot index into a table, and get<br>
> > reused by a different thread. The kernel thread id could be reused too.<br>
> <br>
> It is an interesting question, but beyond this test what happens with<br>
> real code if that were the case? We can't detect it. We will just have<br>
> an "orphan" Thread that we can query in various ways hence ...<br>
> <br>
> ... the test is just a "canary" to see if the VM encounters any<br>
> problematic scenarios when the various APIs are applied to a thread<br>
> that terminated without detaching, and which the VM can handle more<br>
> robustly.<br>
> <br>
> It turned out that other than the original CPU time issue, nothing bad<br>
> is observed on Linux or BSD/macOS in general. We did have one case on<br>
> Linux PPC [1] where we saw something unexpected and had to adjust the<br>
> test. It may be that we need something for AIX too? Or we can skip it on<br>
> AIX if necessary.<br>
> <br>
> Cheers,<br>
> David<br>
> <br>
> [1] <a href="https://bugs.openjdk.org/browse/JDK-8211931" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8211931</a><br>
> <br>
> <br>
> <br>
> > Thanks, Thomas<br>
> ><br>
> > [1] <a href="https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2018-July/029022.html" rel="noreferrer" target="_blank">https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2018-July/029022.html</a><br>
> <br>
</blockquote></div>