RFC: more robust handling of terminated but still attached threads

Tue Jul 3 12:09:26 UTC 2018

On 3/07/2018 9:28 PM, Florian Weimer wrote:
> On 07/03/2018 11:21 AM, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
>>
>> We hit asserts or trigger SEGVs when we try to operate on a native 
>> thread ID for a JNI-attached thread that has actually terminated but 
>> which did not detach first. It still appears in the threadsList and we 
>> try to process it during DumpOnExit (but there are probably other 
>> operations that could run into this in the general case).
> 
> This bug is not public.

Sorry, I'll try to get that changed. It's an issue with some of the newly 
opened tests in:

vmTestbase/nsk/jvmti/scenarios/jni_interception/

when run with FlightRecorder set to use DumpOnExit. I will of course fix 
the tests.

> The use case isn't entirely clear to me.  If you are sufficiently 
> unlucky, the memory behind a pthread_t value is simply gone after thread 
> exit (and potentially TCB/thread stack reclamation in the thread 
> library).  On glibc, this includes the internal TID, which is required 
> for pthread_kill (thr, 0) actually sending the signal.

IIUC pthread_kill(thr, 0) never sends any signal, but may look up the ID 
to see if it is valid. I understand there's no guarantee and that there 
is an inherent race regardless.
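
For illustration only, a minimal sketch of the kind of probe being 
discussed, assuming a POSIX system. The helper name thread_id_probe is 
hypothetical and is not taken from HotSpot. Signal 0 asks the library to 
do its validity checks without delivering anything. A result of 1 is 
only a hint, and calling this on a thread ID whose lifetime has ended is 
undefined, which is the inherent race mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper: probe a thread ID without delivering a signal.
 * Returns 1 if the library accepted the ID, 0 if it reported ESRCH,
 * and -1 for any other error. None of these results is a guarantee. */
static int thread_id_probe(pthread_t thread) {
    int rc = pthread_kill(thread, 0);  /* signal 0: checks only, nothing is sent */
    if (rc == 0) {
        return 1;
    }
    if (rc == ESRCH) {
        return 0;
    }
    return -1;
}
```

Probing the calling thread, thread_id_probe(pthread_self()), yields 1 
because the caller is necessarily alive.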

> I'm not familiar with the Hotspot run-time and why it needs to do this. 
> Can you deregister the thread from a thread directory once it exits 
> (using one of the TLS variants with a destructor)?  Or is the concern 
> there that the destructor would not run late enough?

The issue is native process threads that attach to the VM through JNI 
but then don't detach themselves before terminating. While it may be 
possible to create such a mechanism as you describe, it goes way beyond 
what I'm trying to do here and violates a basic principle that we try to 
interfere as little as possible with threads that attach to the VM 
directly (rather than being created by the VM). There was also a rather 
complex bug involving native threads that themselves provided such a TLS 
destructor (to detach themselves) and the VM's own (fairly recent) use of 
TLS.
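
For reference, the mechanism Florian describes could be sketched roughly 
as below. This is not what the VM does. The registry is a hypothetical 
stand-in (a plain counter), all function names are invented for the 
example, and error handling is mostly omitted. A pthread key with a 
destructor is armed when the thread "attaches", so the thread is 
deregistered at exit even though it never detaches explicitly.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread registry: a counter under a lock. */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static int registered_threads = 0;

static pthread_key_t attach_key;
static pthread_once_t attach_key_once = PTHREAD_ONCE_INIT;

/* Runs at thread exit for every thread whose key value is non-NULL. */
static void deregister_on_exit(void *value) {
    (void)value;
    pthread_mutex_lock(&registry_lock);
    registered_threads--;
    pthread_mutex_unlock(&registry_lock);
}

static void make_attach_key(void) {
    pthread_key_create(&attach_key, deregister_on_exit);
}

/* Hypothetical "attach": register the caller and arm the destructor. */
static int attach_current_thread(void) {
    pthread_once(&attach_key_once, make_attach_key);
    pthread_mutex_lock(&registry_lock);
    registered_threads++;
    pthread_mutex_unlock(&registry_lock);
    /* Any non-NULL value makes the destructor run when this thread exits. */
    return pthread_setspecific(attach_key, &registered_threads);
}

/* Thread body: attaches, then terminates without detaching. */
static void *attach_and_exit(void *arg) {
    (void)arg;
    attach_current_thread();
    return NULL;
}

/* Returns the registry count after such a thread has been joined,
 * or -1 if the thread could not be created. */
static int registry_count_after_undetached_exit(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, attach_and_exit, NULL) != 0) {
        return -1;
    }
    pthread_join(t, NULL);
    pthread_mutex_lock(&registry_lock);
    int n = registered_threads;
    pthread_mutex_unlock(&registry_lock);
    return n;
}
```

The sketch deliberately ignores the ordering problem raised above: if 
the native thread installs its own TLS destructor, the relative order in 
which the destructors run is unspecified.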

All I'm looking at is some basic robustness if the VM encounters such a 
thread (for which all the VM data structures remain intact - and 
effectively leak) so that we don't assert or crash when we do invoke a 
pthread function (pthread_getcpuclockid is the one in question in the 
bug report).
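
As a rough sketch of what "basic robustness" could mean here (not the 
actual patch): check the return code of pthread_getcpuclockid and report 
"unavailable" instead of asserting. The helper name thread_cpu_time_ns 
and the -1 convention are assumptions made for the example. It only 
helps when the library happens to detect the stale ID, for example by 
returning ESRCH. Using an ID whose lifetime has ended remains undefined.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: CPU time consumed by `thread` in nanoseconds,
 * or -1 if the clock cannot be obtained or read. */
static int64_t thread_cpu_time_ns(pthread_t thread) {
    clockid_t cid;
    struct timespec ts;

    /* Treat a non-zero return (e.g. ESRCH) as "no data", not as fatal. */
    if (pthread_getcpuclockid(thread, &cid) != 0) {
        return -1;
    }
    if (clock_gettime(cid, &ts) != 0) {
        return -1;
    }
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}
```

A caller such as the DumpOnExit path would then skip any thread for 
which the helper returns -1.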

It may be that it isn't really worth trying to do this given it can't be 
100% reliable anyway.

Thanks,
David

> Thanks,
> Florian