RFC: more robust handling of terminated but still attached threads

Tue Jul 3 12:19:57 UTC 2018

On 07/03/2018 02:09 PM, David Holmes wrote:

>> The use case isn't entirely clear to me.  If you are sufficiently 
>> unlucky, the memory behind a pthread_t value is simply gone after 
>> thread exit (and potentially TCB/thread stack reclamation in the 
>> thread library).  On glibc, this includes the internal TID, which is 
>> required for pthread_kill (thr, 0) actually sending the signal.
> 
> IIUC pthread_kill(thr,0) never sends any signal, but may look up the id 
> to see if it is valid. I understand there's no guarantee and that there 
> is an inherent race regardless.

It still makes a system call to send the pseudo-signal 0.  This is what 
I meant.  It can bail out earlier in case of terminated threads which 
have not yet been joined, though.
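
For illustration, a minimal sketch of such a probe. The helper names 
probe_thread and exit_immediately are made up here, not anything from 
Hotspot or glibc. The result for a terminated thread which has not been 
joined depends on the glibc version (it may be ESRCH or 0), and the call 
is only defined while the pthread_t is still a valid handle.

```c
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: probe a thread with the pseudo-signal 0.
   Returns 1 if the signal could have been delivered, 0 if the library
   reported ESRCH, and -1 on any other error.  This is only meaningful
   while thr is a valid handle, i.e. the thread is joinable and has not
   been joined yet.  For a detached thread that already exited, the
   memory behind thr may be gone and the call is undefined. */
static int probe_thread(pthread_t thr)
{
    int rc = pthread_kill(thr, 0);
    if (rc == 0)
        return 1;
    if (rc == ESRCH)
        return 0;
    return -1;
}

/* Trivial thread body for trying out the probe (illustrative only). */
static void *exit_immediately(void *arg)
{
    return arg;
}
```

Probing pthread_self() should always report 1.  For a joinable thread 
started with exit_immediately and probed before pthread_join, either 0 
or 1 is possible, which is exactly why this cannot serve as a liveness 
test.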

>> I'm not familiar with the Hotspot run-time and why it needs to do 
>> this. Can you deregister the thread from a thread directory once it 
>> exits (using one of the TLS variants with a destructor)?  Or is the 
>> concern there that the destructor would not run late enough?
> 
> The issue is native process threads that attach to the VM through JNI 
> but then don't detach themselves before terminating. While it may be 
> possible to create such a mechanism as you describe it goes way beyond 
> what I'm trying to do here and violates a basic principle that we try to 
> interfere as little as possible with threads that attach to the VM 
> directly (rather than being created by the VM). There was also a rather 
> complex bug involving native threads that themselves provided such a TLS 
> destructor (to detach themselves) and the VM's own (fairly recent) use of 
> TLS.
> 
> All I'm looking at is some basic robustness if the VM encounters such a 
> thread (for which all the VM data structures remain intact - and 
> effectively leak) so that we don't assert or crash when we do invoke a 
> pthread function (pthread_getcpuclockid is the one in question in the 
> bug report).

You could capture the TID and the task creation time from /proc when the 
thread is attached, and try to recover the information you need from 
/proc afterwards (possibly with a comparison to the startup time).
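
A rough sketch of what that could look like, assuming Linux with /proc 
mounted.  The struct thread_ident and all three function names are made 
up for illustration.  Field 22 of the per-task stat file is the start 
time in clock ticks since boot (see proc(5)).  This only narrows the 
window for TID reuse; the task can still disappear between the check 
and whatever is done next.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record stored when a thread attaches. */
struct thread_ident {
    pid_t tid;                     /* kernel task ID */
    unsigned long long starttime;  /* field 22 of the task's stat file */
};

/* Read the start time of task tid in the current process.
   Returns 0 on success, -1 if the task is gone or parsing fails. */
static int read_task_starttime(pid_t tid, unsigned long long *out)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/self/task/%d/stat", (int)tid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The comm field (2) may contain spaces and parentheses, so start
       parsing after the last ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL)
        return -1;
    p++;
    /* Skip fields 3 through 21 (19 fields) to reach field 22. */
    for (int i = 0; i < 19; i++) {
        while (*p == ' ')
            p++;
        while (*p != '\0' && *p != ' ')
            p++;
    }
    char *end;
    unsigned long long v = strtoull(p, &end, 10);
    if (end == p)
        return -1;
    *out = v;
    return 0;
}

/* Call on the attaching thread itself to record its identity. */
static int capture_thread_ident(struct thread_ident *id)
{
    id->tid = (pid_t)syscall(SYS_gettid);
    return read_task_starttime(id->tid, &id->starttime);
}

/* Returns 1 if a task with the recorded TID and start time exists. */
static int thread_still_same(const struct thread_ident *id)
{
    unsigned long long now;
    if (read_task_starttime(id->tid, &now) != 0)
        return 0;
    return now == id->starttime;
}
```

Other per-task data (CPU times, for example) can be read from the same 
stat file instead of going through a pthread_t that may be stale.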

You probably cannot ensure that the thread will not suddenly cease to 
exist, so none of the pthread_* functions can be called safely.  The only 
in-process way I can imagine which ensures that the thread stays around is 
to send it a signal with an unblocked handler which you control, and 
which can then prevent the thread from exiting indefinitely.  But that 
is a very heavy-handed approach.

Out-of-process, you could use ptrace to freeze threads.

Thanks,
Florian