RFC: more robust handling of terminated but still attached threads

Wed Jul 4 06:53:04 UTC 2018

After more experimentation and code scrutiny, I could only find one place 
where this is actually a problem, so I will fix that under JDK-8205878.

David

On 3/07/2018 7:21 PM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
> 
> We hit asserts or trigger SEGVs when we try to operate on the native 
> thread ID of a JNI-attached thread that has actually terminated 
> without first detaching. The thread still appears in the threadsList, 
> and we try to process it during DumpOnExit (there are probably other 
> operations that could run into this in the general case, too).
> 
> Fixing the tests is easy. But the more general question is how to make 
> the VM code more robust in the face of this situation.
> 
> At the lowest level, we can watch for ESRCH from the pthread_* functions 
> and add fallback logic that produces some "result" for that thread.
> 
> At a higher level, we may be able to heuristically guess that the native 
> thread has terminated and so skip it in ALL_JAVA_THREADS and similar 
> constructs. For example, pthread_kill(t, 0) can serve as a heuristic 
> check that 't' is no longer alive, since it may return ESRCH. But of 
> course, if t has terminated, it is entirely possible that its pthread_t 
> value has been reused. And if t is not going to detach, we could be 
> racing with its termination anyway - so the heuristic may pass and we 
> still hit a low-level assert or SEGV.
> 
> What do people think? Do we try to deal with this at the bottom, at 
> the top, or all the way through? (There are obviously diminishing 
> returns on effort versus benefit here.)
> 
> Thanks,
> David
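A minimal sketch of the bottom-level approach, in C against POSIX threads: treat ESRCH from pthread_getcpuclockid as "thread is gone" and return a sentinel instead of asserting. The helper name thread_cpu_time_or_unknown and the -1 sentinel are hypothetical illustrations, not HotSpot code. POSIX only recommends, and does not require, an ESRCH failure when a thread ID is used after its lifetime has ended, so this can only ever be best effort.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper (not HotSpot code): returns the CPU time consumed
 * by thread t in nanoseconds, or -1 if the thread can no longer be
 * queried. */
long long thread_cpu_time_or_unknown(pthread_t t) {
    clockid_t clk;
    int rc = pthread_getcpuclockid(t, &clk);
    if (rc == ESRCH) {
        /* The native thread appears to have terminated: report
         * "unknown" rather than asserting on a non-zero return. */
        return -1;
    }
    if (rc != 0) {
        /* Any other failure is also mapped to "unknown". */
        return -1;
    }
    struct timespec ts;
    if (clock_gettime(clk, &ts) != 0) {
        /* The thread may exit between the two calls, which can
         * invalidate the clock, so this call needs a fallback too. */
        return -1;
    }
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Every caller then has to cope with the sentinel, which is where the effort at this level goes.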
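A sketch of the higher-level heuristic, assuming a hypothetical attached_thread array as a stand-in for the real threads list and ALL_JAVA_THREADS iteration: any entry for which pthread_kill(t, 0) returns ESRCH is skipped. As the message notes, passing the check proves nothing, because the pthread_t may have been reused or the thread may terminate immediately after the probe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry in the VM's threads list. */
typedef struct {
    pthread_t native;
    const char *name;
} attached_thread;

/* Heuristic only: returns 0 if pthread_kill(t, 0) reports ESRCH,
 * 1 otherwise. A 1 does not prove the thread is alive: its pthread_t
 * may have been reused, or it may exit right after this check. */
int probably_alive(pthread_t t) {
    return pthread_kill(t, 0) != ESRCH;
}

/* Visits every entry that passes the liveness heuristic and returns
 * the number of entries visited. */
size_t for_each_probably_alive(attached_thread *list, size_t n,
                               void (*visit)(attached_thread *)) {
    size_t visited = 0;
    for (size_t i = 0; i < n; i++) {
        if (!probably_alive(list[i].native)) {
            continue; /* native thread appears to have terminated */
        }
        visit(&list[i]);
        visited++;
    }
    return visited;
}
```

Because the probe and the subsequent operation are not atomic, this only narrows the race window; the low-level operations would still need to tolerate ESRCH to avoid the asserts and SEGVs described above.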