RFR: 8253429: Error reporting should report correct state of terminated/aborted threads

Daniel D.Daugherty dcubed at openjdk.java.net
Wed Sep 30 17:03:10 UTC 2020


On Tue, 29 Sep 2020 15:17:46 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> For some non-JavaThreads, the object instance can outlast the thread's lifespan. For example, we can still query/report
>> a thread's state after the thread has terminated.
>> But the query/report currently returns the wrong state. E.g. a terminated thread appears to be alive and seemingly has a valid
>> thread stack, etc.
>> This patch sets a non-JavaThread's state to ZOMBIE just before it terminates, so that we can distinguish a terminated
>> thread from a live thread.
>> Also, a thread should not report its SMR info if it has terminated or never started (thread->osthread() == NULL).
>> 
>> Note: Java threads do not have this issue; the thread object is deleted before the thread terminates.
>
> Hi Zhengyu,
> 
> I'm updating my review after reading through your conversation with David. Save for small nits this seems fine.
> 
> Cheers, Thomas

I think we're approaching this problem incorrectly. David mentioned this in
the bug report:

> As the reporting is done by the thread closure of the target subsystem
> this is not a runtime issue in this case but a GC issue.

To me, the first part of that sentence is the important part. It is indeed a
thread closure that causes us to reach the terminated thread. It is also a
thread closure that is used by Thread-SMR to determine when a thread's
ThreadsList protects JavaThreads.

In particular:

src/hotspot/share/runtime/threadSMR.cpp:
bool ThreadsSMRSupport::is_a_protected_JavaThread(JavaThread *thread) {

uses a ScanHazardPtrGatherProtectedThreadsClosure passed to
ThreadsSMRSupport::threads_do() to gather all the protected
JavaThread*.

This threads_do() function applies the closure to all threads in the
system: JavaThreads on 'list' and all the non-JavaThreads:

src/hotspot/share/runtime/threadSMR.cpp:
void ThreadsSMRSupport::threads_do(ThreadClosure *tc, ThreadsList *list) {
  list->threads_do(tc);
  Threads::non_java_threads_do(tc);
}

So if a particular non-JavaThread is still found via Threads::non_java_threads_do(),
then any ThreadsList that it holds still protects the JavaThread*s on that list, even if
the non-JavaThread has terminated. That means that calling ThreadsSMRSupport::print_info_on()
is a valid thing to do because the non-JavaThread is still participating in Thread-SMR
related decisions.
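
To make that concrete, here is a minimal standalone sketch of the argument. It is
not HotSpot code: JavaThreadModel, ThreadsListModel, NonJavaThreadModel,
gather_protected() and is_protected() are hypothetical names invented for this
illustration, and the real closure-based scan in threadSMR.cpp is considerably
more involved. The only point it models is that the scan looks at which
threads are still reachable, not at whether they have terminated.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
struct JavaThreadModel { int id; };

// Models a ThreadsList: a snapshot of JavaThreads that a holder keeps alive.
struct ThreadsListModel {
    std::vector<JavaThreadModel*> threads;
};

enum class StateModel { ALIVE, ZOMBIE };

// Models a non-JavaThread whose object can outlast the underlying OS thread.
struct NonJavaThreadModel {
    StateModel state = StateModel::ALIVE;
    ThreadsListModel* held_list = nullptr;  // list this thread still holds, if any
};

// Models the gather step: visit every non-JavaThread that is still reachable
// and collect the JavaThreads on the list it holds. The thread's state is
// deliberately ignored, because this sketch assumes the scan does not check it.
std::set<JavaThreadModel*> gather_protected(
        const std::vector<NonJavaThreadModel*>& reachable) {
    std::set<JavaThreadModel*> protected_threads;
    for (NonJavaThreadModel* t : reachable) {
        if (t->held_list == nullptr) continue;
        for (JavaThreadModel* jt : t->held_list->threads) {
            protected_threads.insert(jt);
        }
    }
    return protected_threads;
}

// Models the question "is this JavaThread protected by anyone?".
bool is_protected(JavaThreadModel* jt,
                  const std::vector<NonJavaThreadModel*>& reachable) {
    return gather_protected(reachable).count(jt) != 0;
}
```

In this model, marking a NonJavaThreadModel as ZOMBIE changes nothing about
what is_protected() returns; only dropping it from the reachable set does.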

I have no problem with the part where we set the ZOMBIE state as a marker
for a terminated non-JavaThread, but we need to determine why that
terminated thread is still being found by Threads::non_java_threads_do()
and whether it is safe to remove that non-JavaThread from whatever list
is holding it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/341


More information about the hotspot-runtime-dev mailing list