RFR: 8253429: Error reporting should report correct state of terminated/aborted threads

Zhengyu Gu zgu at openjdk.java.net
Fri Sep 25 12:10:16 UTC 2020


On Fri, 25 Sep 2020 04:29:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> For some non-JavaThread, their object instances can outlast threads' lifespan. For example, we still can query/report
>> thread's state after thread terminated.
>> But the query/report currently returns wrong state. E.g. a terminated thread appears to be alive and seemly has valid
>> thread stack, etc.
>> This patch sets non-JavaThread's state to ZOMBIE just before it terminates, so that we can distinguish terminated
>> thread from live thread.
>> Also, thread should not report its SMR info, if it has terminated or it never started (thread->osthread() == NULL).
>> 
>> Note: Java thread does not have such issue, its thread object is deleted before thread terminates.
>
> Hi Zhengyu,
> 
> Could this expose us to a race?
> 
> What I mean is: on Linux and BSD I see this code in os::create_thread() after thread creation and after the initial
> handshake with the parent thread:
>> A pthread_create ...
>> B initial handshake...
>>
>> C // Aborted due to thread limit being reached
>>  if (state == ZOMBIE) {
>>    thread->set_osthread(NULL);
>>    delete osthread;
>>    return false;
>>  }
> 
> (Weirdly enough we don't do (C) for Windows and AIX).
> 
> I may be wrong do not find anywhere in the code where we set the ZOMBIE state today. So (C) may be just dead code now.
> 
> With your patch, we now set the state to ZOMBIE in post_run. If child finishes and sets ZOMBIE state before parent
> reaches (C) os::create_thread would return not false even though the child thread started correctly and ran through?

> _Mailing list message from [Thomas St��fe](mailto:thomas.stuefe at gmail.com) on
> [hotspot-runtime-dev](mailto:hotspot-runtime-dev at openjdk.java.net):_
> Hi David,
> 
> On Fri, Sep 25, 2020 at 7:06 AM David Holmes <david.holmes at oracle.com>
> wrote:
> 
> > Hi Thomas,
> > On 25/09/2020 2:31 pm, Thomas Stuefe wrote:
> > > On Thu, 24 Sep 2020 18:14:10 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:
> > > > For some non-JavaThread, their object instances can outlast threads'
> > > > lifespan. For example, we still can query/report
> > > > thread's state after thread terminated.
> > > > But the query/report currently returns wrong state. E.g. a terminated
> > > > thread appears to be alive and seemly has valid
> > > > thread stack, etc.
> > > > This patch sets non-JavaThread's state to ZOMBIE just before it
> > > > terminates, so that we can distinguish terminated
> > > > thread from live thread.
> > > > Also, thread should not report its SMR info, if it has terminated or it
> > > > never started (thread->osthread() == NULL).
> > > > Note: Java thread does not have such issue, its thread object is
> > > > deleted before thread terminates.
> > > 
> > > 
> > > Hi Zhengyu,
> > > Could this expose us to a race?
> > > What I mean is: on Linux and BSD I see this code in os::create_thread()
> > > after thread creation and after the initial
> > > handshake with the parent thread:
> > > > A pthread_create ...
> > > > B initial handshake...
> > > > C // Aborted due to thread limit being reached
> > > > if (state == ZOMBIE) {
> > > > thread->set_osthread(NULL);
> > > > delete osthread;
> > > > return false;
> > > > }
> > > 
> > > 
> > > (Weirdly enough we don't do (C) for Windows and AIX).
> > > I may be wrong do not find anywhere in the code where we set the ZOMBIE
> > > state today. So (C) may be just dead code now.
> > 
> > 
> > Funnily enough it seems to be dead code that you created:
> > https://bugs.openjdk.java.net/browse/JDK-8078513
> > but mea culpa too as I missed it in the review. :)
> 
> Hah! Thanks for reminding me :)
> 
> Okay, that's still left over from pre-NPTL.
> 
> @zhengyu: Can you remove these two sections too in os_bsd.cpp and
> os_linux.cpp or should I do this in a separate patch?
> 
> Cheers Thomas
> 
> > Otherwise, yes this would be a theoretical race. So lets just delete the
> > above.
> > Cheers,
> > David
> > > With your patch, we now set the state to ZOMBIE in post_run. If child
> > > finishes and sets ZOMBIE state before parent
> > > reaches (C) os::create_thread would return not false even though the
> > > child thread started correctly and ran through?

Hi Thomas and David,

I did notice the dead code (no one sets osthread's state to ZOMBIE before this patch) and plan to file a separate CR.

I don't quite understand the race condition you described above.

My understanding is that, os::create_thread() starts native thread, but it blocks until os::start_thread() to unblock
it. Therefore, there is no chance for os::create_thread() to see ZOMBIE state.

What do I miss?

Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/341


More information about the hotspot-runtime-dev mailing list