RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads

David Holmes david.holmes at oracle.com
Mon May 9 11:03:29 UTC 2016


On 9/05/2016 7:29 PM, Stefan Karlsson wrote:
> Hi,
>
> For java_start, would it be enough to just, unconditionally, call
> ThreadLocalStorage::set_thread(NULL) after thread->run() ?

That would suffice for the current problem but seems a bit patchy to me 
- it's only a partial cleanup of TLS.

What I am going to do is based On Brian's suggestion:

if (Thread::current_or_null_safe() != NULL) {
   thread->clear_thread_current();
}

with the addition that I not only undo the deletion of the 
WatcherThread, but I also remove the deletion of the VMThread.

Was hoping to have a webrev ready but have run into a weird build problem.

Thanks,
David

> StefanK
>
> On 2016-05-07 07:45, David Holmes wrote:
>> Brian, Stefan,
>>
>> I've been giving this more thought and I think you are both right.
>> While this can't be completely handled in java_start for all threads
>> it can handle the most common cases. If I don't add the deletion of
>> the WatcherThread then the problematic case reduces to that of the
>> VMThread.
>>
>> I'll do a new version on Monday. I also need to double-check those
>> threads that don't use java_start (which I am sorely tempted to rename).
>>
>> Thanks,
>> David
>>
>> On 7/05/2016 9:50 AM, David Holmes wrote:
>>> On 7/05/2016 5:40 AM, Brian Gardner wrote:
>>>> I agree with Stefan.  When I initially ran into the problem I came up
>>>> with the following changeset that solves my problem, by calling
>>>> clear_thread_current at the end of java_start.
>>>>
>>>> http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/
>>>>
>>>>
>>>
>>> As I said I can see the appeal in doing this, but there is still a race
>>> for the threads destroyed at VM shutdown as current_or_null() can return
>>> non-NULL but the delete can happen before we call
>>> thread->clear_current_thread().
>>>
>>> You would also want Thread::current_or_null_safe() I think, due to
>>> already deleted JavaThreads.
>>>
>>> And this all side-steps the real issue in my opinion that with clean
>>> thread lifecycle management we should always be able to delete the
>>> terminating thread.
>>>
>>> So to me it is a question of selecting which set of bandaids you want to
>>> apply.
>>>
>>> Thanks,
>>> David
>>>
>>>> Brian
>>>>
>>>>> On May 6, 2016, at 8:41 AM, Stefan Karlsson
>>>>> <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>>
>>>>> wrote:
>>>>>
>>>>> On 06/05/16 16:32, David Holmes wrote:
>>>>>> On 7/05/2016 12:04 AM, Stefan Karlsson wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> On 06/05/16 15:38, David Holmes wrote:
>>>>>>>> Hi Stefan,
>>>>>>>>
>>>>>>>> Thanks for taking a look at this.
>>>>>>>>
>>>>>>>> On 6/05/2016 5:02 PM, Stefan Karlsson wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> I looked through the GC part of this webrev and I think the
>>>>>>>>> change is
>>>>>>>>> fine.
>>>>>>>>>
>>>>>>>>> However, it seems a bit error prone. If we decide to change the
>>>>>>>>> code to,
>>>>>>>>> for example, terminate the  AbstractGangWorker threads, then we
>>>>>>>>> have to
>>>>>>>>> remember to insert a ThreadLocalStorage::set_thread(NULL) call.
>>>>>>>>
>>>>>>>> That's why I added the ShouldNotReachHere()'s - if those threads
>>>>>>>> start
>>>>>>>> terminating then we will see those hit. Perhaps a comment:
>>>>>>>>
>>>>>>>> ShouldNotReachHere(); // If thread terminates we have to do TLS
>>>>>>>> cleanup
>>>>>>>>
>>>>>>>> ?
>>>>>>>
>>>>>>> Yes, I would appreciate a comment. Though, when we add new
>>>>>>> threads, we
>>>>>>> need to remember to add the set_thread(NULL) call.
>>>>>>
>>>>>> Well no, what you would do is manage your new threads in such a way
>>>>>> that their run() method can do "delete this" as the last call. Only
>>>>>> if you can't do that do you need to think about what termination
>>>>>> logic is missing that needs to be done in lieu of the destructor.
>>>>>
>>>>> Yes, but this forces every implementer of a Thread:run() function to
>>>>> have to think about these kind of requirements.
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>> Could we instead add a call to
>>>>>>>>> ThreadLocalStorage::set_thread(NULL), or
>>>>>>>>> maybe even Thread::clear_thread_current(), in java_start?
>>>>>>>>>
>>>>>>>>> static void *java_start(Thread *thread) {
>>>>>>>>> [...]
>>>>>>>>>  thread->initialize_thread_current();
>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>  // call one more level start routine
>>>>>>>>>  thread->run();
>>>>>>>>>
>>>>>>>>>  ////////// Could we call Thread::clear_thread_current(); here?
>>>>>>>>
>>>>>>>> Not easily. For JavaThreads we've already done "delete this" inside
>>>>>>>> the run() method, so we'd have to move that into java_start as
>>>>>>>> well,
>>>>>>>> but we can only do the delete for JavaThreads not for other
>>>>>>>> threads.
>>>>>>>> And we'd also have to change the VMThread and WatcherThread
>>>>>>>> termination logic because of the deletes that happen in the
>>>>>>>> termination thread - the "this" pointer (thread above) may no
>>>>>>>> longer
>>>>>>>> be valid when we want to call clear_current_thread() - which is
>>>>>>>> why we
>>>>>>>> can only do the ThreadLocalStorage::set_thread(NULL).
>>>>>>>>
>>>>>>>> I agree it would be a lot cleaner to have java_start do:
>>>>>>>>
>>>>>>>>  thread->common_initialization();
>>>>>>>>  thread->run();
>>>>>>>>  thread->common_cleanup();
>>>>>>>>  delete thread;
>>>>>>>>
>>>>>>>> for all threads, but we'd need a lot of other changes to allow for
>>>>>>>> that. Otherwise we would need to note that kind of thread before
>>>>>>>> calling run() then switch on the thread type after run() to decide
>>>>>>>> what kind of cleanup is necessary and possible. I don't think that
>>>>>>>> would be better than just doing the "right" cleanup at the end
>>>>>>>> of the
>>>>>>>> run() methods.
>>>>>>>
>>>>>>> I understand that this is a bit messy, and I won't insist that we
>>>>>>> change
>>>>>>> this in this RFR, but without looking at this in much detail it
>>>>>>> sounds
>>>>>>> weird to delete the thread in run(). Couldn't this be solved by
>>>>>>> introducing a virtual Thread::post_run() function and do:
>>>>>>>
>>>>>>> virtual void Thread::post_run() {
>>>>>>>  clear_thread_current();
>>>>>>> }
>>>>>>>
>>>>>>> virtual void JavaThread::post_run() {
>>>>>>>  Thread::post_run();
>>>>>>>  delete this;
>>>>>>> }
>>>>>>
>>>>>> But again this can't work for the VMThread or WatcherThread as they
>>>>>> are deleted from the termination thread and so thread->post_run() may
>>>>>> SEGV.** Plus it is only after the fact that you realize not to put
>>>>>> "delete this" in Thread::post_run().
>>>>>
>>>>> OK, I didn't understand what you meant with "termination thread", but
>>>>> I now see the call to VMThread::destroy().
>>>>>
>>>>> With that said, I find it odd that VMThread::destroy() deletes the VM
>>>>> thread. We already handshake between the VMThread and the "termination
>>>>> thread", so why isn't that VMThread::post_run() implemented as:
>>>>>
>>>>> virtual void VMThread::post_run() {
>>>>>  // signal other threads that VM process is gone
>>>>>  {
>>>>>    // Note: we must have the _no_safepoint_check_flag. Mutex::lock()
>>>>> allows
>>>>>    // VM thread to enter any lock at Safepoint as long as its _owner
>>>>> is NULL.
>>>>>    // If that happens after _terminate_lock->wait() has unset _owner
>>>>>    // but before it actually drops the lock and waits, the
>>>>> notification below
>>>>>    // may get lost and we will have a hang. To avoid this, we need to
>>>>> use
>>>>>    // Mutex::lock_without_safepoint_check().
>>>>>    MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag);
>>>>>    _terminated = true;
>>>>>    _terminate_lock->notify();
>>>>>  }
>>>>>
>>>>>  Thread::post_run();
>>>>>  delete this;
>>>>> }
>>>>>
>>>>> And then we wouldn't get a SEGV ...
>>>>>
>>>>> I couldn't find the destructor for the WatchThread, but it seems easy
>>>>> to fix that as well.
>>>>>
>>>>> I'm probably missing something, but I find it a bit annoying that code
>>>>> that should belong to the *Thread:ing system leaks into the
>>>>> implementations of *Thread::run().
>>>>>
>>>>> Thanks,
>>>>> StefanK
>>>>>>
>>>>>> ** Arguably the best solution to the "thread termination races with
>>>>>> VM termination" problem is to not let the threads terminate. The code
>>>>>> as it exists today can still have JavaThreads destroying themselves
>>>>>> at the same that the VM is terminating and potentially hit the same
>>>>>> errors that require us to not allow the VMThread (and now
>>>>>> WatcherThread) to delete themselves.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Thanks,
>>>>>>> StefanK
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> ------
>>>>>>>>
>>>>>>>>>
>>>>>>>>>  log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ",
>>>>>>>>> pthread
>>>>>>>>> id: " UINTX_FORMAT ").",
>>>>>>>>>    os::current_thread_id(), (uintx) pthread_self());
>>>>>>>>>
>>>>>>>>>  return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> And get rid of the explicit calls to
>>>>>>>>> ThreadLocalStorage::set_thread(NULL) you added?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> StefanK
>>>>>>>>>
>>>>>>>>> On 04/05/16 01:39, David Holmes wrote:
>>>>>>>>>> This needs attention from GC and runtime folk please.
>>>>>>>>>>
>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154715
>>>>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/
>>>>>>>>>>
>>>>>>>>>> tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always
>>>>>>>>>> called
>>>>>>>>>> before a thread terminates.
>>>>>>>>>>
>>>>>>>>>> Background:
>>>>>>>>>>
>>>>>>>>>> Most system-related threads do not expect to explicitly
>>>>>>>>>> terminate,
>>>>>>>>>> except sometimes as part of VM termination. Such threads don't
>>>>>>>>>> have
>>>>>>>>>> their destructors called, but should.
>>>>>>>>>>
>>>>>>>>>> This omission came to light due to the ThreadLocalStorage
>>>>>>>>>> changes in
>>>>>>>>>> JDK-8132510. As part of that change we deleted the following
>>>>>>>>>> from the
>>>>>>>>>> termination path of the VMThread:
>>>>>>>>>>
>>>>>>>>>> // Thread destructor usually does this.
>>>>>>>>>> ThreadLocalStorage::set_thread(NULL);
>>>>>>>>>>
>>>>>>>>>> The clearing of TLS seemed irrelevant to the VMThread as it
>>>>>>>>>> primarily
>>>>>>>>>> is used to aid in JNI attach/detach. However Brian Gardner
>>>>>>>>>> reported:
>>>>>>>>>>
>>>>>>>>>> http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> a problem on FreeBSD caused by this change and the interaction
>>>>>>>>>> with
>>>>>>>>>> the POSIX  pthread TLS destructor use introduced by JDK-8033696.
>>>>>>>>>> Because the VMThread terminated without clearing TLS, when the
>>>>>>>>>> TLS-destructor was called it got into a loop which ran four
>>>>>>>>>> times (as
>>>>>>>>>> happens on Linux) and then prints a warning to the console (which
>>>>>>>>>> doesn't happen on Linux).
>>>>>>>>>>
>>>>>>>>>> This indicates we need to restore the:
>>>>>>>>>>
>>>>>>>>>> ThreadLocalStorage::set_thread(NULL);
>>>>>>>>>>
>>>>>>>>>> but on further consideration it seems to me that this is not
>>>>>>>>>> confined
>>>>>>>>>> to the VMThread, and the most appropriate fix would be to always
>>>>>>>>>> invoke the Thread destructor as a thread terminates.
>>>>>>>>>>
>>>>>>>>>> Solution:
>>>>>>>>>>
>>>>>>>>>> Further investigation shows that calling the Thread destructor
>>>>>>>>>> in the
>>>>>>>>>> thread as it terminates is not possible:
>>>>>>>>>>
>>>>>>>>>> - VMThread
>>>>>>>>>>
>>>>>>>>>> This is actually destroyed by the thread that terminates the VM,
>>>>>>>>>> but
>>>>>>>>>> that can happen after it terminates and so we still hit the TLS
>>>>>>>>>> problem. The VMThread may be able to destroy itself today but in
>>>>>>>>>> the
>>>>>>>>>> past this was not possible (see existing code comment), and in
>>>>>>>>>> the
>>>>>>>>>> future it may also not be possible - the problem is that the
>>>>>>>>>> Thread
>>>>>>>>>> destructor can interact with other VM subsystems that are
>>>>>>>>>> concurrently
>>>>>>>>>> being torn down by the thread that is terminating the VM. In the
>>>>>>>>>> past
>>>>>>>>>> this was the CodeHeap. So rather than introduce something that is
>>>>>>>>>> fragile we stick with the current scheme but restore the
>>>>>>>>>> ThreadLocalStorage::set_thread(NULL); - note we can't access
>>>>>>>>>> "this" at
>>>>>>>>>> that time because it may already have been de-allocated.
>>>>>>>>>>
>>>>>>>>>> - WatcherThread
>>>>>>>>>>
>>>>>>>>>> The WatcherThread is never destroyed today but has the same
>>>>>>>>>> problem as
>>>>>>>>>> the VMThread. We can call the destructor from the VM termination
>>>>>>>>>> thread (and have implemented that), but not from the
>>>>>>>>>> WatcherThread
>>>>>>>>>> itself. So again we just have to restore the
>>>>>>>>>> ThreadLocalStorage::set_thread(NULL); to fix the potential TLS
>>>>>>>>>> problem.
>>>>>>>>>>
>>>>>>>>>> - GC Threads
>>>>>>>>>>
>>>>>>>>>> There are two cases:
>>>>>>>>>>
>>>>>>>>>> a) GC threads that never terminate
>>>>>>>>>>
>>>>>>>>>> For these we don't need to do anything: we can't delete the
>>>>>>>>>> thread as
>>>>>>>>>> it never terminates and we don't hit the TLS problem because it
>>>>>>>>>> never
>>>>>>>>>> terminates. So all we will do here is add some logic to check (in
>>>>>>>>>> NON_PRODUCT) that we do in fact never terminate.
>>>>>>>>>>
>>>>>>>>>> b) GC threads that can terminate
>>>>>>>>>>
>>>>>>>>>> Despite the fact the threads can terminate, references to those
>>>>>>>>>> threads are stored elsewhere (WorkGangs and other places) and
>>>>>>>>>> are not
>>>>>>>>>> cleared as part of the termination process. Those references
>>>>>>>>>> can be
>>>>>>>>>> touched after the thread has terminated so we can not call the
>>>>>>>>>> destructor at all. So again all we can do (without some major
>>>>>>>>>> thread
>>>>>>>>>> management reworking) is ensure that
>>>>>>>>>> ThreadLocalStorage::set_thread(NULL); is called before the thread
>>>>>>>>>> actually terminates
>>>>>>>>>>
>>>>>>>>>> Testing: JPRT
>>>>>>>>>>         RBT - runtime nightly tests
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>
>


More information about the hotspot-dev mailing list