RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads

Fri May 6 23:50:28 UTC 2016

On 7/05/2016 5:40 AM, Brian Gardner wrote:
> I agree with Stefan.  When I initially ran into the problem I came up
> with the following changeset that solves my problem, by calling
> clear_thread_current at the end of java_start.
>
> http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/

As I said I can see the appeal in doing this, but there is still a race 
for the threads destroyed at VM shutdown as current_or_null() can return 
non-NULL but the delete can happen before we call 
thread->clear_current_thread().

You would also want Thread::current_or_null_safe() I think, due to 
already deleted JavaThreads.

And this all side-steps the real issue in my opinion that with clean 
thread lifecycle management we should always be able to delete the 
terminating thread.

So to me it is a question of selecting which set of bandaids you want to 
apply.

Thanks,
David

> Brian
>
>> On May 6, 2016, at 8:41 AM, Stefan Karlsson
>> <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>> wrote:
>>
>> On 06/05/16 16:32, David Holmes wrote:
>>> On 7/05/2016 12:04 AM, Stefan Karlsson wrote:
>>>> Hi David,
>>>>
>>>> On 06/05/16 15:38, David Holmes wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> Thanks for taking a look at this.
>>>>>
>>>>> On 6/05/2016 5:02 PM, Stefan Karlsson wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> I looked through the GC part of this webrev and I think the change is
>>>>>> fine.
>>>>>>
>>>>>> However, it seems a bit error prone. If we decide to change the
>>>>>> code to,
>>>>>> for example, terminate the  AbstractGangWorker threads, then we
>>>>>> have to
>>>>>> remember to insert a ThreadLocalStorage::set_thread(NULL) call.
>>>>>
>>>>> That's why I added the ShouldNotReachHere()'s - if those threads start
>>>>> terminating then we will see those hit. Perhaps a comment:
>>>>>
>>>>> ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup
>>>>>
>>>>> ?
>>>>
>>>> Yes, I would appreciate a comment. Though, when we add new threads, we
>>>> need to remember to add the set_thread(NULL) call.
>>>
>>> Well no, what you would do is manage your new threads in such a way
>>> that their run() method can do "delete this" as the last call. Only
>>> if you can't do that do you need to think about what termination
>>> logic is missing that needs to be done in lieu of the destructor.
>>
>> Yes, but this forces every implementer of a Thread:run() function to
>> have to think about these kind of requirements.
>>
>>>
>>>>>
>>>>>> Could we instead add a call to
>>>>>> ThreadLocalStorage::set_thread(NULL), or
>>>>>> maybe even Thread::clear_thread_current(), in java_start?
>>>>>>
>>>>>> static void *java_start(Thread *thread) {
>>>>>> [...]
>>>>>>  thread->initialize_thread_current();
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>  // call one more level start routine
>>>>>>  thread->run();
>>>>>>
>>>>>>  ////////// Could we call Thread::clear_thread_current(); here?
>>>>>
>>>>> Not easily. For JavaThreads we've already done "delete this" inside
>>>>> the run() method, so we'd have to move that into java_start as well,
>>>>> but we can only do the delete for JavaThreads not for other threads.
>>>>> And we'd also have to change the VMThread and WatcherThread
>>>>> termination logic because of the deletes that happen in the
>>>>> termination thread - the "this" pointer (thread above) may no longer
>>>>> be valid when we want to call clear_current_thread() - which is why we
>>>>> can only do the ThreadLocalStorage::set_thread(NULL).
>>>>>
>>>>> I agree it would be a lot cleaner to have java_start do:
>>>>>
>>>>>  thread->common_initialization();
>>>>>  thread->run();
>>>>>  thread->common_cleanup();
>>>>>  delete thread;
>>>>>
>>>>> for all threads, but we'd need a lot of other changes to allow for
>>>>> that. Otherwise we would need to note that kind of thread before
>>>>> calling run() then switch on the thread type after run() to decide
>>>>> what kind of cleanup is necessary and possible. I don't think that
>>>>> would be better than just doing the "right" cleanup at the end of the
>>>>> run() methods.
>>>>
>>>> I understand that this is a bit messy, and I won't insist that we change
>>>> this in this RFR, but without looking at this in much detail it sounds
>>>> weird to delete the thread in run(). Couldn't this be solved by
>>>> introducing a virtual Thread::post_run() function and do:
>>>>
>>>> virtual void Thread::post_run() {
>>>>  clear_thread_current();
>>>> }
>>>>
>>>> virtual void JavaThread::post_run() {
>>>>  Thread::post_run();
>>>>  delete this;
>>>> }
>>>
>>> But again this can't work for the VMThread or WatcherThread as they
>>> are deleted from the termination thread and so thread->post_run() may
>>> SEGV.** Plus it is only after the fact that you realize not to put
>>> "delete this" in Thread::post_run().
>>
>> OK, I didn't understand what you meant with "termination thread", but
>> I now see the call to VMThread::destroy().
>>
>> With that said, I find it odd that VMThread::destroy() deletes the VM
>> thread. We already handshake between the VMThread and the "termination
>> thread", so why isn't that VMThread::post_run() implemented as:
>>
>> virtual void VMThread::post_run() {
>>  // signal other threads that VM process is gone
>>  {
>>    // Note: we must have the _no_safepoint_check_flag. Mutex::lock()
>> allows
>>    // VM thread to enter any lock at Safepoint as long as its _owner
>> is NULL.
>>    // If that happens after _terminate_lock->wait() has unset _owner
>>    // but before it actually drops the lock and waits, the
>> notification below
>>    // may get lost and we will have a hang. To avoid this, we need to use
>>    // Mutex::lock_without_safepoint_check().
>>    MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag);
>>    _terminated = true;
>>    _terminate_lock->notify();
>>  }
>>
>>  Thread::post_run();
>>  delete this;
>> }
>>
>> And then we wouldn't get a SEGV ...
>>
>> I couldn't find the destructor for the WatchThread, but it seems easy
>> to fix that as well.
>>
>> I'm probably missing something, but I find it a bit annoying that code
>> that should belong to the *Thread:ing system leaks into the
>> implementations of *Thread::run().
>>
>> Thanks,
>> StefanK
>>>
>>> ** Arguably the best solution to the "thread termination races with
>>> VM termination" problem is to not let the threads terminate. The code
>>> as it exists today can still have JavaThreads destroying themselves
>>> at the same that the VM is terminating and potentially hit the same
>>> errors that require us to not allow the VMThread (and now
>>> WatcherThread) to delete themselves.
>>>
>>> Thanks,
>>> David
>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> ------
>>>>>
>>>>>>
>>>>>>  log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread
>>>>>> id: " UINTX_FORMAT ").",
>>>>>>    os::current_thread_id(), (uintx) pthread_self());
>>>>>>
>>>>>>  return 0;
>>>>>> }
>>>>>>
>>>>>> And get rid of the explicit calls to
>>>>>> ThreadLocalStorage::set_thread(NULL) you added?
>>>>>>
>>>>>> Thanks,
>>>>>> StefanK
>>>>>>
>>>>>> On 04/05/16 01:39, David Holmes wrote:
>>>>>>> This needs attention from GC and runtime folk please.
>>>>>>>
>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154715
>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/
>>>>>>>
>>>>>>> tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called
>>>>>>> before a thread terminates.
>>>>>>>
>>>>>>> Background:
>>>>>>>
>>>>>>> Most system-related threads do not expect to explicitly terminate,
>>>>>>> except sometimes as part of VM termination. Such threads don't have
>>>>>>> their destructors called, but should.
>>>>>>>
>>>>>>> This omission came to light due to the ThreadLocalStorage changes in
>>>>>>> JDK-8132510. As part of that change we deleted the following from the
>>>>>>> termination path of the VMThread:
>>>>>>>
>>>>>>> // Thread destructor usually does this.
>>>>>>> ThreadLocalStorage::set_thread(NULL);
>>>>>>>
>>>>>>> The clearing of TLS seemed irrelevant to the VMThread as it primarily
>>>>>>> is used to aid in JNI attach/detach. However Brian Gardner reported:
>>>>>>>
>>>>>>> http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> a problem on FreeBSD caused by this change and the interaction with
>>>>>>> the POSIX  pthread TLS destructor use introduced by JDK-8033696.
>>>>>>> Because the VMThread terminated without clearing TLS, when the
>>>>>>> TLS-destructor was called it got into a loop which ran four times (as
>>>>>>> happens on Linux) and then prints a warning to the console (which
>>>>>>> doesn't happen on Linux).
>>>>>>>
>>>>>>> This indicates we need to restore the:
>>>>>>>
>>>>>>> ThreadLocalStorage::set_thread(NULL);
>>>>>>>
>>>>>>> but on further consideration it seems to me that this is not confined
>>>>>>> to the VMThread, and the most appropriate fix would be to always
>>>>>>> invoke the Thread destructor as a thread terminates.
>>>>>>>
>>>>>>> Solution:
>>>>>>>
>>>>>>> Further investigation shows that calling the Thread destructor in the
>>>>>>> thread as it terminates is not possible:
>>>>>>>
>>>>>>> - VMThread
>>>>>>>
>>>>>>> This is actually destroyed by the thread that terminates the VM, but
>>>>>>> that can happen after it terminates and so we still hit the TLS
>>>>>>> problem. The VMThread may be able to destroy itself today but in the
>>>>>>> past this was not possible (see existing code comment), and in the
>>>>>>> future it may also not be possible - the problem is that the Thread
>>>>>>> destructor can interact with other VM subsystems that are
>>>>>>> concurrently
>>>>>>> being torn down by the thread that is terminating the VM. In the past
>>>>>>> this was the CodeHeap. So rather than introduce something that is
>>>>>>> fragile we stick with the current scheme but restore the
>>>>>>> ThreadLocalStorage::set_thread(NULL); - note we can't access
>>>>>>> "this" at
>>>>>>> that time because it may already have been de-allocated.
>>>>>>>
>>>>>>> - WatcherThread
>>>>>>>
>>>>>>> The WatcherThread is never destroyed today but has the same
>>>>>>> problem as
>>>>>>> the VMThread. We can call the destructor from the VM termination
>>>>>>> thread (and have implemented that), but not from the WatcherThread
>>>>>>> itself. So again we just have to restore the
>>>>>>> ThreadLocalStorage::set_thread(NULL); to fix the potential TLS
>>>>>>> problem.
>>>>>>>
>>>>>>> - GC Threads
>>>>>>>
>>>>>>> There are two cases:
>>>>>>>
>>>>>>> a) GC threads that never terminate
>>>>>>>
>>>>>>> For these we don't need to do anything: we can't delete the thread as
>>>>>>> it never terminates and we don't hit the TLS problem because it never
>>>>>>> terminates. So all we will do here is add some logic to check (in
>>>>>>> NON_PRODUCT) that we do in fact never terminate.
>>>>>>>
>>>>>>> b) GC threads that can terminate
>>>>>>>
>>>>>>> Despite the fact the threads can terminate, references to those
>>>>>>> threads are stored elsewhere (WorkGangs and other places) and are not
>>>>>>> cleared as part of the termination process. Those references can be
>>>>>>> touched after the thread has terminated so we can not call the
>>>>>>> destructor at all. So again all we can do (without some major thread
>>>>>>> management reworking) is ensure that
>>>>>>> ThreadLocalStorage::set_thread(NULL); is called before the thread
>>>>>>> actually terminates
>>>>>>>
>>>>>>> Testing: JPRT
>>>>>>>         RBT - runtime nightly tests
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>