RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads
Stefan Karlsson
stefan.karlsson at oracle.com
Mon May 9 09:29:58 UTC 2016
Hi,
For java_start, would it be enough to just, unconditionally, call
ThreadLocalStorage::set_thread(NULL) after thread->run() ?
StefanK
On 2016-05-07 07:45, David Holmes wrote:
> Brian, Stefan,
>
> I've been giving this more thought and I think you are both right.
> While this can't be completely handled in java_start for all threads
> it can handle the most common cases. If I don't add the deletion of
> the WatcherThread then the problematic case reduces to that of the
> VMThread.
>
> I'll do a new version on Monday. I also need to double-check those
> threads that don't use java_start (which I am sorely tempted to rename).
>
> Thanks,
> David
>
> On 7/05/2016 9:50 AM, David Holmes wrote:
>> On 7/05/2016 5:40 AM, Brian Gardner wrote:
>>> I agree with Stefan. When I initially ran into the problem I came up
>>> with the following changeset that solves my problem, by calling
>>> clear_thread_current at the end of java_start.
>>>
>>> http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/
>>>
>>>
>>
>> As I said I can see the appeal in doing this, but there is still a race
>> for the threads destroyed at VM shutdown as current_or_null() can return
>> non-NULL but the delete can happen before we call
>> thread->clear_current_thread().
>>
>> You would also want Thread::current_or_null_safe() I think, due to
>> already deleted JavaThreads.
>>
>> And this all side-steps the real issue in my opinion that with clean
>> thread lifecycle management we should always be able to delete the
>> terminating thread.
>>
>> So to me it is a question of selecting which set of bandaids you want to
>> apply.
>>
>> Thanks,
>> David
>>
>>> Brian
>>>
>>>> On May 6, 2016, at 8:41 AM, Stefan Karlsson
>>>> <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>>
>>>> wrote:
>>>>
>>>> On 06/05/16 16:32, David Holmes wrote:
>>>>> On 7/05/2016 12:04 AM, Stefan Karlsson wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 06/05/16 15:38, David Holmes wrote:
>>>>>>> Hi Stefan,
>>>>>>>
>>>>>>> Thanks for taking a look at this.
>>>>>>>
>>>>>>> On 6/05/2016 5:02 PM, Stefan Karlsson wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> I looked through the GC part of this webrev and I think the
>>>>>>>> change is
>>>>>>>> fine.
>>>>>>>>
>>>>>>>> However, it seems a bit error prone. If we decide to change the
>>>>>>>> code to,
>>>>>>>> for example, terminate the AbstractGangWorker threads, then we
>>>>>>>> have to
>>>>>>>> remember to insert a ThreadLocalStorage::set_thread(NULL) call.
>>>>>>>
>>>>>>> That's why I added the ShouldNotReachHere()'s - if those threads
>>>>>>> start
>>>>>>> terminating then we will see those hit. Perhaps a comment:
>>>>>>>
>>>>>>> ShouldNotReachHere(); // If thread terminates we have to do TLS
>>>>>>> cleanup
>>>>>>>
>>>>>>> ?
>>>>>>
>>>>>> Yes, I would appreciate a comment. Though, when we add new
>>>>>> threads, we
>>>>>> need to remember to add the set_thread(NULL) call.
>>>>>
>>>>> Well no, what you would do is manage your new threads in such a way
>>>>> that their run() method can do "delete this" as the last call. Only
>>>>> if you can't do that do you need to think about what termination
>>>>> logic is missing that needs to be done in lieu of the destructor.
>>>>
>>>> Yes, but this forces every implementer of a Thread:run() function to
>>>> have to think about these kind of requirements.
>>>>
>>>>>
>>>>>>>
>>>>>>>> Could we instead add a call to
>>>>>>>> ThreadLocalStorage::set_thread(NULL), or
>>>>>>>> maybe even Thread::clear_thread_current(), in java_start?
>>>>>>>>
>>>>>>>> static void *java_start(Thread *thread) {
>>>>>>>> [...]
>>>>>>>> thread->initialize_thread_current();
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>> // call one more level start routine
>>>>>>>> thread->run();
>>>>>>>>
>>>>>>>> ////////// Could we call Thread::clear_thread_current(); here?
>>>>>>>
>>>>>>> Not easily. For JavaThreads we've already done "delete this" inside
>>>>>>> the run() method, so we'd have to move that into java_start as
>>>>>>> well,
>>>>>>> but we can only do the delete for JavaThreads not for other
>>>>>>> threads.
>>>>>>> And we'd also have to change the VMThread and WatcherThread
>>>>>>> termination logic because of the deletes that happen in the
>>>>>>> termination thread - the "this" pointer (thread above) may no
>>>>>>> longer
>>>>>>> be valid when we want to call clear_current_thread() - which is
>>>>>>> why we
>>>>>>> can only do the ThreadLocalStorage::set_thread(NULL).
>>>>>>>
>>>>>>> I agree it would be a lot cleaner to have java_start do:
>>>>>>>
>>>>>>> thread->common_initialization();
>>>>>>> thread->run();
>>>>>>> thread->common_cleanup();
>>>>>>> delete thread;
>>>>>>>
>>>>>>> for all threads, but we'd need a lot of other changes to allow for
>>>>>>> that. Otherwise we would need to note that kind of thread before
>>>>>>> calling run() then switch on the thread type after run() to decide
>>>>>>> what kind of cleanup is necessary and possible. I don't think that
>>>>>>> would be better than just doing the "right" cleanup at the end
>>>>>>> of the
>>>>>>> run() methods.
>>>>>>
>>>>>> I understand that this is a bit messy, and I won't insist that we
>>>>>> change
>>>>>> this in this RFR, but without looking at this in much detail it
>>>>>> sounds
>>>>>> weird to delete the thread in run(). Couldn't this be solved by
>>>>>> introducing a virtual Thread::post_run() function and do:
>>>>>>
>>>>>> virtual void Thread::post_run() {
>>>>>> clear_thread_current();
>>>>>> }
>>>>>>
>>>>>> virtual void JavaThread::post_run() {
>>>>>> Thread::post_run();
>>>>>> delete this;
>>>>>> }
>>>>>
>>>>> But again this can't work for the VMThread or WatcherThread as they
>>>>> are deleted from the termination thread and so thread->post_run() may
>>>>> SEGV.** Plus it is only after the fact that you realize not to put
>>>>> "delete this" in Thread::post_run().
>>>>
>>>> OK, I didn't understand what you meant with "termination thread", but
>>>> I now see the call to VMThread::destroy().
>>>>
>>>> With that said, I find it odd that VMThread::destroy() deletes the VM
>>>> thread. We already handshake between the VMThread and the "termination
>>>> thread", so why isn't that VMThread::post_run() implemented as:
>>>>
>>>> virtual void VMThread::post_run() {
>>>> // signal other threads that VM process is gone
>>>> {
>>>> // Note: we must have the _no_safepoint_check_flag. Mutex::lock()
>>>> allows
>>>> // VM thread to enter any lock at Safepoint as long as its _owner
>>>> is NULL.
>>>> // If that happens after _terminate_lock->wait() has unset _owner
>>>> // but before it actually drops the lock and waits, the
>>>> notification below
>>>> // may get lost and we will have a hang. To avoid this, we need to
>>>> use
>>>> // Mutex::lock_without_safepoint_check().
>>>> MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag);
>>>> _terminated = true;
>>>> _terminate_lock->notify();
>>>> }
>>>>
>>>> Thread::post_run();
>>>> delete this;
>>>> }
>>>>
>>>> And then we wouldn't get a SEGV ...
>>>>
>>>> I couldn't find the destructor for the WatchThread, but it seems easy
>>>> to fix that as well.
>>>>
>>>> I'm probably missing something, but I find it a bit annoying that code
>>>> that should belong to the *Thread:ing system leaks into the
>>>> implementations of *Thread::run().
>>>>
>>>> Thanks,
>>>> StefanK
>>>>>
>>>>> ** Arguably the best solution to the "thread termination races with
>>>>> VM termination" problem is to not let the threads terminate. The code
>>>>> as it exists today can still have JavaThreads destroying themselves
>>>>> at the same that the VM is terminating and potentially hit the same
>>>>> errors that require us to not allow the VMThread (and now
>>>>> WatcherThread) to delete themselves.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Thanks,
>>>>>> StefanK
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> ------
>>>>>>>
>>>>>>>>
>>>>>>>> log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ",
>>>>>>>> pthread
>>>>>>>> id: " UINTX_FORMAT ").",
>>>>>>>> os::current_thread_id(), (uintx) pthread_self());
>>>>>>>>
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> And get rid of the explicit calls to
>>>>>>>> ThreadLocalStorage::set_thread(NULL) you added?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> StefanK
>>>>>>>>
>>>>>>>> On 04/05/16 01:39, David Holmes wrote:
>>>>>>>>> This needs attention from GC and runtime folk please.
>>>>>>>>>
>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154715
>>>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/
>>>>>>>>>
>>>>>>>>> tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always
>>>>>>>>> called
>>>>>>>>> before a thread terminates.
>>>>>>>>>
>>>>>>>>> Background:
>>>>>>>>>
>>>>>>>>> Most system-related threads do not expect to explicitly
>>>>>>>>> terminate,
>>>>>>>>> except sometimes as part of VM termination. Such threads don't
>>>>>>>>> have
>>>>>>>>> their destructors called, but should.
>>>>>>>>>
>>>>>>>>> This omission came to light due to the ThreadLocalStorage
>>>>>>>>> changes in
>>>>>>>>> JDK-8132510. As part of that change we deleted the following
>>>>>>>>> from the
>>>>>>>>> termination path of the VMThread:
>>>>>>>>>
>>>>>>>>> // Thread destructor usually does this.
>>>>>>>>> ThreadLocalStorage::set_thread(NULL);
>>>>>>>>>
>>>>>>>>> The clearing of TLS seemed irrelevant to the VMThread as it
>>>>>>>>> primarily
>>>>>>>>> is used to aid in JNI attach/detach. However Brian Gardner
>>>>>>>>> reported:
>>>>>>>>>
>>>>>>>>> http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> a problem on FreeBSD caused by this change and the interaction
>>>>>>>>> with
>>>>>>>>> the POSIX pthread TLS destructor use introduced by JDK-8033696.
>>>>>>>>> Because the VMThread terminated without clearing TLS, when the
>>>>>>>>> TLS-destructor was called it got into a loop which ran four
>>>>>>>>> times (as
>>>>>>>>> happens on Linux) and then prints a warning to the console (which
>>>>>>>>> doesn't happen on Linux).
>>>>>>>>>
>>>>>>>>> This indicates we need to restore the:
>>>>>>>>>
>>>>>>>>> ThreadLocalStorage::set_thread(NULL);
>>>>>>>>>
>>>>>>>>> but on further consideration it seems to me that this is not
>>>>>>>>> confined
>>>>>>>>> to the VMThread, and the most appropriate fix would be to always
>>>>>>>>> invoke the Thread destructor as a thread terminates.
>>>>>>>>>
>>>>>>>>> Solution:
>>>>>>>>>
>>>>>>>>> Further investigation shows that calling the Thread destructor
>>>>>>>>> in the
>>>>>>>>> thread as it terminates is not possible:
>>>>>>>>>
>>>>>>>>> - VMThread
>>>>>>>>>
>>>>>>>>> This is actually destroyed by the thread that terminates the VM,
>>>>>>>>> but
>>>>>>>>> that can happen after it terminates and so we still hit the TLS
>>>>>>>>> problem. The VMThread may be able to destroy itself today but in
>>>>>>>>> the
>>>>>>>>> past this was not possible (see existing code comment), and in
>>>>>>>>> the
>>>>>>>>> future it may also not be possible - the problem is that the
>>>>>>>>> Thread
>>>>>>>>> destructor can interact with other VM subsystems that are
>>>>>>>>> concurrently
>>>>>>>>> being torn down by the thread that is terminating the VM. In the
>>>>>>>>> past
>>>>>>>>> this was the CodeHeap. So rather than introduce something that is
>>>>>>>>> fragile we stick with the current scheme but restore the
>>>>>>>>> ThreadLocalStorage::set_thread(NULL); - note we can't access
>>>>>>>>> "this" at
>>>>>>>>> that time because it may already have been de-allocated.
>>>>>>>>>
>>>>>>>>> - WatcherThread
>>>>>>>>>
>>>>>>>>> The WatcherThread is never destroyed today but has the same
>>>>>>>>> problem as
>>>>>>>>> the VMThread. We can call the destructor from the VM termination
>>>>>>>>> thread (and have implemented that), but not from the
>>>>>>>>> WatcherThread
>>>>>>>>> itself. So again we just have to restore the
>>>>>>>>> ThreadLocalStorage::set_thread(NULL); to fix the potential TLS
>>>>>>>>> problem.
>>>>>>>>>
>>>>>>>>> - GC Threads
>>>>>>>>>
>>>>>>>>> There are two cases:
>>>>>>>>>
>>>>>>>>> a) GC threads that never terminate
>>>>>>>>>
>>>>>>>>> For these we don't need to do anything: we can't delete the
>>>>>>>>> thread as
>>>>>>>>> it never terminates and we don't hit the TLS problem because it
>>>>>>>>> never
>>>>>>>>> terminates. So all we will do here is add some logic to check (in
>>>>>>>>> NON_PRODUCT) that we do in fact never terminate.
>>>>>>>>>
>>>>>>>>> b) GC threads that can terminate
>>>>>>>>>
>>>>>>>>> Despite the fact the threads can terminate, references to those
>>>>>>>>> threads are stored elsewhere (WorkGangs and other places) and
>>>>>>>>> are not
>>>>>>>>> cleared as part of the termination process. Those references
>>>>>>>>> can be
>>>>>>>>> touched after the thread has terminated so we can not call the
>>>>>>>>> destructor at all. So again all we can do (without some major
>>>>>>>>> thread
>>>>>>>>> management reworking) is ensure that
>>>>>>>>> ThreadLocalStorage::set_thread(NULL); is called before the thread
>>>>>>>>> actually terminates
>>>>>>>>>
>>>>>>>>> Testing: JPRT
>>>>>>>>> RBT - runtime nightly tests
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>
More information about the hotspot-dev
mailing list