RFR(xxs): 8213236: A partial removed/deleted JavaThread cannot transition
Daniel D. Daugherty
daniel.daugherty at oracle.com
Thu Nov 1 18:55:32 UTC 2018
On 11/1/18 12:49 PM, Robbin Ehn wrote:
> Hi Dan,
>
> On 01/11/2018 16:58, Daniel D. Daugherty wrote:
>> On 11/1/18 7:57 AM, Robbin Ehn wrote:
>>> Hi all please review,
>>>
>>> In 8212933 we moved the cancellation of a handshake to the VM
>>> thread, for sanity
>>> an assert was added to make sure the JavaThread do not cancel.
>>> (technically this
>>> works also, but to keep a simple model it should be done by the VM
>>> thread)
>>>
>>> A JavaThread that is terminated but not yet free should not follow
>>> the safepoint
>>> protocol. Since the VM thread running the safepoint cannot see such
>>> a thread
>>> it's confusing and error-prone doing safepoint checks and
>>> transitions from a
>>> thread not 'attached' to the VM.
>>>
>>> Issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8213236
>>
>> I updated the bug to be a bit more clear about what the problem is.
>> Please let me know if my analysis is correct or not...
>
> Yes, that is correct, thanks!
>
>>
>>
>>>
>>> Code:
>>> diff -r 13ad2bfee99e src/hotspot/share/runtime/vmThread.cpp
>>> --- a/src/hotspot/share/runtime/vmThread.cpp Thu Nov 01 09:57:57
>>> 2018 +0100
>>> +++ b/src/hotspot/share/runtime/vmThread.cpp Thu Nov 01 12:40:47
>>> 2018 +0100
>>> @@ -319,7 +319,9 @@
>>> // Notify the VMThread that the last non-daemon JavaThread has
>>> terminated,
>>> // and wait until operation is performed.
>>> void VMThread::wait_for_vm_thread_exit() {
>>> - { MutexLocker mu(VMOperationQueue_lock);
>>> + assert(Thread::current()->is_Java_thread(), "Should be a
>>> JavaThread");
>>> + assert(((JavaThread*)Thread::current())->is_terminated(), "Should
>>> be off threadslist");
>>
>> This should be "should be terminated". is_terminated()
>> doesn't necessarily mean the JavaThread is not on the
>> global ThreadsList.
>
> If the thread ask itself or if you hold Threads_locks it is implicit
> the same.
> When we do exit we first take Threads_lock, remove our self from the
> global list, set state terminated and then unlock Threads_lock.
>
> But I'm perfectly fine with "should be terminated" instead.
I would prefer "should be terminated" because the assertion
is calling is_terminated().
This part of what I said:
> is_terminated() doesn't necessarily mean the JavaThread is
> not on the global ThreadsList.
is wrong! Sorry about that.
So here's what is_terminated() does:
src/hotspot/share/runtime/thread.inline.hpp:
inline bool JavaThread::is_terminated() const {
// Use load-acquire so that setting of _terminated by
// JavaThread::exit() is seen more quickly.
TerminatedTypes l_terminated = (TerminatedTypes)
OrderAccess::load_acquire((volatile jint *) &_terminated);
return check_is_terminated(l_terminated);
}
And here's what check_is_terminated() does:
src/hotspot/share/runtime/thread.hpp
// thread is terminated (no longer on the threads list); we compare
// against the two non-terminated values so that a freed JavaThread
// will also be considered terminated.
bool check_is_terminated(TerminatedTypes l_terminated) const {
return l_terminated != _not_terminated && l_terminated !=
_thread_exiting;
}
bool is_terminated() const;
so you are right that "is_terminated()/check_is_terminated()"
means that the JavaThread is no longer on the threads list.
Here's the code that sets the terminated value:
src/hotspot/share/runtime/thread.cpp:
ThreadService::remove_thread(p, daemon);
// Make sure that safepoint code disregard this thread. This is
needed since
// the thread might mess around with locks after this point. This
can cause it
// to do callbacks into the safepoint code. However, the safepoint
code is not aware
// of this thread since it is removed from the queue.
p->set_terminated_value();
I think that's it for that point.
>
>>
>>
>>> + { MutexLockerEx mu(VMOperationQueue_lock,
>>> Mutex::_no_safepoint_check_flag);
>>
>> Yup. I agree with this addition of Mutex::_no_safepoint_check_flag.
>>
>> However, I think there is one more:
>>
>> src/hotspot/share/runtime/thread.cpp:
>> before_exit(thread);
>>
>> thread->exit(true);
>>
>> // Stop VM thread.
>> {
>> // 4945125 The vm thread comes to a safepoint during exit.
>> // GC vm_operations can get caught at the safepoint, and the
>> // heap is unparseable if they are caught. Grab the Heap_lock
>> // to prevent this. The GC vm_operations will not be able to
>> // queue until after the vm thread is dead. After this point,
>> // we'll never emerge out of the safepoint before the VM exits.
>>
>> MutexLocker ml(Heap_lock);
>>
>> VMThread::wait_for_vm_thread_exit();
>>
>> You fixed the one in wait_for_vm_thread_exit(), but I think
>> that this: "MutexLocker ml(Heap_lock)" is also a problem.
>
> Silly me missed that one :)
> Yes, but it is used in vm op prolog meaning that VM thread takes it.
> So if we hold it we can deadlock with vm thread.
> E.g.
> vmGCOperations.cpp L290
> vm_operations_g1.cpp L214
>
> I don't understand this comment:
> "GC vm_operations can get caught at the safepoint, and the heap is
> unparseable if they are caught."
>
> I don't think we should grab Heap_lock at all and we should just
> remove it.
> If there is a problem with "unparseable heap", taking Heap_lock seems
> to the wrong solution.
I don't think you want to drob the grab of the Heap_lock from
this code path as part of this fix. That should be done separately.
Of course, you have GC folks close to you so you might be able to
resolve the issue (theoretically) quickly. However, I think that
will mean that this fix needs more testing.
Dan
>
> /Robbin
>
>>
>> Dan
>>
>>
>>
>>> _should_terminate = true;
>>> VMOperationQueue_lock->notify();
>>> }
>>>
>>> Passes t1-2.
>>>
>>> Thanks Robbin
>>
More information about the hotspot-runtime-dev
mailing list