RFR(XS) for PeriodicTask_lock cleanup (8072439)
Daniel D. Daugherty
daniel.daugherty at oracle.com
Thu Feb 19 23:03:22 UTC 2015
On 2/18/15 7:02 PM, David Holmes wrote:
> Hi Dan,
>
> On 19/02/2015 2:28 AM, Daniel D. Daugherty wrote:
>> David,
>>
>> Thanks for the fast review! Replies embedded below...
>>
>>
>> On 2/17/15 5:53 PM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> A few comments, mainly on the comments :)
>>
>> Apropos since the changeset is mostly comments...
>>
>>
>>>
>>> On 18/02/2015 8:42 AM, Daniel D. Daugherty wrote:
>>>> On 2/17/15 3:22 PM, Carsten Varming wrote:
>>>>> Dear Daniel,
>>>>>
>>>>> Looks good to me.
>>>>
>>>> Thanks for the fast review.
>>>>
>>>>> The line: "OrderAccess::fence(); // ensure WatcherThread sees update
>>>>> in main loop" seems unnecessary as the lock acts as a memory barrier.
>>>>
>>>> Yes, I keep looking at that line from the original work on
>>>> JDK-7127792 and wonder why it's there... I'll chase that down
>>>> with the original folks...
>>>
>>> Only needed when using lock-free access, so can be removed now.
>
> Oops - my bad!
>
>> Yes, that will be the likely outcome.
>>
>> However, the original code added by JDK-7127792 wasn't lock-free so
>> I still want to chase down the original thinking.
>
> The lock-free code was here:
>
> void WatcherThread::run() {
>   assert(this == watcher_thread(), "just checking");
>
>   this->record_stack_base_and_size();
>   this->initialize_thread_local_storage();
>   this->set_active_handles(JNIHandleBlock::allocate_block());
>   while(!_should_terminate) {
>
> and of course you still have a lock-free access in the new code:
>
> 1300     if (_should_terminate) {
> 1301       // check for termination before posting the next tick
> 1302       break;
> 1303     }
>
> so the fence() could stay, but its omission would be harmless I think
> because not seeing the update immediately is no different to having
> the update happen just after the check - it's racy code. The only
> other option would be to always access _should_terminate with the
> PeriodicTask_lock held - but that is subverting the intent of the lock
> somewhat as _should_terminate is part of the WatcherThread state.
> Actually the more I look at this the more I think the lock has already
> been subverted to assist with the WT termination protocol.
Just to make sure we're on the same page, I think Carsten was
talking about this block (from 8072439):
1334   {
1335     // Follow normal safepoint aware lock enter protocol since the
1336     // WatcherThread is stopped by another JavaThread.
1337     MutexLocker ml(PeriodicTask_lock);
1338     _should_terminate = true;
1339     OrderAccess::fence();  // ensure WatcherThread sees update in main loop
1340
1341     WatcherThread* watcher = watcher_thread();
1342     if (watcher != NULL) {
1343       // unpark the WatcherThread so it can see that it should terminate
1344       watcher->unpark();
1345     }
1346   }
where the fence() on line 1339 is not needed because when we drop
the PeriodicTask_lock on line 1346 we get a fence() equivalent.
This is the block before my fix for JDK-8047720:
1360   {
1361     MutexLockerEx ml(PeriodicTask_lock, Mutex::_no_safepoint_check_flag);
1362     _should_terminate = true;
1363     OrderAccess::fence();  // ensure WatcherThread sees update in main loop
1364
1365     WatcherThread* watcher = watcher_thread();
1366     if (watcher != NULL)
1367       watcher->unpark();
1368   }
It has the same fence() call and the same lock exit (on different
line numbers) so I'm wondering about the original thinking here.
I still haven't heard back from Rickard...
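For reference, here's a condensed sketch of the termination handshake
being discussed -- illustrative only, not the exact webrev code. The
stopping JavaThread publishes _should_terminate under PeriodicTask_lock,
and releasing the lock already provides the ordering that the explicit
fence() duplicated:

  // Stopping thread (a JavaThread) -- sketch, not the webrev code:
  {
    MutexLocker ml(PeriodicTask_lock);    // safepoint aware lock enter
    _should_terminate = true;             // publish the flag...
    WatcherThread* watcher = watcher_thread();
    if (watcher != NULL) {
      watcher->unpark();                  // ...and kick the WatcherThread awake
    }
  }                                       // lock release orders the store

  // WatcherThread main loop -- the lock-free check is racy but benign:
  // missing the update here just means one more iteration before exiting.
  while (true) {
    if (_should_terminate) {
      break;                              // check termination before the next tick
    }
    // ... sleep() and post periodic tasks ...
  }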
>
>>
>>> java.cpp:
>>>
>>> Typo: likelyhood -> likelihood
>>
>> Fixed.
>>
>>
>>> ---
>>>
>>> task.cpp
>>>
>>> 81 int PeriodicTask::time_to_wait() {
>>> 82   assert(Thread::current()->is_Watcher_thread(), "must be WatcherThread");
>>>
>>> This is currently true but not sure it has to be true. If we are
>>> assuming/constraining a subset of the task API to only be callable by
>>> the WatcherThread then perhaps we should document that in task.hpp ?
>>> And as the WatcherThread is a friend, none of those methods need to be
>>> public - ie execute_if_pending and time_to_wait (realtime_tick is
>>> already private).
>>
>> I was too focused on adding asserts that enforced how it works today
>> and not on how it might be used down the road. There's no reason to
>> restrict the caller of time_to_wait() to the WatcherThread. I've
>> deleted the assert on line 82 and I added a comment to task.hpp:
>>
>> // Requires the PeriodicTask_lock.
>> static int time_to_wait();
>>
>> By leaving time_to_wait() public, that allows code other than the
>> WatcherThread to use it perhaps for debugging...
>
> Okay. I still think the API is somewhat confused about its public
> interface - execute_if_pending should be protected for use by WT only
> (and subclasses implement it of course).
Yes, but I think I'll leave that for another bug (probably to
be fixed by someone else).
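Going back to the time_to_wait() comment above, a hypothetical
non-WatcherThread caller (the debugging case I mentioned) would only
need to take the lock first. This is an illustration, not code from
the webrev:

  // Hypothetical debugging caller of the now-public time_to_wait();
  // per the new task.hpp comment it just has to hold PeriodicTask_lock.
  {
    MutexLocker ml(PeriodicTask_lock);
    tty->print_cr("next periodic task due in %d ms",
                  PeriodicTask::time_to_wait());
  }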
>
>>
>>> 112 void PeriodicTask::enroll() {
>>> 113   // Follow normal safepoint aware lock enter protocol if the caller does
>>> 114   // not already own the PeriodicTask_lock. Otherwise, we don't try to
>>> 115   // enter it again to avoid deadlocking with a safepoint that started
>>> 116   // after the initial enter and before this enter.
>>> 117   //
>>> 118   MutexLockerEx ml(PeriodicTask_lock->owned_by_self() ? NULL
>>> 119                                                       : PeriodicTask_lock);
>>>
>>> The deadlock is not with a safepoint interaction but the normal
>>> self-deadlock that would occur trying to lock() a Monitor/Mutex that
>>> you already own. (Monitors/Mutexes do not support recursive locking -
>>> in contrast to ObjectMonitors.) Ditto for disenroll.
>>
>> I have been working on Java Monitors for waaaayyyy too long!
>
> :)
>
>> In the header comment for src/share/vm/runtime/mutex.cpp:
>>
>> // Native Monitors do *not* support nesting or recursion but ...
>>
>>
>>> The original comment is the correct one with respect to acquiring with
>>> a safepoint check if not already locked:
>>>
>>> /* XXX could be called from a JavaThread, so we have to check for
>>> * safepoint when taking the lock to avoid deadlocking */
>>
>> I'm still not fond of that comment. Perhaps:
>>
>>   // Follow normal safepoint aware lock enter protocol if the caller does
>>   // not already own the PeriodicTask_lock. Otherwise, we don't try to
>>   // enter it again because VM internal Mutexes do not support recursion.
>>
>> I'm trying to make it clear that we're following the safepoint aware
>> protocol even though we're using MutexLockerEx. I also wanted a clear
>> statement about why we skipped the enter if we already owned the lock.
>> I don't consider the fact that we could be called from a JavaThread to
>> be the important part.
>
> Being a JavaThread is the only reason the safepoint protocol has to be
> considered - for non-JavaThreads we won't look at the safepoint state.
>
> The enroll() and disenroll() methods are called from JavaThreads
> and they cannot be considered "leaf" methods, so they must obey the
> safepoint protocol wrt the lock.
>
> Anyway your comment is fine.
Thanks.
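To spell out the hazard behind that comment: HotSpot's native
Mutex/Monitor is not recursive, so a caller that already owns
PeriodicTask_lock would self-deadlock if enroll() blindly entered it
again. A sketch, where "task" is a hypothetical PeriodicTask*:

  {
    MutexLocker ml(PeriodicTask_lock);  // caller already owns the lock
    task->enroll();                     // enroll() must not try to enter again
  }
  // Hence the guard in enroll():
  //   MutexLockerEx ml(PeriodicTask_lock->owned_by_self() ? NULL
  //                                                       : PeriodicTask_lock);
  // passing NULL makes the MutexLockerEx a no-op when the lock is
  // already held.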
>
>>
>>> ---
>>>
>>> thread.cpp:
>>>
>>> 1198 int WatcherThread::sleep() const {
>>> 1199   // The WatcherThread does not honor the safepoint protocol for
>>> 1200   // PeriodicTask_lock in order to provide more consistent task
>>> 1201   // execution timing.
>>>
>>> Not sure that is really the correct characterisation. The WT doesn't
>>> need to honor the safepoint protocol because it isn't a JavaThread and
>>> so won't block even if a safepoint is in progress. So checking for a
>>> safepoint is simply a waste of time.
>>
>> Much better way of putting it. Thanks! How about:
>>
>> // The WatcherThread is not a JavaThread so we do not honor the
>> // safepoint protocol for the PeriodicTask_lock.
>
> Ok.
Thanks.
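The resulting pattern in sleep() looks roughly like this -- a sketch,
not the full method body:

  // Non-JavaThread, so skip the safepoint check when taking the lock;
  // the WatcherThread will never block for a safepoint anyway.
  MutexLockerEx ml(PeriodicTask_lock, Mutex::_no_safepoint_check_flag);
  int remaining = PeriodicTask::time_to_wait();
  // wait on the monitor until the next task is due or we are unparked
  PeriodicTask_lock->wait(Mutex::_no_safepoint_check_flag, remaining);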
>
>>
>>>
>>> 3565 MutexLocker ml(PeriodicTask_lock);
>>>
>>> Why did you allow for a safepoint check here? It would only be
>>> necessary if a JavaThread might concurrently acquire the
>>> PeriodicTask_lock and a safepoint is initiated - during VM
>>> initialization. Not completely impossible I guess but have never heard
>>> of this occurring, so would require some other startup change to
>>> trigger it.
>>
>> There's no good reason to use MutexLockerEx here and I didn't
>> want to write a comment explaining why we were using one.
>> Earlier code in Threads::create_vm() uses MutexLocker instead
>> of MutexLockerEx.
>
> Okay. Change is harmless.
Cool. I think we're on the same page.
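For comparison, the two lock-enter forms in question, sketched side
by side:

  MutexLocker   ml(PeriodicTask_lock);       // enters with a safepoint check
  MutexLockerEx ml2(PeriodicTask_lock,
                    Mutex::_no_safepoint_check_flag);  // enters without one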
Thanks again for the reviews.
Dan
>
> Cheers,
> David
>
>>
>> Thanks again for the very thorough review!
>>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> Carsten
>>>>>
>>>>> On Tue, Feb 17, 2015 at 4:44 PM, Daniel D. Daugherty
>>>>> <daniel.daugherty at oracle.com> wrote:
>>>>>
>>>>> Greetings,
>>>>>
>>>>> My fix for the following bug:
>>>>>
>>>>> JDK-8047720 Xprof hangs on Solaris
>>>>>
>>>>> that was pushed to JDK9 last June needs to be cleaned up.
>>>>>
>>>>> Thanks to Alex Garthwaite (agarthwaite at twitter.com) and Carsten
>>>>> Varming (varming at gmail.com) for reporting the mess that I made
>>>>> in WatcherThread::stop() and for suggesting fixes.
>>>>>
>>>>> This code review is for a general cleanup pass on
>>>>> PeriodicTask_lock
>>>>> and some of the surrounding code. This is a targeted review in
>>>>> that
>>>>> I would like to hear from three groups of people:
>>>>>
>>>>> 1) The author and reviewers for:
>>>>>
>>>>> JDK-7127792 Add the ability to change an existing
>>>>> PeriodicTask's
>>>>> execution interval
>>>>>
>>>>> Rickard, David H, and Markus G.
>>>>>
>>>>> 2) The reviewers for:
>>>>>
>>>>> JDK-8047720 Xprof hangs on Solaris
>>>>>
>>>>> Markus G and Coleen
>>>>>
>>>>> 3) Alex and Carsten
>>>>>
>>>>>
>>>>> Here's the webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8072439-webrev/0-for_jdk9_hs_rt/
>>>>>
>>>>>
>>>>> I've attached the original RFR for JDK-8047720 that explains
>>>>> the original deadlock that was being fixed. Similar testing
>>>>> will be done with this fix.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>
>>