RFR(XS) for PeriodicTask_lock cleanup (8072439)
David Holmes
david.holmes at oracle.com
Wed Feb 25 00:31:42 UTC 2015
Hi Dan,
Just one follow up, the OrderAccess::fence was not added by 7127792 but
was added as part of the embedded support code (for non-TSO systems) in
JDK 7 under:
6953477: Increase portability and flexibility of building Hotspot
https://bugs.openjdk.java.net/browse/JDK-6953477
That code also made _should_terminate volatile. There was no locking
then, the WatcherThread::stop did:
_should_terminate = true;
OrderAccess::fence(); // ensure WatcherThread sees update in main loop
Thread* watcher = watcher_thread();
if (watcher != NULL)
watcher->_SleepEvent->unpark();
and the fence was used to ensure that the write to _should_terminate was
visible to the WT after the unpark, as WT::run did:
jlong prev_time = os::javaTimeNanos();
for (;;) {
int res= _SleepEvent->park(time_to_wait);
if (res == OS_TIMEOUT || _should_terminate)
break;
Cheers,
David
On 20/02/2015 9:03 AM, Daniel D. Daugherty wrote:
> On 2/18/15 7:02 PM, David Holmes wrote:
>> Hi Dan,
>>
>> On 19/02/2015 2:28 AM, Daniel D. Daugherty wrote:
>>> David,
>>>
>>> Thanks for the fast review! Replies embedded below...
>>>
>>>
>>> On 2/17/15 5:53 PM, David Holmes wrote:
>>>> Hi Dan,
>>>>
>>>> A few comments, mainly on the comments :)
>>>
>>> Apropos since the changeset is mostly comments...
>>>
>>>
>>>>
>>>> On 18/02/2015 8:42 AM, Daniel D. Daugherty wrote:
>>>>> On 2/17/15 3:22 PM, Carsten Varming wrote:
>>>>>> Dear Daniel,
>>>>>>
>>>>>> Looks good to me.
>>>>>
>>>>> Thanks for the fast review.
>>>>>
>>>>>> The line: "OrderAccess::fence(); // ensure WatcherThread sees update
>>>>>> in main loop" seems unnecessary as the lock acts as a memory barrier.
>>>>>
>>>>> Yes, I keep looking at that line from the original work on
>>>>> JDK-7127792 and wonder why it's there... I'll chase that down
>>>>> with the original folks...
>>>>
>>>> Only needed when using lock-free access, so can be removed now.
>>
>> Oops - my bad!
>>
>>> Yes, that will be the likely outcome.
>>>
>>> However, the original code added by JDK-7127792 wasn't lock-free so
>>> I still want to chase down the original thinking.
>>
>> The lock-free code was here:
>>
>> void WatcherThread::run() {
>> assert(this == watcher_thread(), "just checking");
>>
>> this->record_stack_base_and_size();
>> this->initialize_thread_local_storage();
>> this->set_active_handles(JNIHandleBlock::allocate_block());
>> while(!_should_terminate) {
>>
>> and of course you still have a lock-free access in the new code:
>>
>> 1300 if (_should_terminate) {
>> 1301 // check for termination before posting the next tick
>> 1302 break;
>> 1303 }
>>
>> so the fence() could stay, but its omission would be harmless I think
>> because not seeing the update immediately is no different to having
>> the update happen just after the check - it's racy code. The only
>> other option would be to always access _should_terminate with the
>> PeriodicTask_lock held - but that is subverting the intent of the lock
>> somewhat as _should_terminate is part of the WatcherThread state.
>> Actually the more I look at this the more I think the lock has already
>> been subverted to assist with the WT termination protocol.
>
> Just to make sure we're on the same page I think Carsten was
> talking about this block (from 8072439):
>
> 1334 {
> 1335 // Follow normal safepoint aware lock enter protocol since the
> 1336 // WatcherThread is stopped by another JavaThread.
> 1337 MutexLocker ml(PeriodicTask_lock);
> 1338 _should_terminate = true;
> 1339 OrderAccess::fence(); // ensure WatcherThread sees update in
> main loop
> 1340
> 1341 WatcherThread* watcher = watcher_thread();
> 1342 if (watcher != NULL) {
> 1343 // unpark the WatcherThread so it can see that it should
> terminate
> 1344 watcher->unpark();
> 1345 }
> 1346 }
>
> where the fence() on line 1339 is not needed because when we drop
> the PeriodicTask_lock on line 1346 we get a fence() equivalent.
>
> This is the block before my fix for JDK-8047720:
>
> 1360 {
> 1361 MutexLockerEx ml(PeriodicTask_lock,
> Mutex::_no_safepoint_check_flag);
> 1362 _should_terminate = true;
> 1363 OrderAccess::fence(); // ensure WatcherThread sees update in
> main loop
> 1364
> 1365 WatcherThread* watcher = watcher_thread();
> 1366 if (watcher != NULL)
> 1367 watcher->unpark();
> 1368 }
>
> It has the same fence() call and the same lock exit (on different
> line numbers) so I'm wondering about the original thinking here.
> I still haven't heard back from Rickard...
>
>
>>
>>>
>>>> java.cpp:
>>>>
>>>> Typo: likelyhood -> likelihood
>>>
>>> Fixed.
>>>
>>>
>>>> ---
>>>>
>>>> task.cpp
>>>>
>>>> 81 int PeriodicTask::time_to_wait() {
>>>> 82 assert(Thread::current()->is_Watcher_thread(), "must be
>>>> WatcherThread");
>>>>
>>>> This is currently true but not sure it has to be true. If we are
>>>> assuming/constraining a subset of the task API to only be callable by
>>>> the WatcherThread then perhaps we should document that in task.hpp ?
>>>> And as the WatcherThread is a friend, none of those methods need to be
>>>> public - ie execute_if_pending and time_to_wait (realtime_tick is
>>>> already private).
>>>
>>> I was too focused on adding asserts that enforced how it works today
>>> and not on how it might be used down the road. There's no reason to
>>> restrict the caller of time_to_wait() to the WatcherThread. I've
>>> deleted the assert on line 82 and I added a comment to task.hpp:
>>>
>>> // Requires the PeriodicTask_lock.
>>> static int time_to_wait();
>>>
>>> By leaving time_to_wait() public, that allows code other than the
>>> WatcherThread to use it perhaps for debugging...
>>
>> Okay. I still think the API is somewhat confused about its public
>> interface - execute_if_pending should be protected for use by WT only
>> (and subclasses implement it of course).
>
> Yes, but I think I'll leave that for another bug (probably to
> be fixed by someone else).
>
>
>>
>>>
>>>> 112 void PeriodicTask::enroll() {
>>>> 113 // Follow normal safepoint aware lock enter protocol if the
>>>> caller does
>>>> 114 // not already own the PeriodicTask_lock. Otherwise, we don't
>>>> try to
>>>> 115 // enter it again to avoid deadlocking with a safepoint that
>>>> started
>>>> 116 // after the initial enter and before this enter.
>>>> 117 //
>>>> 118 MutexLockerEx ml(PeriodicTask_lock->owned_by_self() ? NULL
>>>> 119 :
>>>> PeriodicTask_lock);
>>>>
>>>> The deadlock is not with a safepoint interaction but the normal
>>>> self-deadlock that would occur trying to lock() a Monitor/Mutex that
>>>> you already own. (Monitors/Mutexes do not support recursive locking -
>>>> in contrast to ObjectMonitors.) Ditto for disenroll.
>>>
>>> I have been working on Java Monitors for waaaayyyy too long!
>>
>> :)
>>
>>> In the header comment for src/share/vm/runtime/mutex.cpp:
>>>
>>> // Native Monitors do *not* support nesting or recursion but ...
>>>
>>>
>>>> The original comment is the correct one with respect to acquiring with
>>>> a safepoint check if not already locked:
>>>>
>>>> /* XXX could be called from a JavaThread, so we have to check for
>>>> * safepoint when taking the lock to avoid deadlocking */
>>>
>>> I'm still not fond of that comment. Perhaps:
>>>
>>> // Follow normal safepoint aware lock enter protocol if the caller
>>> does
>>> // not already own the PeriodicTask_lock. Otherwise, we don't try to
>>> // enter it again because VM internal Mutexes do not support
>>> recursion.
>>>
>>> I'm trying to make it clear that we're following safepoint aware
>>> protocol
>>> even though we're using MutexLockerEx. I also wanted a clear statement
>>> about why we skipped the enter if we already owned the lock. I don't
>>> think that we could be called from a JavaThread to be the important
>>> part.
>>
>> Being a JavaThread is the only reason the safepoint protocol has to be
>> considered - for non-JavaThreads we won't look at the safepoint state.
>>
>> The enroll() and disenroll() methods are called from the JavaThreads
>> and they can not be considered "leaf" methods, so they must obey the
>> safepoint protocol wrt the lock.
>>
>> Anyway your comment is fine.
>
> Thanks.
>
>
>>
>>>
>>>> ---
>>>>
>>>> thread.cpp:
>>>>
>>>> 1198 int WatcherThread::sleep() const {
>>>> 1199 // The WatcherThread does not honor the safepoint protocol for
>>>> 1200 // PeriodicTask_lock in order to provide more consistent task
>>>> 1201 // execution timing.
>>>>
>>>> Not sure that is really the correct characterisation. The WT doesn't
>>>> need to honor the safepoint protocol because it isn't a JavaThread and
>>>> so won't block even if a safepoint is in progress. So checking for a
>>>> safepoint is simply a waste of time.
>>>
>>> Much better way of putting it. Thanks! How about:
>>>
>>> // The WatcherThread is not a JavaThread so we do not honor the
>>> // safepoint protocol for the PeriodicTask_lock.
>>
>> Ok.
>
> Thanks.
>
>
>>
>>>
>>>>
>>>> 3565 MutexLocker ml(PeriodicTask_lock);
>>>>
>>>> Why did you allow for a safepoint check here? It would only be
>>>> necessary if a JavaThread might concurrently acquire the
>>>> PeriodicTask_lock and a safepoint is initiated - during VM
>>>> initialization. Not completely impossible I guess but have never heard
>>>> of this occurring, so would require some other startup change to
>>>> trigger it.
>>>
>>> There's no good reason to use MutexLockerEx here and I didn't
>>> want to write a comment explaining why we were using one.
>>> Earlier code in Threads::create_vm() uses MutexLocker instead
>>> of MutexLockerEx.
>>
>> Okay. Change is harmless.
>
> Cool. I think we're on the same page.
>
> Thanks again for the reviews.
>
> Dan
>>
>> Cheers,
>> David
>>
>>>
>>> Thanks again for the very thorough review!
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> Carsten
>>>>>>
>>>>>> On Tue, Feb 17, 2015 at 4:44 PM, Daniel D. Daugherty
>>>>>> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>>
>>>>>> wrote:
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> My fix for the following bug:
>>>>>>
>>>>>> JDK-8047720 Xprof hangs on Solaris
>>>>>>
>>>>>> that was pushed to JDK9 last June needs to be cleaned up.
>>>>>>
>>>>>> Thanks to Alex Garthwaite (agarthwaite at twitter.com
>>>>>> <mailto:agarthwaite at twitter.com>) and Carsten
>>>>>> Varming (varming at gmail.com <mailto:varming at gmail.com>) for
>>>>>> reporting the mess that I made
>>>>>> in WatcherThread::stop() and for suggesting fixes.
>>>>>>
>>>>>> This code review is for a general cleanup pass on
>>>>>> PeriodicTask_lock
>>>>>> and some of the surrounding code. This is a targeted review in
>>>>>> that
>>>>>> I would like to hear from three groups of people:
>>>>>>
>>>>>> 1) The author and reviewers for:
>>>>>>
>>>>>> JDK-7127792 Add the ability to change an existing
>>>>>> PeriodicTask's
>>>>>> execution interval
>>>>>>
>>>>>> Rickard, David H, and Markus G.
>>>>>>
>>>>>> 2) The reviewers for:
>>>>>>
>>>>>> JDK-8047720 Xprof hangs on Solaris
>>>>>>
>>>>>> Markus G and Coleen
>>>>>>
>>>>>> 3) Alex and Carsten
>>>>>>
>>>>>>
>>>>>> Here's the webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8072439-webrev/0-for_jdk9_hs_rt/
>>>>>> <http://cr.openjdk.java.net/%7Edcubed/8072439-webrev/0-for_jdk9_hs_rt/>
>>>>>>
>>>>>>
>>>>>> I've attached the original RFR for JDK-8047720 that explains
>>>>>> the original deadlock that was being fixed. Similar testing
>>>>>> will be done with this fix.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>
>>>
>
More information about the hotspot-runtime-dev
mailing list