RFR(XS) for PeriodicTask_lock cleanup (8072439)

Fri Feb 20 00:17:12 UTC 2015

On 20/02/2015 9:03 AM, Daniel D. Daugherty wrote:
> On 2/18/15 7:02 PM, David Holmes wrote:
>> Hi Dan,
>>
>> On 19/02/2015 2:28 AM, Daniel D. Daugherty wrote:
>>> David,
>>>
>>> Thanks for the fast review! Replies embedded below...
>>>
>>>
>>> On 2/17/15 5:53 PM, David Holmes wrote:
>>>> Hi Dan,
>>>>
>>>> A few comments, mainly on the comments :)
>>>
>>> Apropos since the changeset is mostly comments...
>>>
>>>
>>>>
>>>> On 18/02/2015 8:42 AM, Daniel D. Daugherty wrote:
>>>>> On 2/17/15 3:22 PM, Carsten Varming wrote:
>>>>>> Dear Daniel,
>>>>>>
>>>>>> Looks good to me.
>>>>>
>>>>> Thanks for the fast review.
>>>>>
>>>>>> The line: "OrderAccess::fence();  // ensure WatcherThread sees update
>>>>>> in main loop" seems unnecessary as the lock acts as a memory barrier.
>>>>>
>>>>> Yes, I keep looking at that line from the original work on
>>>>> JDK-7127792 and wonder why it's there... I'll chase that down
>>>>> with the original folks...
>>>>
>>>> Only needed when using lock-free access, so can be removed now.
>>
>> Oops - my bad!
>>
>>> Yes, that will be the likely outcome.
>>>
>>> However, the original code added by JDK-7127792 wasn't lock-free so
>>> I still want to chase down the original thinking.
>>
>> The lock-free code was here:
>>
>> void WatcherThread::run() {
>>   assert(this == watcher_thread(), "just checking");
>>
>>   this->record_stack_base_and_size();
>>   this->initialize_thread_local_storage();
>>   this->set_active_handles(JNIHandleBlock::allocate_block());
>>   while(!_should_terminate) {
>>
>> and of course you still have a lock-free access in the new code:
>>
>> 1300     if (_should_terminate) {
>> 1301       // check for termination before posting the next tick
>> 1302       break;
>> 1303     }
>>
>> so the fence() could stay, but its omission would be harmless I think
>> because not seeing the update immediately is no different to having
>> the update happen just after the check - it's racy code. The only
>> other option would be to always access _should_terminate with the
>> PeriodicTask_lock held - but that is subverting the intent of the lock
>> somewhat as _should_terminate is part of the WatcherThread state.
>> Actually the more I look at this the more I think the lock has already
>> been subverted to assist with the WT termination protocol.
>
> Just to make sure we're on the same page I think Carsten was
> talking about this block (from 8072439):
>
> 1334   {
> 1335     // Follow normal safepoint aware lock enter protocol since the
> 1336     // WatcherThread is stopped by another JavaThread.
> 1337     MutexLocker ml(PeriodicTask_lock);
> 1338     _should_terminate = true;
> 1339     OrderAccess::fence();  // ensure WatcherThread sees update in
> main loop
> 1340
> 1341     WatcherThread* watcher = watcher_thread();
> 1342     if (watcher != NULL) {
> 1343       // unpark the WatcherThread so it can see that it should
> terminate
> 1344       watcher->unpark();
> 1345     }
> 1346   }
>
> where the fence() on line 1339 is not needed because when we drop
> the PeriodicTask_lock on line 1346 we get a fence() equivalent.
>
> This is the block before my fix for JDK-8047720:
>
> 1360   {
> 1361     MutexLockerEx ml(PeriodicTask_lock,
> Mutex::_no_safepoint_check_flag);
> 1362     _should_terminate = true;
> 1363     OrderAccess::fence();  // ensure WatcherThread sees update in
> main loop
> 1364
> 1365     WatcherThread* watcher = watcher_thread();
> 1366     if (watcher != NULL)
> 1367       watcher->unpark();
> 1368   }
>
> It has the same fence() call and the same lock exit (on different
> line numbers) so I'm wondering about the original thinking here.
> I still haven't heard back from Rickard...

I think the fence was just a typical lock-free usage to try and make the 
write visible "immediately". I don't think it helps with any potential 
hangs due to the inherent race between the write and lock-free read.

David

>
>>
>>>
>>>> java.cpp:
>>>>
>>>> Typo: likelyhood -> likelihood
>>>
>>> Fixed.
>>>
>>>
>>>> ---
>>>>
>>>> task.cpp
>>>>
>>>>  81 int PeriodicTask::time_to_wait() {
>>>>  82   assert(Thread::current()->is_Watcher_thread(), "must be
>>>> WatcherThread");
>>>>
>>>> This is currently true but not sure it has to be true. If we are
>>>> assuming/constraining a subset of the task API to only be callable by
>>>> the WatcherThread then perhaps we should document that in task.hpp ?
>>>> And as the WatcherThread is a friend, none of those methods need to be
>>>> public - ie execute_if_pending and time_to_wait (realtime_tick is
>>>> already private).
>>>
>>> I was too focused on adding asserts that enforced how it works today
>>> and not on how it might be used down the road. There's no reason to
>>> restrict the caller of time_to_wait() to the WatcherThread. I've
>>> deleted the assert on line 82 and I added a comment to task.hpp:
>>>
>>>    // Requires the PeriodicTask_lock.
>>>    static int time_to_wait();
>>>
>>> By leaving time_to_wait() public, that allows code other than the
>>> WatcherThread to use it perhaps for debugging...
>>
>> Okay. I still think the API is somewhat confused about its public
>> interface - execute_if_pending should be protected for use by WT only
>> (and subclasses implement it of course).
>
> Yes, but I think I'll leave that for another bug (probably to
> be fixed by someone else).
>
>
>>
>>>
>>>> 112 void PeriodicTask::enroll() {
>>>>  113   // Follow normal safepoint aware lock enter protocol if the
>>>> caller does
>>>>  114   // not already own the PeriodicTask_lock. Otherwise, we don't
>>>> try to
>>>>  115   // enter it again to avoid deadlocking with a safepoint that
>>>> started
>>>>  116   // after the initial enter and before this enter.
>>>>  117   //
>>>>  118   MutexLockerEx ml(PeriodicTask_lock->owned_by_self() ? NULL
>>>>  119                                                       :
>>>> PeriodicTask_lock);
>>>>
>>>> The deadlock is not with a safepoint interaction but the normal
>>>> self-deadlock that would occur trying to lock() a Monitor/Mutex that
>>>> you already own. (Monitors/Mutexes do not support recursive locking -
>>>> in contrast to ObjectMonitors.) Ditto for disenroll.
>>>
>>> I have been working on Java Monitors for waaaayyyy too long!
>>
>> :)
>>
>>> In the header comment for src/share/vm/runtime/mutex.cpp:
>>>
>>> //   Native Monitors do *not* support nesting or recursion but ...
>>>
>>>
>>>> The original comment is the correct one with respect to acquiring with
>>>> a safepoint check if not already locked:
>>>>
>>>> /* XXX could be called from a JavaThread, so we have to check for
>>>>  * safepoint when taking the lock to avoid deadlocking */
>>>
>>> I'm still not fond of that comment. Perhaps:
>>>
>>>    // Follow normal safepoint aware lock enter protocol if the caller
>>> does
>>>    // not already own the PeriodicTask_lock. Otherwise, we don't try to
>>>    // enter it again because VM internal Mutexes do not support
>>> recursion.
>>>
>>> I'm trying to make it clear that we're following safepoint aware
>>> protocol
>>> even though we're using MutexLockerEx. I also wanted a clear statement
>>> about why we skipped the enter if we already owned the lock. I don't
>>> think that we could be called from a JavaThread to be the important
>>> part.
>>
>> Being a JavaThread is the only reason the safepoint protocol has to be
>> considered - for non-JavaThreads we won't look at the safepoint state.
>>
>> The enroll() and disenroll() methods are called from the JavaThreads
>> and they can not be considered "leaf" methods, so they must obey the
>> safepoint protocol wrt the lock.
>>
>> Anyway your comment is fine.
>
> Thanks.
>
>
>>
>>>
>>>> ---
>>>>
>>>> thread.cpp:
>>>>
>>>> 1198 int WatcherThread::sleep() const {
>>>> 1199   // The WatcherThread does not honor the safepoint protocol for
>>>> 1200   // PeriodicTask_lock in order to provide more consistent task
>>>> 1201   // execution timing.
>>>>
>>>> Not sure that is really the correct characterisation. The WT doesn't
>>>> need to honor the safepoint protocol because it isn't a JavaThread and
>>>> so won't block even if a safepoint is in progress. So checking for a
>>>> safepoint is simply a waste of time.
>>>
>>> Much better way of putting it. Thanks! How about:
>>>
>>>    // The WatcherThread is not a JavaThread so we do not honor the
>>>    // safepoint protocol for the PeriodicTask_lock.
>>
>> Ok.
>
> Thanks.
>
>
>>
>>>
>>>>
>>>> 3565     MutexLocker ml(PeriodicTask_lock);
>>>>
>>>> Why did you allow for a safepoint check here? It would only be
>>>> necessary if a JavaThread might concurrently acquire the
>>>> PeriodicTask_lock and a safepoint is initiated - during VM
>>>> initialization. Not completely impossible I guess but have never heard
>>>> of this occurring, so would require some other startup change to
>>>> trigger it.
>>>
>>> There's no good reason to use MutexLockerEx here and I didn't
>>> want to write a comment explaining why we were using one.
>>> Earlier code in Threads::create_vm() uses MutexLocker instead
>>> of MutexLockerEx.
>>
>> Okay. Change is harmless.
>
> Cool. I think we're on the same page.
>
> Thanks again for the reviews.
>
> Dan
>>
>> Cheers,
>> David
>>
>>>
>>> Thanks again for the very thorough review!
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> Carsten
>>>>>>
>>>>>> On Tue, Feb 17, 2015 at 4:44 PM, Daniel D. Daugherty
>>>>>> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>>
>>>>>> wrote:
>>>>>>
>>>>>>     Greetings,
>>>>>>
>>>>>>     My fix for the following bug:
>>>>>>
>>>>>>         JDK-8047720 Xprof hangs on Solaris
>>>>>>
>>>>>>     that was pushed to JDK9 last June needs to be cleaned up.
>>>>>>
>>>>>>     Thanks to Alex Garthwaite (agarthwaite at twitter.com
>>>>>>     <mailto:agarthwaite at twitter.com>) and Carsten
>>>>>>     Varming (varming at gmail.com <mailto:varming at gmail.com>) for
>>>>>>     reporting the mess that I made
>>>>>>     in WatcherThread::stop() and for suggesting fixes.
>>>>>>
>>>>>>     This code review is for a general cleanup pass on
>>>>>> PeriodicTask_lock
>>>>>>     and some of the surrounding code. This is a targeted review in
>>>>>> that
>>>>>>     I would like to hear from three groups of people:
>>>>>>
>>>>>>     1) The author and reviewers for:
>>>>>>
>>>>>>        JDK-7127792 Add the ability to change an existing
>>>>>> PeriodicTask's
>>>>>>                    execution interval
>>>>>>
>>>>>>        Rickard, David H, and Markus G.
>>>>>>
>>>>>>     2) The reviewers for:
>>>>>>
>>>>>>        JDK-8047720 Xprof hangs on Solaris
>>>>>>
>>>>>>        Markus G and Coleen
>>>>>>
>>>>>>     3) Alex and Carsten
>>>>>>
>>>>>>
>>>>>>     Here's the webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8072439-webrev/0-for_jdk9_hs_rt/
>>>>>> <http://cr.openjdk.java.net/%7Edcubed/8072439-webrev/0-for_jdk9_hs_rt/>
>>>>>>
>>>>>>
>>>>>>     I've attached the original RFR for JDK-8047720 that explains
>>>>>>     the original deadlock that was being fixed. Similar testing
>>>>>>     will be done with this fix.
>>>>>>
>>>>>>     Dan
>>>>>>
>>>>>>
>>>>>
>>>
>