RFR(XS) for PeriodicTask_lock cleanup (8072439)

Thu Feb 19 23:03:22 UTC 2015

On 2/18/15 7:02 PM, David Holmes wrote:
> Hi Dan,
>
> On 19/02/2015 2:28 AM, Daniel D. Daugherty wrote:
>> David,
>>
>> Thanks for the fast review! Replies embedded below...
>>
>>
>> On 2/17/15 5:53 PM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> A few comments, mainly on the comments :)
>>
>> Apropos since the changeset is mostly comments...
>>
>>
>>>
>>> On 18/02/2015 8:42 AM, Daniel D. Daugherty wrote:
>>>> On 2/17/15 3:22 PM, Carsten Varming wrote:
>>>>> Dear Daniel,
>>>>>
>>>>> Looks good to me.
>>>>
>>>> Thanks for the fast review.
>>>>
>>>>> The line: "OrderAccess::fence();  // ensure WatcherThread sees update
>>>>> in main loop" seems unnecessary as the lock acts as a memory barrier.
>>>>
>>>> Yes, I keep looking at that line from the original work on
>>>> JDK-7127792 and wonder why it's there... I'll chase that down
>>>> with the original folks...
>>>
>>> Only needed when using lock-free access, so can be removed now.
>
> Oops - my bad!
>
>> Yes, that will be the likely outcome.
>>
>> However, the original code added by JDK-7127792 wasn't lock-free so
>> I still want to chase down the original thinking.
>
> The lock-free code was here:
>
> void WatcherThread::run() {
>   assert(this == watcher_thread(), "just checking");
>
>   this->record_stack_base_and_size();
>   this->initialize_thread_local_storage();
>   this->set_active_handles(JNIHandleBlock::allocate_block());
>   while(!_should_terminate) {
>
> and of course you still have a lock-free access in the new code:
>
> 1300     if (_should_terminate) {
> 1301       // check for termination before posting the next tick
> 1302       break;
> 1303     }
>
> so the fence() could stay, but its omission would be harmless I think 
> because not seeing the update immediately is no different to having 
> the update happen just after the check - it's racy code. The only 
> other option would be to always access _should_terminate with the 
> PeriodicTask_lock held - but that is subverting the intent of the lock 
> somewhat as _should_terminate is part of the WatcherThread state. 
> Actually the more I look at this the more I think the lock has already 
> been subverted to assist with the WT termination protocol.

Just to make sure we're on the same page I think Carsten was
talking about this block (from 8072439):

1334   {
1335     // Follow normal safepoint aware lock enter protocol since the
1336     // WatcherThread is stopped by another JavaThread.
1337     MutexLocker ml(PeriodicTask_lock);
1338     _should_terminate = true;
1339     OrderAccess::fence();  // ensure WatcherThread sees update in 
main loop
1340
1341     WatcherThread* watcher = watcher_thread();
1342     if (watcher != NULL) {
1343       // unpark the WatcherThread so it can see that it should 
terminate
1344       watcher->unpark();
1345     }
1346   }

where the fence() on line 1339 is not needed because when we drop
the PeriodicTask_lock on line 1346 we get a fence() equivalent.

This is the block before my fix for JDK-8047720:

1360   {
1361     MutexLockerEx ml(PeriodicTask_lock, 
Mutex::_no_safepoint_check_flag);
1362     _should_terminate = true;
1363     OrderAccess::fence();  // ensure WatcherThread sees update in 
main loop
1364
1365     WatcherThread* watcher = watcher_thread();
1366     if (watcher != NULL)
1367       watcher->unpark();
1368   }

It has the same fence() call and the same lock exit (on different
line numbers) so I'm wondering about the original thinking here.
I still haven't heard back from Rickard...

>
>>
>>> java.cpp:
>>>
>>> Typo: likelyhood -> likelihood
>>
>> Fixed.
>>
>>
>>> ---
>>>
>>> task.cpp
>>>
>>>  81 int PeriodicTask::time_to_wait() {
>>>  82   assert(Thread::current()->is_Watcher_thread(), "must be
>>> WatcherThread");
>>>
>>> This is currently true but not sure it has to be true. If we are
>>> assuming/constraining a subset of the task API to only be callable by
>>> the WatcherThread then perhaps we should document that in task.hpp ?
>>> And as the WatcherThread is a friend, none of those methods need to be
>>> public - ie execute_if_pending and time_to_wait (realtime_tick is
>>> already private).
>>
>> I was too focused on adding asserts that enforced how it works today
>> and not on how it might be used down the road. There's no reason to
>> restrict the caller of time_to_wait() to the WatcherThread. I've
>> deleted the assert on line 82 and I added a comment to task.hpp:
>>
>>    // Requires the PeriodicTask_lock.
>>    static int time_to_wait();
>>
>> By leaving time_to_wait() public, that allows code other than the
>> WatcherThread to use it perhaps for debugging...
>
> Okay. I still think the API is somewhat confused about its public 
> interface - execute_if_pending should be protected for use by WT only 
> (and subclasses implement it of course).

Yes, but I think I'll leave that for another bug (probably to
be fixed by someone else).

>
>>
>>> 112 void PeriodicTask::enroll() {
>>>  113   // Follow normal safepoint aware lock enter protocol if the
>>> caller does
>>>  114   // not already own the PeriodicTask_lock. Otherwise, we don't
>>> try to
>>>  115   // enter it again to avoid deadlocking with a safepoint that
>>> started
>>>  116   // after the initial enter and before this enter.
>>>  117   //
>>>  118   MutexLockerEx ml(PeriodicTask_lock->owned_by_self() ? NULL
>>>  119                                                       :
>>> PeriodicTask_lock);
>>>
>>> The deadlock is not with a safepoint interaction but the normal
>>> self-deadlock that would occur trying to lock() a Monitor/Mutex that
>>> you already own. (Monitors/Mutexes do not support recursive locking -
>>> in contrast to ObjectMonitors.) Ditto for disenroll.
>>
>> I have been working on Java Monitors for waaaayyyy too long!
>
> :)
>
>> In the header comment for src/share/vm/runtime/mutex.cpp:
>>
>> //   Native Monitors do *not* support nesting or recursion but ...
>>
>>
>>> The original comment is the correct one with respect to acquiring with
>>> a safepoint check if not already locked:
>>>
>>> /* XXX could be called from a JavaThread, so we have to check for
>>>  * safepoint when taking the lock to avoid deadlocking */
>>
>> I'm still not fond of that comment. Perhaps:
>>
>>    // Follow normal safepoint aware lock enter protocol if the caller 
>> does
>>    // not already own the PeriodicTask_lock. Otherwise, we don't try to
>>    // enter it again because VM internal Mutexes do not support 
>> recursion.
>>
>> I'm trying to make it clear that we're following safepoint aware 
>> protocol
>> even though we're using MutexLockerEx. I also wanted a clear statement
>> about why we skipped the enter if we already owned the lock. I don't
>> think that we could be called from a JavaThread to be the important 
>> part.
>
> Being a JavaThread is the only reason the safepoint protocol has to be 
> considered - for non-JavaThreads we won't look at the safepoint state.
>
> The enroll() and disenroll() methods are called from the JavaThreads 
> and they can not be considered "leaf" methods, so they must obey the 
> safepoint protocol wrt the lock.
>
> Anyway your comment is fine.

Thanks.

>
>>
>>> ---
>>>
>>> thread.cpp:
>>>
>>> 1198 int WatcherThread::sleep() const {
>>> 1199   // The WatcherThread does not honor the safepoint protocol for
>>> 1200   // PeriodicTask_lock in order to provide more consistent task
>>> 1201   // execution timing.
>>>
>>> Not sure that is really the correct characterisation. The WT doesn't
>>> need to honor the safepoint protocol because it isn't a JavaThread and
>>> so won't block even if a safepoint is in progress. So checking for a
>>> safepoint is simply a waste of time.
>>
>> Much better way of putting it. Thanks! How about:
>>
>>    // The WatcherThread is not a JavaThread so we do not honor the
>>    // safepoint protocol for the PeriodicTask_lock.
>
> Ok.

Thanks.

>
>>
>>>
>>> 3565     MutexLocker ml(PeriodicTask_lock);
>>>
>>> Why did you allow for a safepoint check here? It would only be
>>> necessary if a JavaThread might concurrently acquire the
>>> PeriodicTask_lock and a safepoint is initiated - during VM
>>> initialization. Not completely impossible I guess but have never heard
>>> of this occurring, so would require some other startup change to
>>> trigger it.
>>
>> There's no good reason to use MutexLockerEx here and I didn't
>> want to write a comment explaining why we were using one.
>> Earlier code in Threads::create_vm() uses MutexLocker instead
>> of MutexLockerEx.
>
> Okay. Change is harmless.

Cool. I think we're on the same page.

Thanks again for the reviews.

Dan
>
> Cheers,
> David
>
>>
>> Thanks again for the very thorough review!
>>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> Carsten
>>>>>
>>>>> On Tue, Feb 17, 2015 at 4:44 PM, Daniel D. Daugherty
>>>>> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>>
>>>>> wrote:
>>>>>
>>>>>     Greetings,
>>>>>
>>>>>     My fix for the following bug:
>>>>>
>>>>>         JDK-8047720 Xprof hangs on Solaris
>>>>>
>>>>>     that was pushed to JDK9 last June needs to be cleaned up.
>>>>>
>>>>>     Thanks to Alex Garthwaite (agarthwaite at twitter.com
>>>>>     <mailto:agarthwaite at twitter.com>) and Carsten
>>>>>     Varming (varming at gmail.com <mailto:varming at gmail.com>) for
>>>>>     reporting the mess that I made
>>>>>     in WatcherThread::stop() and for suggesting fixes.
>>>>>
>>>>>     This code review is for a general cleanup pass on 
>>>>> PeriodicTask_lock
>>>>>     and some of the surrounding code. This is a targeted review in 
>>>>> that
>>>>>     I would like to hear from three groups of people:
>>>>>
>>>>>     1) The author and reviewers for:
>>>>>
>>>>>        JDK-7127792 Add the ability to change an existing 
>>>>> PeriodicTask's
>>>>>                    execution interval
>>>>>
>>>>>        Rickard, David H, and Markus G.
>>>>>
>>>>>     2) The reviewers for:
>>>>>
>>>>>        JDK-8047720 Xprof hangs on Solaris
>>>>>
>>>>>        Markus G and Coleen
>>>>>
>>>>>     3) Alex and Carsten
>>>>>
>>>>>
>>>>>     Here's the webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8072439-webrev/0-for_jdk9_hs_rt/
>>>>> <http://cr.openjdk.java.net/%7Edcubed/8072439-webrev/0-for_jdk9_hs_rt/> 
>>>>>
>>>>>
>>>>>     I've attached the original RFR for JDK-8047720 that explains
>>>>>     the original deadlock that was being fixed. Similar testing
>>>>>     will be done with this fix.
>>>>>
>>>>>     Dan
>>>>>
>>>>>
>>>>
>>