RFR (S): 8182703: Correct G1 barrier queue lock orderings

Fri Jul 7 15:17:39 UTC 2017

Hi Kim,

Added hotspot-dev as requested.

To answer your worries we must first understand what invariant these 
checks for 'special' locks seek to achieve.
The invariant is, AFAIK, that locks with a rank of 'special' and below 
(now including 'access' as well as 'event' that was already there 
before) must *not* check for safepoints when grabbed from JavaThreads. 
Safepoint checks translate to performing ThreadBlockInVM which must 
*not* be performed in 'special' or below ranked locks.
The reason this is necessary, is that those locks must be usable from 
safepoint-unsafe places, e.g. leaf calls, where we can not yield to the 
safepoint synchronizer. I believe that must have been what the name 
'special' originated from, correct me if I'm wrong. Since locking in 
safepoint-unsafe places potentially blocks the safepoint synchronizer, a 
deadlock would arise if a thread, 'Thread A', acquires a special lock 
with Java/VM thread state. But that special lock is held by another 
thread, 'Thread B', that has blocked on a non-special lock, because it 
yielded to the safepoint synchronizer on the VM thread, that is 
concurrently synchronizing a safepoint. However, the safepoint 
synchronizer is blocked waiting for the thread acquiring the special 
lock to yield. But it never will. In the end, 'Thread A' waits for 
'Thread B' to release the special lock, and 'Thread B' waits for the 
VMThread that is safepoint synchronizing, and the VMThread is waiting 
for 'Thread A' - a deadlock.
With the provided invariant however, it is impossible to acquire 
non-safepoint non-special locks while already holding 'special' and 
below locks. Therefore, by checking for the condition of the invariant, 
we will be certain such a deadlock can not happen, as that would violate 
the usual lock rank ordering that we all know about - one can not 
acquire a lock that has a rank higher than locks already acquired.

First of all, before we delve deeper into this rabbit hole. Note that if 
our new 'access' locks are used properly, e.g. do not perform safepoint 
checks or call code that needs to have a safepoint-safe state under the 
lock, we are good. The G1 barrier code conforms to those restrictions 
and the code has since forever been called in leaf calls while in Java 
thread state, without oop maps, making it a not safepoint-safe state. So 
that is all good. The remaining question to answer is, had we done crazy 
things while holding those access locks, would the deadlock detection 
system have detected that?

To answer that question we must examine whether that invariant holds or not.

There are three paths for locking a Mutex/Monitor, try_lock(), lock() 
and lock_without_safepoint_check(). Among these, only lock() violates 
the invariant for 'special' and below locks taken from a JavaThread. It 
is the only locking operation that performs a safepoint check. 
try_lock() instead returns false if the lock could not be acquired, and 
lock_without_safepoint_check does not check for safepoints.

MutexLocker and MutexLockerEx are abstractions around Mutex that boil 
down to calls to either lock() or lock_without_safepoint_check(). So if 
lock() catches the broken invariant, all locking in hotspot will somehow 
catch it. Let's examine what happens in lock().

share/vm/runtime/mutex.cpp:933:21:    assert(rank() > Mutex::special, 
"Potential deadlock with special or lesser rank mutex");
This is the check in Monitor::lock() *with* safepoint check. It 
definitely catches illegal uses of lock() on 'special', 'event' and 
'access' ranked locks on JavaThreads. Since we always catch misuse of 
special and below ranked locks here, the deadlock detection system works 
correctly even for 'event' and the new 'access' ranked locks. The other 
checks are mostly redundant and will eventually boil down to this check.

Examples:

share/vm/runtime/mutexLocker.hpp:165:29:    assert(mutex->rank() != 
Mutex::special
share/vm/runtime/mutexLocker.hpp:173:29:    assert(mutex->rank() != 
Mutex::special,
These two are constructors for MutexLocker which will call lock(). 
Therefore, this check is redundant. The 'event' and 'access' ranks will 
both miss this redundant check, but subsequently run into the check in 
lock(), which is the one that matters and still catches the broken 
invariant.

share/vm/runtime/mutexLocker.hpp:208:30: assert(mutex->rank() > 
Mutex::special || no_safepoint_check,
This check in MutexLockerEx checks that either the lock should be over 
special or it must not check for safepoints. This works as intended with 
'event' and 'access' locks. They are forced to perform a lock without 
safepoint check, and hence not enter the lock() path of Mutex. However, 
if they did, it would still be redundantly sanity checked there too.

share/vm/runtime/mutex.cpp:1384:23:         || rank() == Mutex::special, 
"wrong thread state for using locks");
share/vm/runtime/mutex.cpp:1389:30:    debug_only(if (rank() != 
Mutex::special) \
These two check checks are found in Monitor::check_prelock_state(). 
Which is called from lock() and try_lock(). As for lock, we already 
concluded that using lock() *at all* on a special lock from a JavaThread 
will be found out and an assert will trigger. So these checks for 
special locks seem to be a bit redundant. As for the try_lock() case, 
they even seem wrong. Checking for valid safepoint state for a locking 
operation that can't block seems wrong. But that is invariant of whether 
the lock is special or not. It just should not check for safepoint-safe 
states on try_lock().

share/vm/runtime/thread.cpp:903:27:           cur->rank() == 
Mutex::special) {
This check is in Thread::check_for_valid_safepoint_state() where it 
walks all the monitors and makes sure that we don't have any of 
{Threads_lock, Compile_lock, VMOperationRequest_lock, 
VMOperationQueue_lock} and allow_vm_block() or it contains any special 
lock. This is performed when e.g. allocating etc. This check should 
arguably check for special *and below* ranked locks that should not be 
acquired while in safepoint-safe. However, those access locks return 
true on allow_vm_block(), and therefore will correctly be detected as 
dangerous had we done crazy things under these locks.

All in all, I believe that the deadlock detecion system has some 
redundant, and some confusing checks that involve the lock rank 
Mutex::special. But I do believe that it works and would detect 
deadlocks, but could do with some reworking to make it more explicit. 
And that is invariant of the new access rank and applies equally to the 
event rank.

However, since these access locks play well with the current deadlock 
detection as they do not do anything illegal, and even if use of these 
locks did indeed do illegal things, it would still be detected by the 
deadlock detection system, it is reasonable to say that refactoring the 
deadlock detection system is a separate RFE?

Specifically, clarifying the deadlock detection system by removing 
redundant checks, not checking for safepoint-safe state in try_lock as 
well as explicitly listing special and below locks as illegal when 
verifying Thread::check_for_valid_safepoint_state(), regardles of 
whether allow_vm_block() is true or not. Sounds like a separate RFE to me!

Thanks,
/Erik

On 2017-07-06 22:15, Kim Barrett wrote:
>> On Jul 6, 2017, at 4:11 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>
>>> On Jul 4, 2017, at 10:00 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>> The lock ranking changes look good.
>> I'm going to retract that.
>>
>> How does these new lock rankings interact with various assertions that
>> rank() == or != Mutex::special?  I'm not sure those places handle
>> these new ranks properly.  (I'm not sure those places handle
>> Mutex::event rank properly either.)
> And maybe this change needs to be discussed on hotspot-dev rather than hotspot-gc-dev.
>