RFR (S): 8182703: Correct G1 barrier queue lock orderings

Mon Jun 26 13:34:22 UTC 2017

Hi,

Webrev: http://cr.openjdk.java.net/~eosterlund/8182703/webrev.02/
Bug: https://bugs.openjdk.java.net/browse/JDK-8182703

The G1 barrier queues have very awkward lock orderings for the following 
reasons:

1) These queues may queue up things when performing a reference write or 
resolving a jweak (intentionally or just happened to be jweak, even 
though it looks like a jobject), which can happen in a lot of places in 
the code. We resolve JNIHandles while holding special locks in many 
places. We perform reference writes also in many places. Now the 
unsuspecting hotspot developer might think that it is okay to resolve a 
JNIHandle or perform a reference write while possibly holding a special 
lock. But no. In some cases, object writes have been moved out of locks 
and replaced with lock-free CAS, only to dodge the G1 write barrier 
locks. I don't think the G1 lock ordering issues should shape the shared 
code rather than the other way around.
2) There is an issue that the shared queue locks have a "special" rank, 
which is below the lock ranks used by the cbl monitor and free list 
monitor. This leads to an issue when these locks have to be taken while 
holding the shared queue locks. The current solution is to drop the 
shared queue locks temporarily, introducing nasty data races. These 
races are guarded, but the whole race seems very unnecessary.

I argue that if the G1 write barrier queue locks were simply set 
appropriately in the first place by analyzing what ranks they should 
have, none of the above issues would exist. Therefore I propose this new 
ordering.

Specifically, I recognize that locks required for performing memory 
accesses and resolving JNIHandles are more special than the "special" 
rank. Therefore, this change introduces a new lock ordering category 
called "access", which is to be used by barriers required to perform 
memory accesses. In other words, by recognizing the rank is more special 
than "special", we can remove "special" code to walk around making its 
rank more "special". That seems desirable to me. The access locks need 
to comply to the same constraints as the special locks: they may not 
perform safepoint checks.

The old lock ranks were:

SATB_Q_FL_lock: special
SATB_Q_CBL_mon: leaf - 1
Shared_SATB_Q_lock: leaf - 1

DirtyCardQ_FL_lock: special
DirtyCardQ_CBL_mon: leaf - 1
Shared_DirtyCardQ_lock: leaf - 1

The new lock ranks are:

SATB_Q_FL_lock: access (special - 2)
SATB_Q_CBL_mon: access (special - 2)
Shared_SATB_Q_lock: access + 1 (special - 1)

DirtyCardQ_FL_lock: access (special - 2)
DirtyCardQ_CBL_mon: access (special - 2)
Shared_DirtyCardQ_lock: access + 1 (special - 1)

Analysis:

Each PtrQueue and PtrQueueSet group, SATB or DirtyCardQ have the same 
group of locks. The free list lock, the completed buffer list monitor 
and the shared queue lock.

Observations:
1) The free list lock and completed buffer list monitors (members of 
PtrQueueSet) are disjoint. We never hold both of them at the same time.
Rationale: The free list lock is only used from 
PtrQueueSet::allocate_buffer, PtrQueueSet::deallocate_buffer and 
PtrQueueSet::reduce_free_list, and no callsite from there can be 
expanded where the cbl monitor is acquired. So therefore it is 
impossible to acquire the cbl monitor while holding the free list lock. 
The opposite case of acquiring the free list lock while holding the cbl 
monitor is also not possible; only the following places acquire the cbl 
monitor: PtrQueueSet::enqueue_complete_buffer, 
PtrQueueSet::merge_bufferlists, 
PtrQueueSet::assert_completed_buffer_list_len_correct, 
PtrQueueSet::notify_if_necessary, FreeIdSet::claim_par_id, 
FreeIdSet::release_par_id, DirtyCardQueueSet::get_completed_buffer, 
DirtyCardQueueSet::clear, 
SATBMarkQueueSet::apply_closure_to_completed_buffer and 
SATBMarkQueueSet::abandon_partial_marking. Again, neither of these paths 
where the cbl monitor is held can expand callsites to a place where the 
free list locks are held. Therefore it holds that the cbl monitor can 
not be held while the free list lock is held, and the free list lock can 
not be held while the cbl monitor is held. Therefore they are held 
disjointly.
2) We might hold the shared queue locks before acquiring the completed 
buffer list monitor. (today we drop the shared queue lock then and 
reacquire it later as a hack as already described)
3) We do not acquire a shared queue lock while holding the free list 
lock or completed buffer list monitor, as there is no reference from a 
PtrQueueSet to its shared queue, so those code paths do not know how to 
reach the shared PtrQueue to acquire its lock. The derived classes are 
exceptions but they never use the shared queue lock while holding the 
completed buffer list monitor or free list lock. DirtyCardQueueSet uses 
the shared queue for concatenating logs (in a safepoint without holding 
those locks). The SATBMarkQueueSet uses the shared queue for filtering 
the buffers, fiddling with activeness, printing and resetting, all 
without grabbing any locks.
4) We do not acquire any other lock (above event) while holding the free 
list lock or completed buffer list monitors. This was discovered by 
manually expanding the call graphs from where these two locks are held.

Derived constraints:
a) Because of observation 1, the free list lock and completed buffer 
list monitors can have the same rank.
b) Because of observations 1 and 2, the shared queue lock ought to have 
a rank higher than the ranks of the free list lock and the completed 
buffer list monitors (not the case today).
c) Because of of observation 3 and 2, the free list lock and completed 
buffer list monitors ought to have a rank lower than the rank of the 
shared queue lock.
d) Because of observation 4 (and constraints a-c), all the barrier locks 
should be below the "special" rank without violating any existing ranks.

The proposed new lock ranks conform to the constraints derived from my 
observations. It is worth noting that the potential relationship that 
could break (and why they do not) are:
1) If a lock is acquired from within the barriers that does not involve 
the shared queue lock, the free list lock or the completed buffer list 
monitor, we have now inverted their relationship as that other lock 
would probably have a rank higher than or equal to "special". But due to 
observation 4, there are no such cases.
2) The relationship between the shared queue lock and the completed 
buffer list monitor has been changed so both can be held at the same 
time if the shared queue lock is acquired first (which it is). This is 
arguably the way it should have been from the first place, and the old 
solution had ugly hacks where we would drop the shared queue lock to not 
run into the lock order assert (and only not to run into the lock order 
assert, i.e. not to avoid potential deadlock) to ensure the locks are 
not held at the same time. That code has now been removed, so that the 
shared queue lock is still held when enqueueing completed buffers (no 
dodgy dropping and reclaiming), and the code for handling the races due 
to multiple concurrent enqueuers has also been removed and replaced with 
an assertion that there simply should not be multiple concurrent 
enqueuers. Since the shared queue lock is now held throughout the whole 
operation, there will be no concurrent enqueuers.
3) The completed buffer list monitor used to have a higher rank than the 
free list lock. Now they have the same. Therefore, they could previously 
allow them to be held at the same time if the cbl monitor was acquired 
first. However, as discussed, there is no such case, and they ought to 
have the same rank not to confuse their true disjointness. If anyone 
insists we do not break this relationship despite the true disjointness, 
I could consent to adding another access lock rank, like this: 
http://cr.openjdk.java.net/~eosterlund/8182703/webrev.01/ but I think it 
seems better to have the same rank since they are actually truly 
disjoint and should remain disjoint.

I do recognize that long term, we *might* want a lock-free solution or 
something (not saying we do or do not). But until then, the ranks ought 
to be corrected so that they do not cause these problems causing 
everyone to bash their head against the awkward G1 lock ranks throughout 
the code and make hacks around it.

Testing: JPRT with hotspot all and lots of local testing.

Thanks,
/Erik