RFR (S): 8182703: Correct G1 barrier queue lock orderings

Mon Jul 3 11:57:48 UTC 2017

Hi Erik,

On 2017-06-26 15:34, Erik Österlund wrote:
> Hi,
> 
> Webrev: http://cr.openjdk.java.net/~eosterlund/8182703/webrev.02/

I think this change makes sense and I agree with your reasoning below.

I'm leaning towards suggesting creating a named enum value for 
"access+1" to begin a move towards getting rid of adding and subtracting 
values from enums in this code. I don't have a good name for it, though.

/Mikael

> Bug: https://bugs.openjdk.java.net/browse/JDK-8182703
> 
> The G1 barrier queues have very awkward lock orderings for the following 
> reasons:
> 
> 1) These queues may queue up things when performing a reference write or 
> resolving a jweak (intentionally or just happened to be jweak, even 
> though it looks like a jobject), which can happen in a lot of places in 
> the code. We resolve JNIHandles while holding special locks in many 
> places. We perform reference writes also in many places. Now the 
> unsuspecting hotspot developer might think that it is okay to resolve a 
> JNIHandle or perform a reference write while possibly holding a special 
> lock. But no. In some cases, object writes have been moved out of locks 
> and replaced with lock-free CAS, only to dodge the G1 write barrier 
> locks. I don't think the G1 lock ordering issues should shape the shared 
> code rather than the other way around.
> 2) There is an issue that the shared queue locks have a "special" rank, 
> which is below the lock ranks used by the cbl monitor and free list 
> monitor. This leads to an issue when these locks have to be taken while 
> holding the shared queue locks. The current solution is to drop the 
> shared queue locks temporarily, introducing nasty data races. These 
> races are guarded, but the whole race seems very unnecessary.
> 
> I argue that if the G1 write barrier queue locks were simply set 
> appropriately in the first place by analyzing what ranks they should 
> have, none of the above issues would exist. Therefore I propose this new 
> ordering.
> 
> Specifically, I recognize that locks required for performing memory 
> accesses and resolving JNIHandles are more special than the "special" 
> rank. Therefore, this change introduces a new lock ordering category 
> called "access", which is to be used by barriers required to perform 
> memory accesses. In other words, by recognizing the rank is more special 
> than "special", we can remove "special" code to walk around making its 
> rank more "special". That seems desirable to me. The access locks need 
> to comply to the same constraints as the special locks: they may not 
> perform safepoint checks.
> 
> The old lock ranks were:
> 
> SATB_Q_FL_lock: special
> SATB_Q_CBL_mon: leaf - 1
> Shared_SATB_Q_lock: leaf - 1
> 
> DirtyCardQ_FL_lock: special
> DirtyCardQ_CBL_mon: leaf - 1
> Shared_DirtyCardQ_lock: leaf - 1
> 
> The new lock ranks are:
> 
> SATB_Q_FL_lock: access (special - 2)
> SATB_Q_CBL_mon: access (special - 2)
> Shared_SATB_Q_lock: access + 1 (special - 1)
> 
> DirtyCardQ_FL_lock: access (special - 2)
> DirtyCardQ_CBL_mon: access (special - 2)
> Shared_DirtyCardQ_lock: access + 1 (special - 1)
> 
> Analysis:
> 
> Each PtrQueue and PtrQueueSet group, SATB or DirtyCardQ have the same 
> group of locks. The free list lock, the completed buffer list monitor 
> and the shared queue lock.
> 
> Observations:
> 1) The free list lock and completed buffer list monitors (members of 
> PtrQueueSet) are disjoint. We never hold both of them at the same time.
> Rationale: The free list lock is only used from 
> PtrQueueSet::allocate_buffer, PtrQueueSet::deallocate_buffer and 
> PtrQueueSet::reduce_free_list, and no callsite from there can be 
> expanded where the cbl monitor is acquired. So therefore it is 
> impossible to acquire the cbl monitor while holding the free list lock. 
> The opposite case of acquiring the free list lock while holding the cbl 
> monitor is also not possible; only the following places acquire the cbl 
> monitor: PtrQueueSet::enqueue_complete_buffer, 
> PtrQueueSet::merge_bufferlists, 
> PtrQueueSet::assert_completed_buffer_list_len_correct, 
> PtrQueueSet::notify_if_necessary, FreeIdSet::claim_par_id, 
> FreeIdSet::release_par_id, DirtyCardQueueSet::get_completed_buffer, 
> DirtyCardQueueSet::clear, 
> SATBMarkQueueSet::apply_closure_to_completed_buffer and 
> SATBMarkQueueSet::abandon_partial_marking. Again, neither of these paths 
> where the cbl monitor is held can expand callsites to a place where the 
> free list locks are held. Therefore it holds that the cbl monitor can 
> not be held while the free list lock is held, and the free list lock can 
> not be held while the cbl monitor is held. Therefore they are held 
> disjointly.
> 2) We might hold the shared queue locks before acquiring the completed 
> buffer list monitor. (today we drop the shared queue lock then and 
> reacquire it later as a hack as already described)
> 3) We do not acquire a shared queue lock while holding the free list 
> lock or completed buffer list monitor, as there is no reference from a 
> PtrQueueSet to its shared queue, so those code paths do not know how to 
> reach the shared PtrQueue to acquire its lock. The derived classes are 
> exceptions but they never use the shared queue lock while holding the 
> completed buffer list monitor or free list lock. DirtyCardQueueSet uses 
> the shared queue for concatenating logs (in a safepoint without holding 
> those locks). The SATBMarkQueueSet uses the shared queue for filtering 
> the buffers, fiddling with activeness, printing and resetting, all 
> without grabbing any locks.
> 4) We do not acquire any other lock (above event) while holding the free 
> list lock or completed buffer list monitors. This was discovered by 
> manually expanding the call graphs from where these two locks are held.
> 
> Derived constraints:
> a) Because of observation 1, the free list lock and completed buffer 
> list monitors can have the same rank.
> b) Because of observations 1 and 2, the shared queue lock ought to have 
> a rank higher than the ranks of the free list lock and the completed 
> buffer list monitors (not the case today).
> c) Because of of observation 3 and 2, the free list lock and completed 
> buffer list monitors ought to have a rank lower than the rank of the 
> shared queue lock.
> d) Because of observation 4 (and constraints a-c), all the barrier locks 
> should be below the "special" rank without violating any existing ranks.
> 
> The proposed new lock ranks conform to the constraints derived from my 
> observations. It is worth noting that the potential relationship that 
> could break (and why they do not) are:
> 1) If a lock is acquired from within the barriers that does not involve 
> the shared queue lock, the free list lock or the completed buffer list 
> monitor, we have now inverted their relationship as that other lock 
> would probably have a rank higher than or equal to "special". But due to 
> observation 4, there are no such cases.
> 2) The relationship between the shared queue lock and the completed 
> buffer list monitor has been changed so both can be held at the same 
> time if the shared queue lock is acquired first (which it is). This is 
> arguably the way it should have been from the first place, and the old 
> solution had ugly hacks where we would drop the shared queue lock to not 
> run into the lock order assert (and only not to run into the lock order 
> assert, i.e. not to avoid potential deadlock) to ensure the locks are 
> not held at the same time. That code has now been removed, so that the 
> shared queue lock is still held when enqueueing completed buffers (no 
> dodgy dropping and reclaiming), and the code for handling the races due 
> to multiple concurrent enqueuers has also been removed and replaced with 
> an assertion that there simply should not be multiple concurrent 
> enqueuers. Since the shared queue lock is now held throughout the whole 
> operation, there will be no concurrent enqueuers.
> 3) The completed buffer list monitor used to have a higher rank than the 
> free list lock. Now they have the same. Therefore, they could previously 
> allow them to be held at the same time if the cbl monitor was acquired 
> first. However, as discussed, there is no such case, and they ought to 
> have the same rank not to confuse their true disjointness. If anyone 
> insists we do not break this relationship despite the true disjointness, 
> I could consent to adding another access lock rank, like this: 
> http://cr.openjdk.java.net/~eosterlund/8182703/webrev.01/ but I think it 
> seems better to have the same rank since they are actually truly 
> disjoint and should remain disjoint.
> 
> I do recognize that long term, we *might* want a lock-free solution or 
> something (not saying we do or do not). But until then, the ranks ought 
> to be corrected so that they do not cause these problems causing 
> everyone to bash their head against the awkward G1 lock ranks throughout 
> the code and make hacks around it.
> 
> Testing: JPRT with hotspot all and lots of local testing.
> 
> Thanks,
> /Erik