RFR (S): 8182703: Correct G1 barrier queue lock orderings

Tue Jul 4 08:27:55 UTC 2017

Hi Mikael,

Thank you for the review.

/Erik

On 2017-07-04 10:10, Mikael Gerdin wrote:
> Hi Erik,
>
> On 2017-07-03 14:53, Erik Österlund wrote:
>> Hi Mikael,
>>
>> Thank you for the review!
>>
>> Regarding the use of + x in the current enum system for lock 
>> rankings, I agree that it is not a brilliant system and you feel a 
>> bit sad when your lock rank is "leaf+2". However, sometimes I feel 
>> like abstracting numbers with names can become confusing as well - 
>> even misleading. Like for example how leaf is no longer a leaf and 
>> how it is questionable whether special is really all that special any 
>> longer.
>>
>> When I thought about possible name for access + 0 and access + 1, I 
>> was thinking something in the lines of "perhaps access_inner/outer or 
>> access_leaf/nonleaf", but then that might get confusing if suddenly 
>> access will need 3 ranks for some reason and we get an 
>> "access_special" rank or something.
>
> I suppose you're right. Let's leave the values as you suggested.
>
>>
>> Perhaps a different solution than enum names would be nice long-term 
>> for lock ranks and deadlock detection, but I believe that might be 
>> outside of the current scope for this change.
>
> Agreed.
> /Mikael
>
>>
>> Thanks,
>> /Erik
>>
>> On 2017-07-03 13:57, Mikael Gerdin wrote:
>>> Hi Erik,
>>>
>>> On 2017-06-26 15:34, Erik Österlund wrote:
>>>> Hi,
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~eosterlund/8182703/webrev.02/
>>>
>>> I think this change makes sense and I agree with your reasoning below.
>>>
>>> I'm leaning towards suggesting creating a named enum value for 
>>> "access+1" to begin a move towards getting rid of adding and 
>>> subtracting values from enums in this code. I don't have a good name 
>>> for it, though.
>>>
>>> /Mikael
>>>
>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8182703
>>>>
>>>> The G1 barrier queues have very awkward lock orderings for the 
>>>> following reasons:
>>>>
>>>> 1) These queues may queue up things when performing a reference 
>>>> write or resolving a jweak (intentionally or just happened to be 
>>>> jweak, even though it looks like a jobject), which can happen in a 
>>>> lot of places in the code. We resolve JNIHandles while holding 
>>>> special locks in many places. We perform reference writes also in 
>>>> many places. Now the unsuspecting hotspot developer might think 
>>>> that it is okay to resolve a JNIHandle or perform a reference write 
>>>> while possibly holding a special lock. But no. In some cases, 
>>>> object writes have been moved out of locks and replaced with 
>>>> lock-free CAS, only to dodge the G1 write barrier locks. I don't 
>>>> think the G1 lock ordering issues should shape the shared code 
>>>> rather than the other way around.
>>>> 2) There is an issue that the shared queue locks have a "special" 
>>>> rank, which is below the lock ranks used by the cbl monitor and 
>>>> free list monitor. This leads to an issue when these locks have to 
>>>> be taken while holding the shared queue locks. The current solution 
>>>> is to drop the shared queue locks temporarily, introducing nasty 
>>>> data races. These races are guarded, but the whole race seems very 
>>>> unnecessary.
>>>>
>>>> I argue that if the G1 write barrier queue locks were simply set 
>>>> appropriately in the first place by analyzing what ranks they 
>>>> should have, none of the above issues would exist. Therefore I 
>>>> propose this new ordering.
>>>>
>>>> Specifically, I recognize that locks required for performing memory 
>>>> accesses and resolving JNIHandles are more special than the 
>>>> "special" rank. Therefore, this change introduces a new lock 
>>>> ordering category called "access", which is to be used by barriers 
>>>> required to perform memory accesses. In other words, by recognizing 
>>>> the rank is more special than "special", we can remove "special" 
>>>> code to walk around making its rank more "special". That seems 
>>>> desirable to me. The access locks need to comply to the same 
>>>> constraints as the special locks: they may not perform safepoint 
>>>> checks.
>>>>
>>>> The old lock ranks were:
>>>>
>>>> SATB_Q_FL_lock: special
>>>> SATB_Q_CBL_mon: leaf - 1
>>>> Shared_SATB_Q_lock: leaf - 1
>>>>
>>>> DirtyCardQ_FL_lock: special
>>>> DirtyCardQ_CBL_mon: leaf - 1
>>>> Shared_DirtyCardQ_lock: leaf - 1
>>>>
>>>> The new lock ranks are:
>>>>
>>>> SATB_Q_FL_lock: access (special - 2)
>>>> SATB_Q_CBL_mon: access (special - 2)
>>>> Shared_SATB_Q_lock: access + 1 (special - 1)
>>>>
>>>> DirtyCardQ_FL_lock: access (special - 2)
>>>> DirtyCardQ_CBL_mon: access (special - 2)
>>>> Shared_DirtyCardQ_lock: access + 1 (special - 1)
>>>>
>>>> Analysis:
>>>>
>>>> Each PtrQueue and PtrQueueSet group, SATB or DirtyCardQ have the 
>>>> same group of locks. The free list lock, the completed buffer list 
>>>> monitor and the shared queue lock.
>>>>
>>>> Observations:
>>>> 1) The free list lock and completed buffer list monitors (members 
>>>> of PtrQueueSet) are disjoint. We never hold both of them at the 
>>>> same time.
>>>> Rationale: The free list lock is only used from 
>>>> PtrQueueSet::allocate_buffer, PtrQueueSet::deallocate_buffer and 
>>>> PtrQueueSet::reduce_free_list, and no callsite from there can be 
>>>> expanded where the cbl monitor is acquired. So therefore it is 
>>>> impossible to acquire the cbl monitor while holding the free list 
>>>> lock. The opposite case of acquiring the free list lock while 
>>>> holding the cbl monitor is also not possible; only the following 
>>>> places acquire the cbl monitor: 
>>>> PtrQueueSet::enqueue_complete_buffer, 
>>>> PtrQueueSet::merge_bufferlists, 
>>>> PtrQueueSet::assert_completed_buffer_list_len_correct, 
>>>> PtrQueueSet::notify_if_necessary, FreeIdSet::claim_par_id, 
>>>> FreeIdSet::release_par_id, DirtyCardQueueSet::get_completed_buffer, 
>>>> DirtyCardQueueSet::clear, 
>>>> SATBMarkQueueSet::apply_closure_to_completed_buffer and 
>>>> SATBMarkQueueSet::abandon_partial_marking. Again, neither of these 
>>>> paths where the cbl monitor is held can expand callsites to a place 
>>>> where the free list locks are held. Therefore it holds that the cbl 
>>>> monitor can not be held while the free list lock is held, and the 
>>>> free list lock can not be held while the cbl monitor is held. 
>>>> Therefore they are held disjointly.
>>>> 2) We might hold the shared queue locks before acquiring the 
>>>> completed buffer list monitor. (today we drop the shared queue lock 
>>>> then and reacquire it later as a hack as already described)
>>>> 3) We do not acquire a shared queue lock while holding the free 
>>>> list lock or completed buffer list monitor, as there is no 
>>>> reference from a PtrQueueSet to its shared queue, so those code 
>>>> paths do not know how to reach the shared PtrQueue to acquire its 
>>>> lock. The derived classes are exceptions but they never use the 
>>>> shared queue lock while holding the completed buffer list monitor 
>>>> or free list lock. DirtyCardQueueSet uses the shared queue for 
>>>> concatenating logs (in a safepoint without holding those locks). 
>>>> The SATBMarkQueueSet uses the shared queue for filtering the 
>>>> buffers, fiddling with activeness, printing and resetting, all 
>>>> without grabbing any locks.
>>>> 4) We do not acquire any other lock (above event) while holding the 
>>>> free list lock or completed buffer list monitors. This was 
>>>> discovered by manually expanding the call graphs from where these 
>>>> two locks are held.
>>>>
>>>> Derived constraints:
>>>> a) Because of observation 1, the free list lock and completed 
>>>> buffer list monitors can have the same rank.
>>>> b) Because of observations 1 and 2, the shared queue lock ought to 
>>>> have a rank higher than the ranks of the free list lock and the 
>>>> completed buffer list monitors (not the case today).
>>>> c) Because of of observation 3 and 2, the free list lock and 
>>>> completed buffer list monitors ought to have a rank lower than the 
>>>> rank of the shared queue lock.
>>>> d) Because of observation 4 (and constraints a-c), all the barrier 
>>>> locks should be below the "special" rank without violating any 
>>>> existing ranks.
>>>>
>>>> The proposed new lock ranks conform to the constraints derived from 
>>>> my observations. It is worth noting that the potential relationship 
>>>> that could break (and why they do not) are:
>>>> 1) If a lock is acquired from within the barriers that does not 
>>>> involve the shared queue lock, the free list lock or the completed 
>>>> buffer list monitor, we have now inverted their relationship as 
>>>> that other lock would probably have a rank higher than or equal to 
>>>> "special". But due to observation 4, there are no such cases.
>>>> 2) The relationship between the shared queue lock and the completed 
>>>> buffer list monitor has been changed so both can be held at the 
>>>> same time if the shared queue lock is acquired first (which it is). 
>>>> This is arguably the way it should have been from the first place, 
>>>> and the old solution had ugly hacks where we would drop the shared 
>>>> queue lock to not run into the lock order assert (and only not to 
>>>> run into the lock order assert, i.e. not to avoid potential 
>>>> deadlock) to ensure the locks are not held at the same time. That 
>>>> code has now been removed, so that the shared queue lock is still 
>>>> held when enqueueing completed buffers (no dodgy dropping and 
>>>> reclaiming), and the code for handling the races due to multiple 
>>>> concurrent enqueuers has also been removed and replaced with an 
>>>> assertion that there simply should not be multiple concurrent 
>>>> enqueuers. Since the shared queue lock is now held throughout the 
>>>> whole operation, there will be no concurrent enqueuers.
>>>> 3) The completed buffer list monitor used to have a higher rank 
>>>> than the free list lock. Now they have the same. Therefore, they 
>>>> could previously allow them to be held at the same time if the cbl 
>>>> monitor was acquired first. However, as discussed, there is no such 
>>>> case, and they ought to have the same rank not to confuse their 
>>>> true disjointness. If anyone insists we do not break this 
>>>> relationship despite the true disjointness, I could consent to 
>>>> adding another access lock rank, like this: 
>>>> http://cr.openjdk.java.net/~eosterlund/8182703/webrev.01/ but I 
>>>> think it seems better to have the same rank since they are actually 
>>>> truly disjoint and should remain disjoint.
>>>>
>>>> I do recognize that long term, we *might* want a lock-free solution 
>>>> or something (not saying we do or do not). But until then, the 
>>>> ranks ought to be corrected so that they do not cause these 
>>>> problems causing everyone to bash their head against the awkward G1 
>>>> lock ranks throughout the code and make hacks around it.
>>>>
>>>> Testing: JPRT with hotspot all and lots of local testing.
>>>>
>>>> Thanks,
>>>> /Erik
>>