RFR (S): 8182703: Correct G1 barrier queue lock orderings

Tue Jul 4 08:10:34 UTC 2017

Hi Erik,

On 2017-07-03 14:53, Erik Österlund wrote:
> Hi Mikael,
> 
> Thank you for the review!
> 
> Regarding the use of + x in the current enum system for lock rankings, I 
> agree that it is not a brilliant system and you feel a bit sad when your 
> lock rank is "leaf+2". However, sometimes I feel like abstracting 
> numbers with names can become confusing as well - even misleading. Like 
> for example how leaf is no longer a leaf and how it is questionable 
> whether special is really all that special any longer.
> 
> When I thought about possible name for access + 0 and access + 1, I was 
> thinking something in the lines of "perhaps access_inner/outer or 
> access_leaf/nonleaf", but then that might get confusing if suddenly 
> access will need 3 ranks for some reason and we get an "access_special" 
> rank or something.

I suppose you're right. Let's leave the values as you suggested.

> 
> Perhaps a different solution than enum names would be nice long-term for 
> lock ranks and deadlock detection, but I believe that might be outside 
> of the current scope for this change.

Agreed.
/Mikael

> 
> Thanks,
> /Erik
> 
> On 2017-07-03 13:57, Mikael Gerdin wrote:
>> Hi Erik,
>>
>> On 2017-06-26 15:34, Erik Österlund wrote:
>>> Hi,
>>>
>>> Webrev: http://cr.openjdk.java.net/~eosterlund/8182703/webrev.02/
>>
>> I think this change makes sense and I agree with your reasoning below.
>>
>> I'm leaning towards suggesting creating a named enum value for 
>> "access+1" to begin a move towards getting rid of adding and 
>> subtracting values from enums in this code. I don't have a good name 
>> for it, though.
>>
>> /Mikael
>>
>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8182703
>>>
>>> The G1 barrier queues have very awkward lock orderings for the 
>>> following reasons:
>>>
>>> 1) These queues may queue up things when performing a reference write 
>>> or resolving a jweak (intentionally or just happened to be jweak, 
>>> even though it looks like a jobject), which can happen in a lot of 
>>> places in the code. We resolve JNIHandles while holding special locks 
>>> in many places. We perform reference writes also in many places. Now 
>>> the unsuspecting hotspot developer might think that it is okay to 
>>> resolve a JNIHandle or perform a reference write while possibly 
>>> holding a special lock. But no. In some cases, object writes have 
>>> been moved out of locks and replaced with lock-free CAS, only to 
>>> dodge the G1 write barrier locks. I don't think the G1 lock ordering 
>>> issues should shape the shared code rather than the other way around.
>>> 2) There is an issue that the shared queue locks have a "special" 
>>> rank, which is below the lock ranks used by the cbl monitor and free 
>>> list monitor. This leads to an issue when these locks have to be 
>>> taken while holding the shared queue locks. The current solution is 
>>> to drop the shared queue locks temporarily, introducing nasty data 
>>> races. These races are guarded, but the whole race seems very 
>>> unnecessary.
>>>
>>> I argue that if the G1 write barrier queue locks were simply set 
>>> appropriately in the first place by analyzing what ranks they should 
>>> have, none of the above issues would exist. Therefore I propose this 
>>> new ordering.
>>>
>>> Specifically, I recognize that locks required for performing memory 
>>> accesses and resolving JNIHandles are more special than the "special" 
>>> rank. Therefore, this change introduces a new lock ordering category 
>>> called "access", which is to be used by barriers required to perform 
>>> memory accesses. In other words, by recognizing the rank is more 
>>> special than "special", we can remove "special" code to walk around 
>>> making its rank more "special". That seems desirable to me. The 
>>> access locks need to comply to the same constraints as the special 
>>> locks: they may not perform safepoint checks.
>>>
>>> The old lock ranks were:
>>>
>>> SATB_Q_FL_lock: special
>>> SATB_Q_CBL_mon: leaf - 1
>>> Shared_SATB_Q_lock: leaf - 1
>>>
>>> DirtyCardQ_FL_lock: special
>>> DirtyCardQ_CBL_mon: leaf - 1
>>> Shared_DirtyCardQ_lock: leaf - 1
>>>
>>> The new lock ranks are:
>>>
>>> SATB_Q_FL_lock: access (special - 2)
>>> SATB_Q_CBL_mon: access (special - 2)
>>> Shared_SATB_Q_lock: access + 1 (special - 1)
>>>
>>> DirtyCardQ_FL_lock: access (special - 2)
>>> DirtyCardQ_CBL_mon: access (special - 2)
>>> Shared_DirtyCardQ_lock: access + 1 (special - 1)
>>>
>>> Analysis:
>>>
>>> Each PtrQueue and PtrQueueSet group, SATB or DirtyCardQ have the same 
>>> group of locks. The free list lock, the completed buffer list monitor 
>>> and the shared queue lock.
>>>
>>> Observations:
>>> 1) The free list lock and completed buffer list monitors (members of 
>>> PtrQueueSet) are disjoint. We never hold both of them at the same time.
>>> Rationale: The free list lock is only used from 
>>> PtrQueueSet::allocate_buffer, PtrQueueSet::deallocate_buffer and 
>>> PtrQueueSet::reduce_free_list, and no callsite from there can be 
>>> expanded where the cbl monitor is acquired. So therefore it is 
>>> impossible to acquire the cbl monitor while holding the free list 
>>> lock. The opposite case of acquiring the free list lock while holding 
>>> the cbl monitor is also not possible; only the following places 
>>> acquire the cbl monitor: PtrQueueSet::enqueue_complete_buffer, 
>>> PtrQueueSet::merge_bufferlists, 
>>> PtrQueueSet::assert_completed_buffer_list_len_correct, 
>>> PtrQueueSet::notify_if_necessary, FreeIdSet::claim_par_id, 
>>> FreeIdSet::release_par_id, DirtyCardQueueSet::get_completed_buffer, 
>>> DirtyCardQueueSet::clear, 
>>> SATBMarkQueueSet::apply_closure_to_completed_buffer and 
>>> SATBMarkQueueSet::abandon_partial_marking. Again, neither of these 
>>> paths where the cbl monitor is held can expand callsites to a place 
>>> where the free list locks are held. Therefore it holds that the cbl 
>>> monitor can not be held while the free list lock is held, and the 
>>> free list lock can not be held while the cbl monitor is held. 
>>> Therefore they are held disjointly.
>>> 2) We might hold the shared queue locks before acquiring the 
>>> completed buffer list monitor. (today we drop the shared queue lock 
>>> then and reacquire it later as a hack as already described)
>>> 3) We do not acquire a shared queue lock while holding the free list 
>>> lock or completed buffer list monitor, as there is no reference from 
>>> a PtrQueueSet to its shared queue, so those code paths do not know 
>>> how to reach the shared PtrQueue to acquire its lock. The derived 
>>> classes are exceptions but they never use the shared queue lock while 
>>> holding the completed buffer list monitor or free list lock. 
>>> DirtyCardQueueSet uses the shared queue for concatenating logs (in a 
>>> safepoint without holding those locks). The SATBMarkQueueSet uses the 
>>> shared queue for filtering the buffers, fiddling with activeness, 
>>> printing and resetting, all without grabbing any locks.
>>> 4) We do not acquire any other lock (above event) while holding the 
>>> free list lock or completed buffer list monitors. This was discovered 
>>> by manually expanding the call graphs from where these two locks are 
>>> held.
>>>
>>> Derived constraints:
>>> a) Because of observation 1, the free list lock and completed buffer 
>>> list monitors can have the same rank.
>>> b) Because of observations 1 and 2, the shared queue lock ought to 
>>> have a rank higher than the ranks of the free list lock and the 
>>> completed buffer list monitors (not the case today).
>>> c) Because of of observation 3 and 2, the free list lock and 
>>> completed buffer list monitors ought to have a rank lower than the 
>>> rank of the shared queue lock.
>>> d) Because of observation 4 (and constraints a-c), all the barrier 
>>> locks should be below the "special" rank without violating any 
>>> existing ranks.
>>>
>>> The proposed new lock ranks conform to the constraints derived from 
>>> my observations. It is worth noting that the potential relationship 
>>> that could break (and why they do not) are:
>>> 1) If a lock is acquired from within the barriers that does not 
>>> involve the shared queue lock, the free list lock or the completed 
>>> buffer list monitor, we have now inverted their relationship as that 
>>> other lock would probably have a rank higher than or equal to 
>>> "special". But due to observation 4, there are no such cases.
>>> 2) The relationship between the shared queue lock and the completed 
>>> buffer list monitor has been changed so both can be held at the same 
>>> time if the shared queue lock is acquired first (which it is). This 
>>> is arguably the way it should have been from the first place, and the 
>>> old solution had ugly hacks where we would drop the shared queue lock 
>>> to not run into the lock order assert (and only not to run into the 
>>> lock order assert, i.e. not to avoid potential deadlock) to ensure 
>>> the locks are not held at the same time. That code has now been 
>>> removed, so that the shared queue lock is still held when enqueueing 
>>> completed buffers (no dodgy dropping and reclaiming), and the code 
>>> for handling the races due to multiple concurrent enqueuers has also 
>>> been removed and replaced with an assertion that there simply should 
>>> not be multiple concurrent enqueuers. Since the shared queue lock is 
>>> now held throughout the whole operation, there will be no concurrent 
>>> enqueuers.
>>> 3) The completed buffer list monitor used to have a higher rank than 
>>> the free list lock. Now they have the same. Therefore, they could 
>>> previously allow them to be held at the same time if the cbl monitor 
>>> was acquired first. However, as discussed, there is no such case, and 
>>> they ought to have the same rank not to confuse their true 
>>> disjointness. If anyone insists we do not break this relationship 
>>> despite the true disjointness, I could consent to adding another 
>>> access lock rank, like this: 
>>> http://cr.openjdk.java.net/~eosterlund/8182703/webrev.01/ but I think 
>>> it seems better to have the same rank since they are actually truly 
>>> disjoint and should remain disjoint.
>>>
>>> I do recognize that long term, we *might* want a lock-free solution 
>>> or something (not saying we do or do not). But until then, the ranks 
>>> ought to be corrected so that they do not cause these problems 
>>> causing everyone to bash their head against the awkward G1 lock ranks 
>>> throughout the code and make hacks around it.
>>>
>>> Testing: JPRT with hotspot all and lots of local testing.
>>>
>>> Thanks,
>>> /Erik
>