RFR (S): 8182703: Correct G1 barrier queue lock orderings

Erik Österlund erik.osterlund at oracle.com
Tue Jul 11 12:07:55 UTC 2017



On 2017-07-11 04:19, Kim Barrett wrote:
>> On Jul 5, 2017, at 6:24 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>> On 2017-07-05 04:00, Kim Barrett wrote:
>>>> On Jun 26, 2017, at 9:34 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~eosterlund/8182703/webrev.02/
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8182703
>>> ------------------------------------------------------------------------------
>>> src/share/vm/gc/g1/ptrQueue.cpp
>>> Removing unlock / relock around
>>>    78   qset()->enqueue_complete_buffer(node);
>>>
>>> I would prefer that this part of this changeset not be made at this
>>> time.
>>>
>>> This part isn't necessary for the main point of this changeset.  It's
>>> a cleanup that is enabled by the lock rank changes, where the rank
>>> changes are required for other reasons.
>> Okay.
>>
>>> It also at least conflicts with, and probably breaks, a pending change
>>> of mine.  (I have a largish stack of patches in this area that didn't
>>> quite make it into JDK 9 before the original FC date, and which I've
>>> been (all too slowly) trying to work my way through and bring into JDK
>>> 10.)
>> I agree that it would be possible to just correct the ranks while allowing the spaghetti synchronization code to remain in the code base. Here are some comments on that, to me, not-so-attractive idea:
>> 1) I would really like to get rid of that code, because I think it is poor synchronization practice and its stated reason for existence is gone now.
>> 2) I have to do *something* about that part in the current change, otherwise the comment motivating its existence will be incorrect after my lock rank change. There is no longer a legitimate motivation for doing that unlock and re-lock. So we have the choice of inventing a new motivation for why we do this anyway, or removing it. For me the choice is easily to remove it.
> Or leave it be for now, to avoid knowingly creating more work for
> someone else by inflicting merge conflicts or other breakage on them.
> (But see below.) If the occasional out of date comment was the worst
> of the problems we faced, that would be pretty fabulous.

A two-line merge conflict after over a year of dormancy, though... ;)
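For reference, the unlock/relock dance in question looks roughly like the
following stand-alone sketch (simplified with std::mutex; the names and
shapes are illustrative, not the actual PtrQueue code):

```cpp
#include <mutex>
#include <vector>

// Stand-in for the completed-buffer-list (CBL) side.
struct SharedQueueSet {
  std::mutex cbl_mutex;                  // stand-in for the CBL lock
  std::vector<int*> completed;           // completed buffer list
  void enqueue_complete_buffer(int* node) {
    std::lock_guard<std::mutex> g(cbl_mutex);
    completed.push_back(node);
  }
};

struct SharedPtrQueue {
  std::mutex shared_mutex;               // stand-in for the shared queue lock
  SharedQueueSet* qset = nullptr;
  int* buf = nullptr;

  // Old shape: drop the queue lock around the enqueue, because the CBL
  // lock used to rank above the shared queue lock. With the corrected
  // ranks, the unlock()/lock() pair can simply be deleted and the
  // enqueue done while the queue lock is still held.
  void flush_old_shape() {
    std::unique_lock<std::mutex> l(shared_mutex);
    int* node = buf;
    buf = nullptr;
    l.unlock();                          // the dance under discussion
    qset->enqueue_complete_buffer(node);
    l.lock();
  }
};
```

With the corrected rank ordering, the flush can keep the queue lock across
the enqueue, which is exactly what the removal amounts to.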

>> 3) If some new actual motivation for dropping that lock arises later on down the road (which I am dubious about), then I do not see an issue with simply re-adding it then, when/if that becomes necessary again, with a new corresponding motivation added in appropriately.
>>
>> As far as your new changes go, I am curious what they do to motivate unlocking/re-locking this shared queue lock again. As outlined in my recent email to Thomas, we do not hold either of these queue locks when concurrent refinement helper code is called from GC barriers invoked from JavaThreads, even with my new changes. If it is in this code path that you will perform more work (just speculating), then that should be invariant of this cleanup.
> One possibility I was thinking of was the buffer filtering step.  I
> mis-remembered and thought that wasn't done for the (locked) shared
> queues, and that one of my pending changes was to change that.  (It's
> been over a year since I worked on those changes, and haven't had time
> to really page them back in.)  But I now see that we already do the
> filtering of the shared SATB queue (dirty card queues don't presently
> have any filtering, but might in the future) while holding its lock.
>
> This suggests a potential (though seemingly hard to avoid) fragility
> resulting from the lowered lock rank.

Note that this does not matter for JavaThreads (including compiler 
threads), for concurrent refinement threads, or for concurrent marking 
threads, nor does it matter for any thread when marking is not active.

So it seems to me that the worst consequence of this is possibly worse 
latency for operations that coincide with concurrent marking, perform 
large amounts of mutation or resurrection, and are carried out neither 
by JavaThreads (including compiler threads), nor by the GC threads 
performing the concurrent marking, nor by concurrent refinement threads 
(which have nothing to do with SATB), all running concurrently with 
each other.

That does not seem like a huge problem in my book. If it were, and some 
unknown bunch of non-JavaThreads were heavily mutating or resurrecting 
objects concurrently with marking, such that contention was inflicted on 
the shared queue lock for the shared SATB queue, then the right solution 
would seem to be to give such threads their own local queues, rather than 
to reduce the time spent under the surprisingly hot shared queue lock.
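The local-queue idea can be sketched like this (illustrative names and
capacities, not actual HotSpot code): each thread fills a private buffer
without locking, and only takes the shared lock on the rare hand-off of a
full buffer:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Shared side: only touched when a full buffer is handed off.
struct QueueSet {
  std::mutex lock;                        // shared queue lock
  std::vector<std::vector<void*>> completed;
  void hand_off(std::vector<void*>&& buf) {
    std::lock_guard<std::mutex> g(lock);
    completed.push_back(std::move(buf));
  }
};

// Per-thread side: the common enqueue path takes no lock at all.
struct LocalQueue {
  static constexpr size_t kCapacity = 256;  // made-up buffer size
  std::vector<void*> buf;
  QueueSet* qset;
  explicit LocalQueue(QueueSet* qs) : qset(qs) { buf.reserve(kCapacity); }
  void enqueue(void* p) {
    buf.push_back(p);                     // lock-free common path
    if (buf.size() == kCapacity) {        // rare: hand off full buffer
      qset->hand_off(std::move(buf));
      buf.clear();
      buf.reserve(kCapacity);
    }
  }
};
```

The contention on the shared lock then scales with buffer hand-offs
rather than with individual enqueues, which is the point of giving such
threads their own queues.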

>
> The present SATB filtering doesn't seem to acquire any locks, but it's
> a non-trivial amount of code spread over multiple files, so would be
> easy to miss something or break it in that respect.  Reducing the lock
> ranks requires being very careful with the SATB filtering code.

IMO, adding any lock into the SATB barrier, which is used all over 
hotspot in some very shady places, arguably requires being very careful 
regardless of my changes. So I am going to assume that whoever does that, 
for whatever reason, will be careful.

> The "mutator" help for dirty card queue processing is not presently
> done for the shared queue, but I think could be today.  I'm less sure
> about that with lowered queue lock ranks; I *think* there aren't any
> relevant locks there (other than the very rare shared queue lock in
> refine_card_concurrently), but that's a substantially larger and more
> complex amount of code than SATB queue filtering.

As discussed with Thomas earlier in this thread, there are indeed locks 
blocking this. The HeapRegionRemSet::_m lock is currently a leaf lock. 
If collaborative refinement were to be performed by non-Java threads (and 
non-concurrent-refinement threads), then this lock would first have to be 
lowered to the access rank. But we concluded that warrants a new RFE with 
separate analysis.
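For completeness, the rank rule this analysis relies on can be sketched
as follows (made-up rank values; the real checking lives in HotSpot's
Mutex code): a thread may only acquire a lock of strictly lower rank
than the lock it acquired most recently, so lowering a rank widens the
set of contexts in which a lock may legally be taken:

```cpp
#include <cassert>
#include <vector>

// Ranks currently held by this thread, most recent last. Because
// acquisition order is strictly decreasing, back() is the minimum.
thread_local std::vector<int> g_held_ranks;

struct RankedLock {
  int rank;                              // illustrative numeric rank
  explicit RankedLock(int r) : rank(r) {}

  // Refuses acquisitions that would invert the rank order; a real
  // implementation would assert and also take an underlying mutex.
  bool try_acquire() {
    if (!g_held_ranks.empty() && rank >= g_held_ranks.back()) {
      return false;                      // lock order violation
    }
    g_held_ranks.push_back(rank);
    return true;
  }

  void release() {
    assert(!g_held_ranks.empty() && g_held_ranks.back() == rank);
    g_held_ranks.pop_back();
  }
};
```

Under this rule, dropping HeapRegionRemSet::_m from leaf to access rank
is what would permit taking it while the (now lower-ranked) queue locks
are held.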

As with the SATB queues, though, I do not know what threads would be 
causing such trouble. It is not JavaThreads (including compiler 
threads), concurrent refinement threads, or concurrent marking threads. 
That does not leave a whole lot of threads to cause such contention on 
the shared queue lock. And as with the SATB queues, if there are threads 
causing such contention on the shared queue lock, then the right fix 
seems to be to give them their own local queues and stop taking the 
shared queue lock in the first place.

> It looks like
> something along this line is part of my pending changes.  That would
> certainly be broken by the proposed removal of the temporary
> unlocking.  At the time I was working on it, it seemed like having
> that little unlocking dance simplified things elsewhere.  I can cope
> with the merge conflict (especially since it *is* a merge conflict and
> not silent breakage that I may have forgotten about by the time I get
> back to it), though I would prefer not to have to.  (I can also think
> of some reasons why this might not be worth doing or even a bad idea,
> and don't recall right now what I may have done to address those.)

This is why I wanted to know whether you are certain this is truly going 
to be a problem.
Since this all seems rather uncertain, would you say it is reasonable 
for you to take that two-line merge conflict down the road, if you 
eventually need to put the unlock/lock sequence back?

> But while looking at the mutator helper, I realized there may be a
> different problem.  Lowering these lock ranks may not be sufficient to
> allow enqueue in "arbitrary" lock contexts.  The difficulty is that in
> the mutator help case (only applies for dirty card queue right now,
> and currently only if a Java thread dealing with its thread-local
> queue), the allocation of the temporary worker_id is done under the
> CBL lock (which is ok), but if there isn't a free worker_id, it
> *waits* for one, and that's not ok in an arbitrary lock context.
> Right now, we should not be able to hit that wait while not holding
> "critical" locks, because the present CBL rank is too high to (safely)
> be in enqueue in such a context.  But lowering the CBL rank is not
> sufficient to enqueue while holding critical locks; that potential
> wait also needs to be eliminated.  (This is assuming there's a place
> where a Java thread can need an enqueue while holding a critical lock.
> I don't have such a place in mind, but proving it can never happen now
> or in the future seems hard, and contrary to the intent of the
> proposed lock rank changes.)

I agree that in order to freely perform object stores under special 
locks, we would have to stop waiting on the CBL monitor when claiming 
worker IDs in the helper part of the post-write barrier. That is a good 
observation. On the same list of requirements, the HeapRegionRemSet::_m 
monitor would have to change rank to "access", as previously mentioned.
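One way to stop waiting there would be a try-claim that fails instead of
blocking, roughly like this sketch (all names illustrative, not the
actual DirtyCardQueueSet API):

```cpp
#include <mutex>

// Hypothetical sketch of eliminating the wait for a free worker_id:
// instead of blocking on the CBL monitor until an id frees up, the
// caller try-claims one and backs off when none is available.
class WorkerIdSet {
  std::mutex lock_;
  unsigned free_mask_;                  // bit i set => worker id i free
public:
  static constexpr int kNoWorker = -1;

  explicit WorkerIdSet(unsigned n) : free_mask_((1u << n) - 1u) {}

  // Returns a claimed id, or kNoWorker instead of waiting; safe to
  // call in an arbitrary lock context.
  int try_claim() {
    std::lock_guard<std::mutex> g(lock_);
    if (free_mask_ == 0) return kNoWorker;
    int id = 0;
    while (((free_mask_ >> id) & 1u) == 0) id++;   // lowest free bit
    free_mask_ &= ~(1u << id);
    return id;
  }

  void release(int id) {
    std::lock_guard<std::mutex> g(lock_);
    free_mask_ |= (1u << id);
  }
};
```

A caller that gets kNoWorker back would simply skip the mutator help,
e.g. leave the buffer for the refinement threads, instead of blocking
while possibly holding a special lock.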

> Eliminating that wait doesn't need to be part of this change, but
> seems like it might be required before taking advantage of the change
> to move some potentially enqueuing operations.

Agreed.

> It shouldn't be too hard to eliminate the wait, but it's a somewhat
> fundamental behavioral change.  The present mechanism places a real
> choke hold on the mutator when concurrent refinement can't keep up.
> Without a blocking operation in there, the mutator could overwhelm
> concurrent refinement, leading to longer pauses. Not that said choke
> hold is all that pleasant either.

Yes, this mechanism indeed seems to need some love.

Thanks for reviewing!

/Erik



More information about the hotspot-gc-dev mailing list