RFR (S): 8182703: Correct G1 barrier queue lock orderings
Kim Barrett
kim.barrett at oracle.com
Tue Jul 11 02:19:04 UTC 2017
> On Jul 5, 2017, at 6:24 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
> On 2017-07-05 04:00, Kim Barrett wrote:
>>> On Jun 26, 2017, at 9:34 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> Webrev: http://cr.openjdk.java.net/~eosterlund/8182703/webrev.02/
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8182703
>> ------------------------------------------------------------------------------
>> src/share/vm/gc/g1/ptrQueue.cpp
>> Removing unlock / relock around
>> 78 qset()->enqueue_complete_buffer(node);
>>
>> I would prefer that this part of this changeset not be made at this
>> time.
>>
>> This part isn't necessary for the main point of this changeset. It's
>> a cleanup that is enabled by the lock rank changes, where the rank
>> changes are required for other reasons.
>
> Okay.
>
>> It also at least conflicts with, and probably breaks, a pending change
>> of mine. (I have a largish stack of patches in this area that didn't
>> quite make it into JDK 9 before the original FC date, and which I've
>> been (all too slowly) trying to work my way through and bring into JDK
>> 10.)
>
> I agree that it would be possible to just correct the ranks while allowing the spaghetti synchronization code to remain in the code base. Here are some comments about that, to me, not-so-attractive idea:
> 1) I would really like to get rid of that code, because I think it is poor synchronization practice and its stated reason for existence is gone now.
> 2) I have to do *something* about that part in the current change, otherwise the comment motivating its existence will be incorrect after my lock rank change. There is no longer a legitimate motivation for doing that unlock and re-lock. So we have the choice of making up a new motivation for why we do this anyway, or removing it. For me the choice is easily to remove it.
Or leave it be for now, to avoid knowingly creating more work for
someone else by inflicting merge conflicts or other breakage on them.
(But see below.) If the occasional out-of-date comment were the worst
of the problems we faced, that would be pretty fabulous.
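For concreteness, the unlock/re-lock dance in question looks roughly
like the sketch below. std::mutex stands in for HotSpot's Mutex, and
SharedPtrQueue and the method names are made up for illustration; this
is not the actual ptrQueue.cpp code.

```cpp
#include <mutex>
#include <queue>

// Illustrative stand-in for the shared queue and its completed-buffer
// list; names and types are hypothetical.
struct SharedPtrQueue {
  std::mutex shared_lock;       // the shared queue lock, held by callers
  std::queue<void*> completed;  // the completed-buffer list

  void enqueue_complete_buffer(void* node) {
    completed.push(node);
  }

  // Old shape: the enqueue used to take a lock ranked above the shared
  // queue lock, so the caller dropped its own lock around the call and
  // re-took it afterwards.
  void handle_full_buffer_old(void* node) {
    std::unique_lock<std::mutex> l(shared_lock);
    l.unlock();                   // temporary unlock
    enqueue_complete_buffer(node);
    l.lock();                     // re-lock before continuing
  }

  // Shape enabled by the corrected lock ranks: stay locked throughout.
  void handle_full_buffer_new(void* node) {
    std::lock_guard<std::mutex> l(shared_lock);
    enqueue_complete_buffer(node);
  }
};
```

Both shapes enqueue the buffer; the difference is only whether the
shared queue lock is released around the enqueue call.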
> 3) If some new actual motivation for dropping that lock arises later on down the road (which I am dubious about), then I do not see an issue with simply re-adding it then, when/if that becomes necessary again, with a new corresponding motivation added in appropriately.
>
> As far as your new changes go, I am curious what they do to motivate unlocking/re-locking this shared queue lock again. As outlined in my recent email to Thomas, we do not hold either of these queue locks when concurrent refinement helper code is called from GC barriers invoked from JavaThreads, even with my new changes. If it is in this code path that you will perform more work (just speculating), then that should be invariant of this cleanup.
One possibility I was thinking of was the buffer filtering step. I
mis-remembered and thought that wasn't done for the (locked) shared
queues, and that one of my pending changes was to change that. (It's
been over a year since I worked on those changes, and I haven't had
time to really page them back in.) But I now see that we already do the
filtering of the shared SATB queue (dirty card queues don't presently
have any filtering, but might in the future) while holding its lock.
This suggests a potential (though seemingly hard to avoid) fragility
resulting from the lowered lock rank.
The present SATB filtering doesn't seem to acquire any locks, but it's
a non-trivial amount of code spread over multiple files, so it would be
easy to miss something or break it in that respect. Reducing the lock
ranks requires being very careful with the SATB filtering code.
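To make the fragility concrete, here is a rough sketch of the rank
discipline involved: a thread may only acquire a lock whose rank is
strictly below the ranks of all locks it already holds, so once the
queue locks sit near leaf rank, almost any lock acquisition from inside
SATB filtering (which runs under the shared SATB queue lock) becomes an
ordering violation. The rank values and names below are made up;
HotSpot's actual Mutex ranks and assertion machinery differ.

```cpp
#include <vector>

// Hypothetical ranked lock; rank values are illustrative only.
struct RankedLock {
  int rank;
};

// Per-thread view of held locks, enforcing the rank ordering.
struct ThreadLockState {
  std::vector<const RankedLock*> held;

  // Returns false (where HotSpot would assert) on a rank violation:
  // the new lock must rank strictly below every lock already held.
  bool acquire(const RankedLock* m) {
    for (const RankedLock* h : held) {
      if (m->rank >= h->rank) return false;  // ordering violation
    }
    held.push_back(m);
    return true;
  }

  void release() { held.pop_back(); }  // LIFO release, for simplicity
};
```

With the SATB queue lock at a low rank, the only locks still legally
acquirable under it are the handful ranked even lower, which is why any
stray lock use hiding in the filtering code would break.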
The "mutator" help for dirty card queue processing is not presently
done for the shared queue, but I think it could be today. I'm less sure
about that with lowered queue lock ranks; I *think* there aren't any
relevant locks there (other than the very rare shared queue lock in
refine_card_concurrently), but that's a substantially larger and more
complex amount of code than SATB queue filtering. It looks like
something along this line is part of my pending changes. That would
certainly be broken by the proposed removal of the temporary
unlocking. At the time I was working on it, it seemed like having
that little unlocking dance simplified things elsewhere. I can cope
with the merge conflict (especially since it *is* a merge conflict and
not silent breakage that I may have forgotten about by the time I get
back to it), though I would prefer not to have to. (I can also think
of some reasons why this might not be worth doing or even a bad idea,
and don't recall right now what I may have done to address those.)
But while looking at the mutator helper, I realized there may be a
different problem. Lowering these lock ranks may not be sufficient to
allow enqueue in "arbitrary" lock contexts. The difficulty is that in
the mutator help case (which only applies to the dirty card queue right
now, and currently only when a Java thread is dealing with its
thread-local queue), the allocation of the temporary worker_id is done
under the
CBL lock (which is ok), but if there isn't a free worker_id, it
*waits* for one, and that's not ok in an arbitrary lock context.
Right now, we should not be able to hit that wait while holding
"critical" locks, because the present CBL rank is too high for enqueue
to (safely) be reached in such a context. But lowering the CBL rank is not
sufficient to enqueue while holding critical locks; that potential
wait also needs to be eliminated. (This is assuming there's a place
where a Java thread can need an enqueue while holding a critical lock.
I don't have such a place in mind, but proving it can never happen now
or in the future seems hard, and contrary to the intent of the
proposed lock rank changes.)
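Roughly, the problematic pattern, and one possible non-waiting
alternative, look like the sketch below. All identifiers are made up
for illustration, and std::mutex / std::condition_variable stand in
for the CBL monitor; the actual HotSpot code differs in detail.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical pool of temporary worker_ids claimed by Java threads
// helping with refinement.
class WorkerIds {
  std::mutex cbl_lock;              // stands in for the CBL monitor
  std::condition_variable freed;
  unsigned free_mask;               // bit i set => worker_id i is free
public:
  explicit WorkerIds(unsigned n) : free_mask((1u << n) - 1u) {}

  // Current shape: blocks until an id frees up. Taking the CBL lock is
  // fine, but *waiting* here is not OK while the caller holds a
  // "critical" lock in an arbitrary lock context.
  unsigned claim_blocking() {
    std::unique_lock<std::mutex> l(cbl_lock);
    freed.wait(l, [this] { return free_mask != 0; });
    return take_lowest();
  }

  // Non-waiting alternative: report failure so the caller can leave
  // the buffer for the concurrent refinement threads instead.
  bool try_claim(unsigned* id) {
    std::lock_guard<std::mutex> l(cbl_lock);
    if (free_mask == 0) return false;
    *id = take_lowest();
    return true;
  }

  void release(unsigned id) {
    std::lock_guard<std::mutex> l(cbl_lock);
    free_mask |= (1u << id);
    freed.notify_one();
  }

private:
  unsigned take_lowest() {          // caller holds cbl_lock
    unsigned id = 0;
    while ((free_mask & (1u << id)) == 0) ++id;
    free_mask &= ~(1u << id);
    return id;
  }
};
```

The try_claim variant is only one way the wait could be eliminated, and
as noted below it changes the backpressure behavior.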
Eliminating that wait doesn't need to be part of this change, but
seems like it might be required before taking advantage of the change
to move some potentially enqueuing operations.
It shouldn't be too hard to eliminate the wait, but it's a somewhat
fundamental behavioral change. The present mechanism places a real
choke hold on the mutator when concurrent refinement can't keep up.
Without a blocking operation in there, the mutator could overwhelm
concurrent refinement, leading to longer pauses. Not that said choke
hold is all that pleasant either.
More information about the hotspot-gc-dev mailing list