RFR: 8232588: G1 concurrent System.gc can return early or late

Thu Oct 31 20:53:01 UTC 2019

RFR: 8232588: G1 concurrent System.gc can return early or late
RFR: 8233279: G1: GCLocker GC with +GCLockerInvokesConcurrent spins while cycle in progress 

Please review this refactoring and fixing of the state machine used by
G1CollectedHeap::collect for handling requests for concurrent collections.

The handling of concurrent collection requests is now split out into a
helper function for that purpose.  All of the state machine logic for
checking for completion, waiting for completion, and performing retries is
now in that new helper function, rather than being distributed between
try_collect() and various parts of the VMOp.

This change adds a new VMOp, VM_G1TryInitiateConcMark.  That simplifies both
the handling of concurrent collection requests and VM_G1CollectForAllocation.
The new VMOp also provides some additional information for use by the state
machine.

For user-requested concurrent GC requests, the previously intended behavior
was to wait for an in-progress concurrent marking cycle (if any), then start
a new concurrent marking cycle and wait for it to complete.  However, there
were various race conditions that might result in returning either sooner or
later than intended.  This change addresses those races, so that we get
consistent behavior for such requests.
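
To make the intended user-request behavior concrete, here is a toy Java model
of it.  This is only a sketch for illustration, not the HotSpot code: the class
name, the two counters, and the single monitor are all assumptions made for the
example, and the "marking work" is just a short sleep.

```java
// Toy model only; hypothetical names, not HotSpot code.
class ConcCycleModel {
    private final Object lock = new Object();
    private int started = 0;    // cycles started so far
    private int completed = 0;  // cycles completed so far

    // Simulated marking thread: runs one cycle whenever one has been started.
    void runMarker() throws InterruptedException {
        while (true) {
            synchronized (lock) {
                while (started == completed) {
                    lock.wait();        // idle until a requester starts a cycle
                }
            }
            Thread.sleep(1);            // stand-in for concurrent marking work
            synchronized (lock) {
                completed++;
                lock.notifyAll();       // wake requesters waiting on completion
            }
        }
    }

    // User-requested GC; returns the number of the cycle this request started.
    int userRequest() throws InterruptedException {
        synchronized (lock) {
            while (started != completed) {
                lock.wait();            // 1. wait out any in-progress cycle
            }
            int mine = ++started;       // 2. start a new cycle
            lock.notifyAll();           //    (wakes the simulated marker)
            while (completed < mine) {
                lock.wait();            // 3. wait for that cycle to complete
            }
            return mine;
        }
    }

    int completedCycles() {
        synchronized (lock) { return completed; }
    }

    // Runs `requesters` concurrent user requests; returns cycles completed.
    static int demo(int requesters) {
        ConcCycleModel m = new ConcCycleModel();
        Thread marker = new Thread(() -> {
            try { m.runMarker(); } catch (InterruptedException e) { /* shut down */ }
        });
        marker.setDaemon(true);
        marker.start();
        Thread[] ts = new Thread[requesters];
        for (int i = 0; i < requesters; i++) {
            ts[i] = new Thread(() -> {
                try { m.userRequest(); } catch (InterruptedException e) { /* give up */ }
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        marker.interrupt();             // stop the simulated marking thread
        return m.completedCycles();
    }

    public static void main(String[] args) {
        System.out.println("cycles completed: " + demo(4));
    }
}
```

In this model every request starts exactly one cycle and returns only after
that cycle has completed, so a request neither returns early (before its own
cycle finishes) nor gets satisfied by a cycle that was already in progress when
it arrived.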

(WhiteBox.g1StartConcMarkCycle is the function that uses _wb_conc_mark.
With that name, it's not obvious that the full waiting behavior is intended,
but that's what it used to do, so not changing it.  Some tests follow it
with a sleep-wait for !WB.g1InConcurrentMark(), while others seem to expect
it to perform a complete collection.)

One behavioral change: a user-requested GC waiting for a concurrent marking
cycle to complete used to do so with the thread transitioned to native, and
without safepoint checks on the associated monitor lock and wait.  This was
noted as having been cribbed from CMS.  Coleen and I looked at this and could
not come up with a reason for G1 to keep doing that (especially after the
recent spate of locking improvements), so there's a new G1-specific monitor
being used, and the locking and waiting is now "normal".  (This makes the
FullGCCount_lock monitor largely CMS-specific.)

For other concurrent GC requests, the only intentional change is for
_gc_locker with GCLockerInvokesConcurrent.  Previously it would spin in
try_collect while there was a concurrent marking cycle in progress, also
blocking any callers of GCLocker::stall_until_clear() (JDK-8233279).  Now it
returns in that situation, though it's not clear that's a great idea either.
Indeed, even when that option was introduced (for CMS, as part of fixing a
bad interaction between GCLocker GCs and +ExplicitGCInvokesConcurrent) it
was not clear it was a good idea (see JDK-6919638).  Fortunately it's off by
default.  JDK-8233280 has been filed to remove this option.

CR:
https://bugs.openjdk.java.net/browse/JDK-8233279
https://bugs.openjdk.java.net/browse/JDK-8232588

Webrev:
https://cr.openjdk.java.net/~kbarrett/8232588/open.00/

Testing:
mach5 tier1-6

Local (linux-x64) testing with a program that allocates some live data in
the old gen, then has several threads all repeatedly looping on System.gc().
Looked at output from new logging in try_collect_concurrently and verified
the interleavings of GC start/end and new log messages were as expected.
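
For anyone wanting to try something similar, a minimal sketch of that kind of
stress program might look like the following.  This is not the program I
actually used; the class name, sizes, and thread counts are made up for
illustration, and whether the retained data ends up in the old gen depends on
the heap configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress program; names and sizes are illustrative only.
class SystemGcLoop {
    // Held in a static field so the data stays reachable across collections.
    static final List<byte[]> LIVE = new ArrayList<>();

    // Retains liveMb MB of data, then runs `threads` threads that each call
    // System.gc() `iterations` times.  Returns the number of calls that returned.
    static int run(int threads, int iterations, int liveMb) {
        for (int i = 0; i < liveMb; i++) {
            LIVE.add(new byte[1 << 20]);
        }
        AtomicInteger calls = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    System.gc();            // concurrent only if the VM is so configured
                    calls.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10, 64) + " System.gc() calls returned");
    }
}
```

Running it with something like -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent
and GC logging enabled (e.g. -Xlog:gc*) should make System.gc() take the
concurrent path; the exact log tags needed to see the new messages from
try_collect_concurrently are not shown here.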