RFR: 8133051: Concurrent refinement threads may be activated and deactivated at random

Mon Apr 4 18:48:54 UTC 2016

Please review this change to the G1 concurrent refinement thread
controller.  This change addresses unnecessary activation when there
are many threads and few buffers to be processed.  It also addresses
delayed activation due to mis-configuration of the dirty card queue
set's notification mechanism.

This change continues to use (more or less) the existing control
model, only avoiding obviously wasted effort or undesirable delays.
Further enhancements to the control model will be made under
JDK-8137022 or subtasks from that.

- Changed the G1 concurrent refinement thread activation controller to
use a minimum buffer count step between (de)activation values for the
threads.  This is accomplished by having a minimum yellow zone size,
based on the number of refinement threads.  This avoids waking up more
refinement threads than there are buffers available to process.  (It
is, of course, still possible for a refinement thread to wake up and
discover it has no work to do, because of progress by other threads.
But at least we're no longer waking up threads with a near guarantee
they won't find work to do.)

- As part of the above, changed G1ConcRefinementThresholdStep to have
a minimum value of 1, a default value of 2, and to be used to
determine a lower bound on the thread activation step size.  A larger
step size makes it less likely a thread will be woken up and discover
other threads have already completed the work "allocated" to it.  Too
large a minimum may overly restrict the number of refinement threads
being activated, leading to missed pause targets.

- Changed the threshold for activation of the primary concurrent
refinement thread via notification from the dirty card queue set upon
enqueue of a new buffer.  It was previously using a notification
threshold of green_zone * (1 + predictor_sigma), rather than the
"normal" activation threshold calculated using the green_zone value
and threshold steps.  With the default configuration parameters, this
could produce a significantly larger activation threshold,
particularly as the green_zone value grows.  The result could be a much
larger number of pending buffers for pause-time update_rs to process,
leading to missed update_rs time targets and unnecessary back pressure
on the green_zone size.
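
To make the threshold arithmetic in the items above concrete, here is a
rough sketch.  It is not the code in the webrev: the names (Thresholds,
min_yellow_zone_size, calc_thresholds, old_notification_threshold) are
hypothetical, the even spacing of per-thread thresholds above the green
zone is an assumption, and sigma is just a parameter here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not the HotSpot implementation.
struct Thresholds {
  std::size_t activate;    // wake the thread once pending buffers pass this
  std::size_t deactivate;  // let it sleep once pending buffers drop to this
};

// Smallest yellow zone size that leaves every refinement thread at
// least min_step buffers between neighbouring activation values.
std::size_t min_yellow_zone_size(std::size_t n_threads, std::size_t min_step) {
  assert(min_step >= 1);
  return n_threads * min_step;
}

// Assumed model: thresholds are evenly spaced above the green zone,
// after the yellow zone size has been raised to its minimum.
std::vector<Thresholds> calc_thresholds(std::size_t green,
                                        std::size_t yellow_size,
                                        std::size_t n_threads,
                                        std::size_t min_step) {
  std::vector<Thresholds> result;
  if (n_threads == 0) return result;
  std::size_t size =
      std::max(yellow_size, min_yellow_zone_size(n_threads, min_step));
  std::size_t step = size / n_threads;  // >= min_step by construction
  for (std::size_t i = 0; i < n_threads; ++i) {
    Thresholds t = { green + step * (i + 1), green + step * i };
    result.push_back(t);
  }
  return result;
}

// The notification threshold described above for the baseline:
// green_zone * (1 + predictor_sigma).
std::size_t old_notification_threshold(std::size_t green, double sigma) {
  return static_cast<std::size_t>(green * (1.0 + sigma));
}
```

Under these assumptions, with green = 400, 18 threads and a minimum
step of 2, the primary thread would activate at 402 buffers and the
last thread at 436.  With an illustrative sigma of 0.5 (a value not
taken from this change), the old notification threshold would be 600,
which shows how far notification could lag the normal activation
threshold as the green zone grows.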

Comparing runs of specjbb2015 on Linux-x64 with 24 logical processors
(so 18 refinement threads with the default configuration), with these
changes we see a noticeable increase in the steady state green zone
value as compared to the baseline:

         baseline   modified
mean        387        437
median      390        445
stddev       68         67
min         121        167
max         568        575

across ~375 collection pauses for each case.

We're still using the same green zone adjustment (the first 40 or so
pauses show identical green_zone growth in this comparison).  The
difference is in the activation of the primary (zeroth) concurrent
refinement thread by dirty card queue set notification.  After a pause
we'll often see a burst of concurrent refinement thread activity, as
dirty cards scheduled for revisiting are processed.  Once that's done,
the modified version typically activates / runs / deactivates just the
primary thread as mutators enqueue buffers, keeping the number of
buffers close to the green zone target.  The baseline allows the
number of buffers to grow until several threads are activated (4 with
the default configuration used).  Sometimes the baseline starts them
too late (or not at all), allowing the number of buffers to
significantly exceed the green zone target when a pause occurs,
leading to the update_rs phase exceeding its time goal.

As a result of this change, ConcurrentG1Refine construction no longer
needs the predictor argument (though it may return with future
improvements to the control model as part of JDK-8137022).

- Command line -XX:G1ConcRefinementThreads=0 now creates zero
concurrent refinement threads.  Previously the ergonomic default was
used even when zero was explicitly specified.  This will result in
mutator-only concurrent processing of dirty card buffers, which may
result in missed pause targets.  (Mutator-only processing being
insufficient is one of the issues discussed in JDK-8137022.) The use
of a zero value is mostly intended for testing, rather than production
use.

- Command line -XX:G1ConcRefinementRedZone=0 is no longer documented
as disabling concurrent processing.  So far as I can tell, it never
did so.  Rather, it meant that buffers completed by mutator threads
were always processed by them (and that only when
G1UseAdaptiveConcRefinement was off).  Buffers enqueued for other
reasons would still be processed by the concurrent refinement threads.
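
The change in how an explicit G1ConcRefinementThreads=0 is handled can
be sketched as follows.  This is an illustration, not the code in the
webrev: Flag, resolve_refinement_threads and the other names are
hypothetical, and the ergonomic formula is an assumed heuristic, chosen
because it gives 18 threads for 24 logical processors as in the
specjbb2015 runs above.

```cpp
#include <cstddef>

// Hypothetical stand-in for a command line flag: its value, and
// whether the user set it explicitly.
struct Flag {
  bool set_on_command_line;
  std::size_t value;
};

// Assumed ergonomic default; an illustrative heuristic only.
std::size_t ergonomic_refinement_threads(std::size_t logical_cpus) {
  if (logical_cpus <= 8) return logical_cpus;
  return 8 + (logical_cpus - 8) * 5 / 8;
}

// Previous behavior as described: zero falls back to the ergonomic
// default even when it was given explicitly.
std::size_t resolve_refinement_threads_old(Flag flag,
                                           std::size_t logical_cpus) {
  if (flag.set_on_command_line && flag.value != 0) return flag.value;
  return ergonomic_refinement_threads(logical_cpus);
}

// New behavior as described: any explicit value is honored, including
// zero, which leaves dirty card buffer processing to the mutators.
std::size_t resolve_refinement_threads(Flag flag,
                                       std::size_t logical_cpus) {
  if (flag.set_on_command_line) return flag.value;
  return ergonomic_refinement_threads(logical_cpus);
}
```

The point of tracking whether the flag was set, rather than testing its
value against zero, is that zero then becomes a usable setting for
testing instead of a synonym for "use the default".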

CR:
https://bugs.openjdk.java.net/browse/JDK-8133051

Webrev:
http://cr.openjdk.java.net/~kbarrett/8133051/webrev.00/

Testing:
Local specjbb2015 (Linux-x64)
GC nightly with G1
Aurora performance testing - no significant differences.