RFR: 8133051: Concurrent refinement threads may be activated and deactivated at random

Jon Masamitsu jon.masamitsu at oracle.com
Tue Apr 5 22:31:37 UTC 2016


Kim,

I saw that this review is on hold.  I just want to offer a suggestion
(if it isn't already there - I only had a cursory look).  In
run_service(), count the number of buffers processed and report that
number in the deactivation logging.
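
Something along these lines, as a rough sketch (refine_completed_buffer()
is a stand-in for the existing per-buffer processing, and the log message
is just illustrative, not a proposed final form):

    void ConcurrentG1RefineThread::run_service() {
      size_t buffers_processed = 0;
      while (refine_completed_buffer()) {   // stand-in for the real loop body
        buffers_processed++;
      }
      // Report the count as part of the existing deactivation logging.
      log_debug(gc, refine)("Deactivated worker %u, processed " SIZE_FORMAT
                            " buffers", _worker_id, buffers_processed);
    }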

Thanks.

Jon

On 4/4/2016 11:48 AM, Kim Barrett wrote:
> Please review this change to the G1 concurrent refinement thread
> controller.  This change addresses unnecessary activation when there
> are many threads and few buffers to be processed.  It also addresses
> delayed activation due to mis-configuration of the dirty card queue
> set's notification mechanism.
>
> This change continues to use (more or less) the existing control
> model, only avoiding obviously wasted effort or undesirable delays.
> Further enhancements to the control model will be made under
> JDK-8137022 or subtasks from that.
>
> - Changed the G1 concurrent refinement thread activation controller to
> use a minimum buffer count step between (de)activation values for the
> threads.  This is accomplished by having a minimum yellow zone size,
> based on the number of refinement threads.  This avoids waking up more
> refinement threads than there are buffers available to process.  (It
> is, of course, still possible for a refinement thread to wake up and
> discover it has no work to do, because of progress by other threads.
> But at least we're no longer waking up threads with a near guarantee
> they won't find work to do.)
>
> - As part of the above, changed G1ConcRefinementThresholdStep to have
> a minimum value of 1, a default value of 2, and to be used to
> determine a lower bound on the thread activation step size.  A larger
> step size makes it less likely a thread will be woken up and discover
> other threads have already completed the work "allocated" to it.  Too
> large a minimum may overly restrict the number of refinement threads
> being activated, leading to missed pause targets.
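>
> Roughly, the arithmetic behind the last two items looks like this (a
> sketch with illustrative names, not the actual webrev code):
>
>     // Inputs: green_zone and yellow_zone are the current zone sizes,
>     // num_refinement_threads is the thread count, and worker_i is the
>     // thread's index (all illustrative names).
>     size_t step = (yellow_zone - green_zone) / num_refinement_threads;
>     step = MAX2(step, (size_t)G1ConcRefinementThresholdStep);  // minimum step
>     // Successive workers' activation thresholds are at least 'step'
>     // buffers apart, which also implies a minimum yellow zone of about
>     // green_zone + num_refinement_threads * step.
>     size_t activation_threshold = green_zone + (worker_i + 1) * step;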
>
> - Changed the threshold for activation of the primary concurrent
> refinement thread via notification from the dirty card queue set upon
> enqueue of a new buffer.  It was previously using a notification
> threshold of green_zone * (1 + predictor_sigma), rather than the
> "normal" activation threshold calculated using the green_zone value
> and threshold steps.  Using default configuration parameters, this
> could lead to a significantly larger activation threshold,
> particularly as the green_zone value grows, which could lead to a much
> larger number of pending buffers for pause-time update_rs to process,
> leading to missed update_rs time targets and unnecessary back pressure
> on the green_zone size.
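>
> For a purely illustrative comparison (hypothetical numbers, not
> measured values): with green_zone = 400, predictor_sigma = 1.0, and a
> step of 2 buffers, the old notification threshold would be
>
>     green_zone * (1 + predictor_sigma) = 400 * 2.0 = 800 buffers,
>
> while the "normal" activation threshold for the primary thread is about
>
>     green_zone + step = 402 buffers,
>
> so notification could lag the intended activation point by hundreds of
> buffers.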
>
> Comparing runs of specjbb2015 on Linux-x64 with 24 logical processors
> (so 18 refinement threads with the default configuration), with these
> changes we see a noticeable increase in the steady-state green zone
> value as compared to the baseline:
>
> 	baseline	modified
> mean	387		437
> median	390		445
> stddev	 68		 67
> min	121		167
> max	568		575
>
> across ~375 collection pauses for each case.
>
> We're still using the same green zone adjustment (the first 40 or so
> pauses show identical green_zone growth in this comparison).  The
> difference is in the activation of the primary (zeroth) concurrent
> refinement thread by dirty card queue set notification.  After a pause
> we'll often see a burst of concurrent refinement thread activity, as
> dirty cards scheduled for revisiting are processed.  Once that's done,
> the modified version typically activates / runs / deactivates just the
> primary thread as mutators enqueue buffers, keeping the number of
> buffers close to the green zone target.  The baseline allows the
> number of buffers to grow until several threads are activated (4 with
> the default configuration used).  Sometimes the baseline starts them
> too late (or not at all), allowing the number of buffers to
> significantly exceed the green zone target when a pause occurs,
> leading to the update_rs phase exceeding its time goal.
>
> As a result of this change, ConcurrentG1Refine construction no longer
> needs the predictor argument (though it may return with future
> improvements to the control model as part of JDK-8137022).
>
> - Command line -XX:G1ConcRefinementThreads=0 now creates zero
> concurrent refinement threads, rather than using the ergonomic default
> even though zero is explicitly specified.  This will result in
> mutator-only concurrent processing of dirty card buffers, which may
> result in missed pause targets.  (Mutator-only processing being
> insufficient is one of the issues discussed in JDK-8137022.) The use
> of a zero value is mostly intended for testing, rather than production
> use.
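>
> The ergonomics side of this is roughly the following (a sketch; the
> flag type and ergonomic default shown are approximate, not quoted from
> the patch):
>
>     if (FLAG_IS_DEFAULT(G1ConcRefinementThreads)) {
>       FLAG_SET_ERGO(uint, G1ConcRefinementThreads, ParallelGCThreads);
>     }
>     // An explicitly specified -XX:G1ConcRefinementThreads=0 is left
>     // alone, so it really means zero refinement threads.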
>
> - Command line -XX:G1ConcRefinementRedZone=0 is no longer documented
> as disabling concurrent processing.  So far as I can tell, it never
> did so.  Rather, it meant that buffers completed by mutator threads
> were always processed by them (and that only when
> G1UseAdaptiveConcRefinement was off).  Buffers enqueued for other
> reasons would still be processed by the concurrent refinement threads.
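>
> As a sketch of that pre-existing behavior (illustrative only, with
> made-up variable names; not the actual code):
>
>     // With adaptive refinement off, a mutator that fills a buffer
>     // processes it itself once the completed-buffer count exceeds the
>     // red zone; with G1ConcRefinementRedZone=0 that is effectively
>     // always.
>     bool mutator_processes_own_buffer =
>       !G1UseAdaptiveConcRefinement
>       && completed_buffers > (size_t)G1ConcRefinementRedZone;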
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8133051
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8133051/webrev.00/
>
> Testing:
> Local specjbb2015 (Linux-x64)
> GC nightly with G1
> Aurora performance testing - no significant differences.
>



