RFR: 8133051: Concurrent refinement threads may be activated and deactivated at random

Thu Apr 14 21:38:40 UTC 2016

> On Apr 14, 2016, at 3:46 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
> 
> Kim,
> 
> http://cr.openjdk.java.net/~kbarrett/8133051/webrev.01/src/share/vm/gc/g1/concurrentG1Refine.cpp.frames.html
> 
> 296 static size_t calc_new_green_zone(size_t green,
> 297 double update_rs_time,
> 298 size_t update_rs_processed_buffers,
> 299 double goal_ms) {
> 300 // Adjust green zone based on whether we're meeting the time goal.
> 301 // Limit to max_green_zone.
> 302 const double inc_k = 1.1, dec_k = 0.9;
> 303 if (update_rs_time > goal_ms) {
> 304 if (green > 0) {
> 305 green = static_cast<size_t>(green * dec_k);
> 306 }
> 
> 
> If you're not achieving the goal and green is 0, it stays
> stuck at 0.  If update_rs_time is too long, don't we want
> more concurrent refinement?

The green_zone value is the target number of buffers pending at the
start of the GC pause.  The assumption is that the yellow and red
zones are configured so that we're (usually) close to that target.  We
then adjust the green_zone value based on whether update_rs met its
time goal.  There may be additional buffers added between pause start
and update_rs (partial buffers from Java threads, remset entries
enqueued as part of eager reclaim of humongous objects setup, ...).

The current green_zone update calculation doesn't take into account
how successful concurrent refinement was in achieving the green_zone
target, and the impact that has on update_rs time.  It also only uses
the sign bit of the error between actual vs goal update_rs time.  And
it uses the error between the actual and the *next* goal time.  And
the goal time is calculated using the last hot card cache update time
rather than a predicted next hot card cache update time.  And so on.
So yes, there are *many* factors the current green_zone update
calculation isn't taking into account, some of which may be
important.  For this change set, I'm not trying to address this issue.

For the specific question of what happens when the green_zone value
gets driven to (near) zero, yes, it could stay stuck at zero.  If it
does so, the present yellow and red zone calculations (which are based
on the green_zone value) will produce minimum values so that the
refinement thread activity will be maximized, subject to actual buffer
availability.  And if that isn't keeping up then mutator-invoked
refinement will also get involved.  One of the things this change set
does do is reduce wasted refinement thread activation when green_zone
is small, where rounding errors could result in more threads being
activated than there were buffers to process.  That's the purpose of
the minimum threshold step / minimum yellow zone size.

So in answer to your question

  If update_rs_time is too long, don't we want more concurrent
  refinement?

Yes, and the current control laws result in that.

> 101 if (worker_i == 0) {
> 102 // Potentially activate worker 0 more aggressively, to keep
> 103 // available buffers near green_zone value. When yellow_size is
> 104 // large we don't want to allow a full step to accumulate before
> 105 // doing any processing, as that might lead to significantly more
> 106 // than green_zone buffers to be processed by update_rs.
> 107 step = MIN2(step, ParallelGCThreads / 2.0);
> 108 }
> 
> Line 107 says the more GC threads I have, the bigger the step I need to
> start the first one.  Is that right?

One thing to remember is that ParallelGCThreads is the number of
parallel worker threads available to do work during a pause, e.g. the
pause-time update_rs phase has this many threads available to do the
work.  This number is only very weakly connected to the number of
concurrent refinement threads (the present default for the latter
happens to be ParallelGCThreads, but that could change).

The idea behind

107 step = MIN2(step, ParallelGCThreads / 2.0);

is that if step is large, we want to more aggressively activate the
so-called "primary" thread (which is responsible for keeping us close
to the green_zone number of buffers when there aren't so many that
other threads are needed) than we would if we based its activation
only on the normal step value.  That way, once we get close to the
green_zone value, we stay close.

We want some hysteresis in the activation of the primary thread.  It's
more expensive to have it wake up, process one buffer, and resume
waiting N times than to wake up, process N buffers, and resume
waiting, so we want to wait until there are some number N buffers over
the green_zone available for processing before waking up the primary
thread.

But we don't want N to be so large that if we have approached that
overage when a pause occurs, that it will push us significantly over
the update_rs budget.  We base N on ParallelGCThreads because any
overage will be distributed among the parallel worker threads at
update_rs time.

So long as the configuration is not so bad that green_zone is driven
to zero and we still can't achieve the update_rs goal, the feedback
loop for green_zone update effectively takes whatever overage we
permit into account; larger permitted overage applies more back
pressure on the long-term green_zone value.

So a primary thread activation step based on a multiple of
ParallelGCThreads applies a roughly constant (based on that
multiplier) back pressure on the green_zone, while providing some
desired hysteresis in the primary thread's activation.

The specific factor of 1/2 was plucked out of the air as a seemingly
plausible value.