RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal
Stefan Johansson
sjohanss at openjdk.org
Wed Sep 21 10:45:47 UTC 2022
On Wed, 14 Sep 2022 00:36:18 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
> 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal
> 8155996: Improve concurrent refinement green zone control
> 8134303: Introduce -XX:-G1UseConcRefinement
>
> Please review this change to the control of concurrent refinement.
>
> This new controller takes a different approach to the problem, addressing a
> number of issues.
>
> The old controller used a multiple of the target number of cards to determine
> the range over which increasing numbers of refinement threads should be
> activated, and finally activating mutator refinement. This has a variety of
> problems. It doesn't account for the processing rate, the rate of new dirty
> cards, or the time available to perform the processing. This often leads to
> unnecessary spikes in the number of running refinement threads. It also tends
> to drive the pending number to the target quickly and keep it there, removing
> the benefit from having pending dirty cards filter out new cards for nearby
> writes. It can't delay and leave excess cards in the queue because it could
> be a long time before another buffer is enqueued.
>
> The old controller was triggered by mutator threads enqueuing card buffers,
> when the number of cards in the queue exceeded a threshold near the target.
> This required a complex activation protocol between the mutators and the
> refinement threads.
>
> With the new controller there is a primary refinement thread that periodically
> estimates how many refinement threads need to be running to reach the target
> in time for the next GC, along with whether to also activate mutator
> refinement. If the primary thread stops running because it isn't currently
> needed, it sleeps for a period and reevaluates on wakeup. This eliminates any
> involvement of mutator threads in the activation of refinement threads.
>
> The estimate of how many refinement threads are needed uses a prediction of
> time until the next GC, the number of buffered cards, the predicted rate of
> new dirty cards, and the predicted refinement rate. The number of running
> threads is adjusted based on these periodically performed estimates.
>
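(Just to check my own understanding of the description above: the periodic estimate seems to boil down to something like the sketch below. The names and structure are my paraphrase, not the actual code in the patch.)

#include <cmath>    // std::ceil
#include <cstddef>  // size_t

// Rough paraphrase of the described estimate; identifiers are illustrative only.
static unsigned refinement_threads_needed(size_t pending_cards,          // currently buffered dirty cards
                                          size_t pending_cards_target,   // cards allowed to remain at GC start
                                          double time_until_gc_ms,       // predicted time until the next GC
                                          double new_dirty_card_rate,    // predicted new dirty cards per ms
                                          double refine_rate_per_thread) // predicted refined cards per ms per thread
{
  if (time_until_gc_ms <= 0.0) time_until_gc_ms = 1.0; // keep the sketch free of division by zero
  // Cards expected to be pending at the next GC if no refinement runs.
  double expected = pending_cards + new_dirty_card_rate * time_until_gc_ms;
  if (expected <= pending_cards_target) {
    return 0; // Target is met without running any refinement threads.
  }
  // Rate needed to remove the excess before the GC, expressed as a thread count.
  double required_rate = (expected - pending_cards_target) / time_until_gc_ms;
  return (unsigned)std::ceil(required_rate / refine_rate_per_thread);
}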
> This new approach allows more dirty cards to be left in the queue until late
> in the mutator phase, typically reducing the rate of new dirty cards, which
> reduces the amount of concurrent refinement work needed.
>
> It also smooths out the number of running refinement threads, eliminating the
> unnecessarily large spikes that are common with the old method. One benefit
> is that the number of refinement threads (lazily) allocated is often much
> lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem
> described in JDK-8153225.)
>
> This change also provides a new method for calculating the number of dirty
> cards that should be pending at the start of a GC. While this calculation is
> conceptually distinct from the thread control, the two were significantly
> intertwined in the old controller. Changing this calculation separately and
> first would have changed the behavior of the old controller in ways that might
> have introduced regressions. Changing it after the thread control was changed
> would have made it more difficult to test and measure the thread control in a
> desirable configuration.
>
> The old calculation had various problems that are described in JDK-8155996.
> In particular, it can get more or less stuck at low values, and is slow to
> respond to changes.
>
> The old controller provided a number of product options, none of which were
> very useful for real applications, and none of which are very applicable to
> the new controller. All of these are being obsoleted.
>
> -XX:-G1UseAdaptiveConcRefinement
> -XX:G1ConcRefinementGreenZone=<buffer-count>
> -XX:G1ConcRefinementYellowZone=<buffer-count>
> -XX:G1ConcRefinementRedZone=<buffer-count>
> -XX:G1ConcRefinementThresholdStep=<buffer-count>
>
> The new controller *could* use G1ConcRefinementGreenZone to provide a fixed
> value for the target number of cards, though it is poorly named for that.
>
> A configuration that was useful for some kinds of debugging and testing was to
> disable G1UseAdaptiveConcRefinement and set G1ConcRefinementGreenZone to a
> very large value, effectively disabling concurrent refinement. To support
> this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic
> option has been added (see JDK-8134303).
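For reference, I assume the two configurations compare roughly like this (the green zone value is just an arbitrarily large number for illustration, and the new flag is diagnostic, so it needs unlocking):

# Old way to effectively disable concurrent refinement (options being obsoleted):
java -XX:-G1UseAdaptiveConcRefinement -XX:G1ConcRefinementGreenZone=1000000 ...

# New way with this change (diagnostic option):
java -XX:+UnlockDiagnosticVMOptions -XX:-G1UseConcRefinement ...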
>
> The other options are meaningless for the new controller.
>
> Because of these option changes, a CSR and a release note need to accompany
> this change.
>
> Testing:
> mach5 tier1-6
> various performance tests.
> local (linux-x64) tier1 with -XX:-G1UseConcRefinement
>
> Performance testing found no regressions, but also little or no improvement
> with default options, which was expected. With default options most of our
> performance tests do very little concurrent refinement. And even for those
> that do, while the old controller had a number of problems, the impact of
> those problems is small and hard to measure for most applications.
>
> When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare
> better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with
> MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options
> held constant) showed a statistically significant improvement of about 4.5%
> for critical-jOPS. Using the changed controller, the difference between this
> configuration and the default is fairly small, while the baseline shows
> significant degradation with the more restrictive options.
>
> For all tests and configurations the new controller often creates many fewer
> refinement threads.
Took a first pass over the code and I really like the new approach. Feels much simpler to reason about and understand.
I have a few small suggestions/questions.
src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 267:
> 265: // If target is "unbounded" then wait forever (until explicitly
> 266: // activated). This happens during startup, when we don't bother with
> 267: // refinement.
We use `SIZE_MAX` to signal that the target is "unbounded", like the comment says. We do this comparison a few times and I think it would make sense to define a constant with a more descriptive name for this.
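For example, something along these lines (the name is just a suggestion):

// Named constant for the "no target yet, wait until explicitly activated" state,
// instead of comparing against SIZE_MAX (from <cstdint>) directly at each use.
static const size_t UnboundedCardTarget = SIZE_MAX;

// ... later, instead of (target == SIZE_MAX):
if (target == UnboundedCardTarget) {
  // Refinement not currently needed (e.g. during startup); wait until activated.
}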
src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 174:
> 172: if (!try_refinement_step(cr()->pending_cards_target())) {
> 173: // Proceed with fewer threads if target reached.
> 174: cr()->reduce_threads_wanted();
Not sure I see the reason to never try refinement when the thread count was just adjusted. I get that there is a case where we don't need to do refinement, but that seems to be handled by an "early return" in `try_refinement_step(...)`.
If we really want to skip refinement, I would prefer an early return here as well, something like:
if (cr()->adjust_threads_periodically()) {
  // If adjustment done, don't do any refinement this round, since this thread
  // might no longer be wanted active.
  return;
}
...
src/hotspot/share/gc/g1/g1ConcurrentRefineThreadsNeeded.hpp line 40:
> 38: G1Policy* _policy;
> 39: double _update_period_ms;
> 40: double _predicted_time_ms;
I like to keep the names short, but I wonder if it would make sense to include "until gc" here: `_predicted_time_until_gc_ms`.
src/hotspot/share/gc/g1/g1Policy.cpp line 1124:
> 1122: uint remaining = target - MIN2(target, full);
> 1123: size_t bytes = remaining * HeapRegion::GrainBytes;
> 1124: return bytes - MIN2(bytes, allocator->used_in_alloc_regions());
From what I can tell it's only this last line that requires the `Heap_lock` to be held, so we could still do an estimate (slightly worse) even if we fail to take the lock. So my question is: would it make sense to move the `Heap_lock->try_lock()` in here and just skip including `allocator->used_in_alloc_regions()` in the calculation when we don't get the lock?
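Roughly what I have in mind, as a sketch on top of the lines quoted above (I haven't checked whether the surrounding locking rules allow it):

// Sketch only: keep the cheap estimate when the lock isn't immediately available.
uint remaining = target - MIN2(target, full);
size_t bytes = remaining * HeapRegion::GrainBytes;
if (Heap_lock->try_lock()) {
  // With the lock held we can also subtract what is already used in the
  // current allocation regions.
  bytes -= MIN2(bytes, allocator->used_in_alloc_regions());
  Heap_lock->unlock();
}
// Without the lock this is just a slightly larger (more conservative) estimate.
return bytes;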
-------------
Changes requested by sjohanss (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10256