RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v3]
Thomas Schatzl
tschatzl at openjdk.org
Tue Sep 27 12:48:24 UTC 2022
On Fri, 23 Sep 2022 06:39:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
>> 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal
>> 8155996: Improve concurrent refinement green zone control
>> 8134303: Introduce -XX:-G1UseConcRefinement
>>
>> Please review this change to the control of concurrent refinement.
>>
>> This new controller takes a different approach to the problem, addressing a
>> number of issues.
>>
>> The old controller used a multiple of the target number of cards to determine
>> the range over which increasing numbers of refinement threads should be
>> activated, and finally activating mutator refinement. This has a variety of
>> problems. It doesn't account for the processing rate, the rate of new dirty
>> cards, or the time available to perform the processing. This often leads to
>> unnecessary spikes in the number of running refinement threads. It also tends
>> to drive the pending number to the target quickly and keep it there, removing
>> the benefit from having pending dirty cards filter out new cards for nearby
>> writes. It can't delay and leave excess cards in the queue because it could
>> be a long time before another buffer is enqueued.
>>
>> The old controller was triggered by mutator threads enqueing card buffers,
>> when the number of cards in the queue exceeded a threshold near the target.
>> This required a complex activation protocol between the mutators and the
>> refinement threads.
>>
>> With the new controller there is a primary refinement thread that periodically
>> estimates how many refinement threads need to be running to reach the target
>> in time for the next GC, along with whether to also activate mutator
>> refinement. If the primary thread stops running because it isn't currently
>> needed, it sleeps for a period and reevaluates on wakeup. This eliminates any
>> involvement in the activation of refinement threads by mutator threads.
>>
>> The estimate of how many refinement threads are needed uses a prediction of
>> time until the next GC, the number of buffered cards, the predicted rate of
>> new dirty cards, and the predicted refinement rate. The number of running
>> threads is adjusted based on these periodically performed estimates.
>>
>> This new approach allows more dirty cards to be left in the queue until late
>> in the mutator phase, typically reducing the rate of new dirty cards, which
>> reduces the amount of concurrent refinement work needed.
>>
>> It also smooths out the number of running refinement threads, eliminating the
>> unnecessarily large spikes that are common with the old method. One benefit
>> is that the number of refinement threads (lazily) allocated is often much
>> lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem
>> described in JDK-8153225.)
>>
>> This change also provides a new method for calculating for the number of dirty
>> cards that should be pending at the start of a GC. While this calculation is
>> conceptually distinct from the thread control, the two were significanly
>> intertwined in the old controller. Changing this calculation separately and
>> first would have changed the behavior of the old controller in ways that might
>> have introduced regressions. Changing it after the thread control was changed
>> would have made it more difficult to test and measure the thread control in a
>> desirable configuration.
>>
>> The old calculation had various problems that are described in JDK-8155996.
>> In particular, it can get more or less stuck at low values, and is slow to
>> respond to changes.
>>
>> The old controller provided a number of product options, none of which were
>> very useful for real applications, and none of which are very applicable to
>> the new controller. All of these are being obsoleted.
>>
>> -XX:-G1UseAdaptiveConcRefinement
>> -XX:G1ConcRefinementGreenZone=<buffer-count>
>> -XX:G1ConcRefinementYellowZone=<buffer-count>
>> -XX:G1ConcRefinementRedZone=<buffer-count>
>> -XX:G1ConcRefinementThresholdStep=<buffer-count>
>>
>> The new controller *could* use G1ConcRefinementGreenZone to provide a fixed
>> value for the target number of cards, though it is poorly named for that.
>>
>> A configuration that was useful for some kinds of debugging and testing was to
>> disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a
>> very large value, effectively disabling concurrent refinement. To support
>> this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic
>> option has been added (see JDK-8155996).
>>
>> The other options are meaningless for the new controller.
>>
>> Because of these option changes, a CSR and a release note need to accompany
>> this change.
>>
>> Testing:
>> mach5 tier1-6
>> various performance tests.
>> local (linux-x64) tier1 with -XX:-G1UseConcRefinement
>>
>> Performance testing found no regressions, but also little or no improvement
>> with default options, which was expected. With default options most of our
>> performance tests do very little concurrent refinement. And even for those
>> that do, while the old controller had a number of problems, the impact of
>> those problems is small and hard to measure for most applications.
>>
>> When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare
>> better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with
>> MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options
>> held constant) showed a statistically significant improvement of about 4.5%
>> for critical-jOPS. Using the changed controller, the difference between this
>> configuration and the default is fairly small, while the baseline shows
>> significant degradation with the more restrictive options.
>>
>> For all tests and configurations the new controller often creates many fewer
>> refinement threads.
>
> Kim Barrett has updated the pull request incrementally with three additional commits since the last revision:
>
> - wanted vs needed nomenclature
> - remove several spurious "scan"
> - delay => wait_time_ms
Some typos. Going to do some testing.
src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 77:
> 75: } else {
> 76: delete t;
> 77: }
Not sure this early-bail out is necessary; C++ seems to define `nullptr` values passed to `delete` well enough (https://en.cppreference.com/w/cpp/language/delete) to skip this complication.
Feel free to ignore.
src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 219:
> 217: // Deduct predicted cards in thread buffers to get target.
> 218: size_t new_target = budget - MIN2(budget, predicted_thread_buffer_cards);
> 219: // Add some hysterisis with previous values.
Suggestion:
// Add some hysteresis with previous values.
src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 233:
> 231: size_t predicted_thread_buffer_cards,
> 232: double goal_ms) {
> 233: if (!G1UseConcRefinement) return;
I would prefer either braces or an additional newline after this statement. When initially reading this I thought there were some indentation error.
src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 198:
> 196: void reduce_threads_wanted();
> 197:
> 198: // Test whethre the thread designated by worker_id should be active.
s/whethre/whether
src/hotspot/share/gc/g1/g1ConcurrentRefineThreadsNeeded.cpp line 44:
> 42: // Estimate how many concurrent refinement threads we need to run to achieve
> 43: // the target number of card by the time the next GC happens. There are
> 44: // several additional desirements we'd like to achieve while meeting that
According to https://en.wiktionary.org/wiki/desirement this is defined as
Something that is [desired](https://en.wiktionary.org/wiki/desired), but not absolutely required.
(This word was new to me).
I think that "desirements" already includes the phrase "we'd like to achieve". I.e. something that is desired, but is not absolutely required is already something "we'd like to achieve".
So I would like to suggest to either remove the phrase "we'd like to achieve" or reformulate the sentence as "... several additional/secondary goals we would like to achieve while meeting that (main) goal".
src/hotspot/share/gc/g1/g1ConcurrentRefineThreadsNeeded.cpp line 79:
> 77: // Estimate number of cards that need to be processed before next GC. There
> 78: // are no incoming cards when time is short, because the controller activates
> 79: // refinement by mutator threads when there to a GC, to stay on target even
"refinement by mutator threads when there to a GC" I do not understand this part of the sentence, what does "there to a gc" mean?
Maybe this should means:
"..., because in this case the controller activates refinement by mutator threads to stay on target even..."
src/hotspot/share/gc/g1/g1ConcurrentRefineThreadsNeeded.cpp line 116:
> 114: // excess cards to process. Just one thread might not be sufficient, but
> 115: // we don't have any idea how many we actually need. Eventually the
> 116: // prediction machinary will warm up and we'll be able to get estimates.
s/machinary/machinery
-------------
Changes requested by tschatzl (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10256
More information about the hotspot-dev
mailing list