RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v3]
Kim Barrett
kbarrett at openjdk.org
Tue Sep 27 22:51:23 UTC 2022
On Mon, 26 Sep 2022 20:54:37 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:
>> The number of dirty cards in thread buffers at the start of GC pause is
>> exactly what this is supposed to capture. We discount the number of cards that
>> can be processed in the budgeted time by this prediction to get the target
>> number of cards in the queue. It's not a very accurate prediction, but it's
>> still worth doing. For some applications and configurations I've tested (with
>> low G1RSetUpdatingPauseTimePercent) it might be 5-10% of the target. A model
>> based on the number of threads tends to do very poorly for some applications.
>>
>> This is entirely different from predict_dirtied_cards_rate_ms, which is a
>> different value and has different uses.
>
> My reasoning is that #cards in the bounded thread-buffers doesn't necessarily follow a normal distribution, so one can't predict future values using avg + stddev. Taking an extreme example of a single thread-buffer, if the population avg is ~buffer_capacity, #cards in the thread-buffer can exhibit large jumps between 0 and ~buffer_capacity due to the implicit modulo operation.
>
>> It's not a very accurate prediction, but it's
> still worth doing.
>
> Which benchmark shows its effect? I hard-coded `size_t predicted_thread_buffer_cards = 0;` in `G1Policy::record_young_collection_end` but can't see much difference. Normally, #cards in global dirty queue should be >> #cards in thread-local buffers.
Here's an example log line from my development machine:
[241.020s][debug][gc,ergo,refine ] GC(27) GC refinement: goal: 86449 + 8201 / 2.00ms, actual: 100607 / 2.32ms, HCC: 1024 / 0.00ms (exceeded goal)
Note the cards-in-thread-buffers prediction (8201) is approaching 10% of the goal.
This is from specjbb2015 with
`-Xmx40g -XX:MaxGCPauseMillis=100 -XX:G1RSetUpdatingPauseTimePercent=2`
on a machine with 32 cores.
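To spell out the arithmetic behind "approaching 10%" (all numbers taken from
the log line above):

  total goal = 86449 (queue target) + 8201 (predicted thread-buffer cards) = 94650
  8201 / 94650 ≈ 8.7%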
specjbb2015 with default pause time and refinement budget probably won't see
much impact from the cards still in buffers because the goal will be so much
larger. OTOH, such a configuration also probably does very little concurrent
refinement.
Lest one think that configuration is unreasonable or unlikely, part of the
point of this change is to improve the behavior with a smaller percentage of a
pause budgeted for refinement. That allows more time in a pause for other
things, like evacuation. (Even with that more restrictive configuration
specjbb2015 still doesn't do much concurrent refinement. For example, during
the mutator phase before that GC there was never more than one refinement
thread running, and it was only running for about the last 5% of the phase.)
I'm using the prediction infrastructure to get a moving average over several
recent samples, to get a number that has some basis. The stddev implicit in
that infrastructure makes the result a bit higher than the average. I think it
probably doesn't matter much, as none of the inputs nor the calculations that
use them are very precise. But the behavior does seem to be worse (in the
sense of more frequently blowing the associated budget and by larger amounts)
if this isn't accounted for to some extent.
But maybe your point is more about the stddev, and that should not be
included. I can see that, and could just use the moving average.
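For anyone who wants to see the shape of what I mean, here's a minimal sketch
of that kind of predictor: an exponentially decaying average plus a confidence
multiple of the decaying stddev, loosely in the style of HotSpot's
TruncatedSeq and G1Predictions. The class, constants, and sample values below
are illustrative only, not the actual G1 code:

```c++
#include <cmath>
#include <cstdio>

// Illustrative decaying-average predictor, loosely modeled on
// HotSpot's TruncatedSeq/G1Predictions.  Not the actual G1 code.
class DecayingSeq {
  double _avg;     // exponentially decaying average
  double _var;     // exponentially decaying variance
  double _alpha;   // weight given to history (e.g. 0.7)
  bool   _has_data;
public:
  explicit DecayingSeq(double alpha = 0.7)
    : _avg(0.0), _var(0.0), _alpha(alpha), _has_data(false) {}

  void add(double sample) {
    if (!_has_data) {
      _avg = sample;
      _var = 0.0;
      _has_data = true;
    } else {
      double diff = sample - _avg;
      _avg = _alpha * _avg + (1.0 - _alpha) * sample;
      _var = _alpha * _var + (1.0 - _alpha) * diff * diff;
    }
  }

  double davg() const { return _avg; }
  double dsd()  const { return std::sqrt(_var); }

  // Prediction = decaying average plus a confidence multiple of the
  // decaying stddev, so the result sits a bit above the average.
  // Dropping the dsd() term gives the plain moving-average variant
  // discussed above.
  double predict(double sigma = 0.5) const {
    return _avg + sigma * dsd();
  }
};

int main() {
  DecayingSeq cards_in_buffers;
  // Hypothetical per-GC samples of dirty cards found in thread buffers.
  double samples[] = {7800, 8400, 7900, 8600, 8200};
  for (double s : samples) cards_in_buffers.add(s);
  std::printf("avg: %.0f, stddev: %.0f, prediction: %.0f\n",
              cards_in_buffers.davg(), cards_in_buffers.dsd(),
              cards_in_buffers.predict());
  return 0;
}
```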
-------------
PR: https://git.openjdk.org/jdk/pull/10256