RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v3]

Albert Mingkun Yang ayang at openjdk.org
Mon Sep 26 20:58:17 UTC 2022


On Mon, 26 Sep 2022 13:21:14 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> src/hotspot/share/gc/g1/g1Analytics.cpp line 251:
>> 
>>> 249: double G1Analytics::predict_dirtied_cards_in_thread_buffers() const {
>>> 250:   return predict_zero_bounded(_dirtied_cards_in_thread_buffers_seq);
>>> 251: }
>> 
>> I believe this sequence captures #dirty cards in thread buffers at GC pause start. However, I don't think it follows a normal distribution because of the buffer-size clamping. In comparison, the dirty-cards-generation-rate (`predict_dirtied_cards_rate_ms`) more likely follows a normal distribution.
>
> The number of dirty cards in thread buffers at the start of GC pause is
> exactly what this is supposed to capture. We discount the number of cards that
> can be processed in the budgeted time by this prediction to get the target
> number of cards in the queue. It's not a very accurate prediction, but it's
> still worth doing. For some applications and configurations I've tested (with
> low G1RSetUpdatingPauseTimePercent) it might be 5-10% of the target. A model
> based on the number of threads tends to do very poorly for some applications.
> 
> This is entirely different from predict_dirtied_cards_rate_ms, which is a
> different value and has different uses.

My reasoning is that the number of cards in the bounded thread buffers doesn't necessarily follow a normal distribution, so one can't predict future values using avg + std. Taking the extreme example of a single thread buffer: if the population average is close to buffer_capacity, the number of cards in the buffer can exhibit large jumps between 0 and ~buffer_capacity due to the implicit modulo operation.

> It's not a very accurate prediction, but it's
> still worth doing.

Which benchmark shows its effect? I hard-coded `size_t predicted_thread_buffer_cards = 0;` in `G1Policy::record_young_collection_end` but can't see much difference. Normally, #cards in the global dirty queue should be >> #cards in thread-local buffers.

> A zero value for the prediction indicates that we don't have a valid
> prediction

Why not? It's still possible for the alloc rate to be zero well after start-up; the alloc rate depends entirely on the application.

On a related note, later on there's special treatment for the case where the upcoming GC pause is too close: `if (_predicted_time_until_next_gc_ms > _update_period_ms) {`. Shouldn't there be something similar for a too-far upcoming GC pause? In other words, `incoming_rate * _predicted_time_until_next_gc_ms;` would be unreliable for farther predictions, right?

-------------

PR: https://git.openjdk.org/jdk/pull/10256


More information about the hotspot-dev mailing list