RFR: 8317755: G1: Periodic GC interval should test for the last whole heap GC [v2]
Thomas Schatzl
tschatzl at openjdk.org
Wed Nov 8 13:51:03 UTC 2023
On Wed, 18 Oct 2023 14:27:26 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> See the description in the bug. Fortunately, we already track the last whole-heap GC. The new regression test verifies the behavior.
>>
>> Additional testing:
>> - [x] Linux x86_64 fastdebug `tier1 tier2 tier3`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8317755-g1-periodic-whole
> - Optional flag
> - Merge branch 'master' into JDK-8317755-g1-periodic-whole
> - Keep the cast
> - Fix
Sorry for the late reply, I have been busy getting JEP 423 out the door :)
> _Mailing list message from [Kirk Pepperdine](mailto:kirk at kodewerk.com) on [hotspot-gc-dev](mailto:hotspot-gc-dev at mail.openjdk.org):_
>
> Don't we already have a GCInterval with a default value of Long.MAX_VALUE?
There is no such flag.
> When I hear the word idle I immediately start thinking CPU idle. In this case however, I quickly shifted to memory idle, which I think translates nicely into how idle the allocators are. Thus basing heap sizing ergonomics on allocation rates seems like a reasonable metric until you consider the edge cases. The most significant "edge case" IME is when GC overheads start exceeding 20%. In those cases GC will throttle allocations, and that in turn causes ergonomics to reduce sizes instead of increasing them (to reduce GC overhead). My conclusion from this is that ergonomics should consider both allocation rates and GC overhead when deciding how to resize the heap at the end of a collection.
Allocation rate directly impacts GC overhead by causing more GCs.
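A rough back-of-the-envelope, with made-up numbers and assuming a fixed eden size and pause time:

```
alloc rate  1 GB/s, eden 2 GB  ->  one young GC every ~2 s
at ~20 ms per pause            ->  ~1% GC overhead
alloc rate 10 GB/s, same eden  ->  one young GC every ~0.2 s  ->  ~10% overhead
```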
> Fortunately there is a steady stream of GC events that create convenient points in time to make an ergonomic adjustment. Not having allocations, and as a result not having the collector run, implies one has to manufacture a convenient point in time to make an ergonomic sizing decision.
See [JDK-8238687](https://bugs.openjdk.org/browse/JDK-8238687).
That CR and this proposed feature complement each other: GC should do something when there is no steady stream of GC events.
> [...]
> As for returning memory, two issues: first, there appears to be no definition of "unused memory". Secondly, what I can say after looking at 1000s of GC logs is that the amount of floating garbage that G1 leaves behind even after several concurrent cycles is not insignificant.
G1 currently respects `Min/MaxHeapFreeRatio`, which is tunable. The defaults may be fairly lenient.
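For example, tightening them from the defaults of 40/70 makes G1 return memory more eagerly (the flags are real; the values are purely illustrative):

```
# Expand if free heap drops below 10% of total,
# shrink (uncommit) if free heap exceeds 30%.
java -XX:+UseG1GC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 MyApp
```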
> I also wrote a G1 heap fragmentation viewer, and what it revealed is that the heap remains highly fragmented and scattered after each GC cycle. All this suggests that the heap will need to be compacted with a full collection in order to return a significantly large enough block of memory to make the entire effort worthwhile.
There is no harm in fragmentation at the region level; G1 will give back memory at the region level anyway if able to do so (see the `Min/MaxHeapFreeRatio` default values). There is now a humongous-object compacting full GC stage if necessary, so keeping region-level fragmentation in check is less of an issue now.
> Again, if the application is idle, then no harm no foul. However, for those applications that are memory-idle but not CPU-idle, this might not be a great course of action.
Hence the flag described in the JEP. The reason why the defaults disable the CPU-idle check is that there is no good default.
>
> In my mind, any trigger for a speculative collection would need to take into consideration allocation rates, GC overhead, and mutator busyness (for cases when GC and allocator activity is low to zero).
>
It does (if told to do so appropriately).
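For reference, a minimal sketch wiring the existing JEP 346 knobs together (the flag names are the current ones; the values are illustrative, not recommendations):

```
# Trigger a periodic (concurrent) GC if no GC happened for 5 minutes
# and the one-minute system load average is below 0.3; the default
# threshold of 0 disables the CPU-idle check entirely.
java -XX:+UseG1GC \
     -XX:G1PeriodicGCInterval=300000 \
     -XX:G1PeriodicGCSystemLoadThreshold=0.3 \
     -XX:+G1PeriodicGCInvokesConcurrent \
     MyApp
```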
> From @shipilev:
>> This is corroborated by my experience working with end users actually doing that (i.e., forcing a whole-heap analysis externally).
>
> Yes, and that is the beauty of periodic GCs that are driven by the GC itself: it knows it should not start a periodic GC if there was recent activity. That part is captured in the JEP, I think. What this PR does is extend the usefulness of periodic GCs to the cases where you need a whole-heap analysis to make the best call about what to do next. In G1's case, that may mean discovering that we need to start running mixed collections to unclutter the heap.
I think there is still a misconception about the current "periodic GCs" here: they are first and foremost meant to catch the idle case, with all its options (also considering CPU idle, doing a full or a concurrent GC).
This is the reason for the suggestion to phase out the current option `G1PeriodicGCInterval` in favor of a different name, and to add this functionality (regular whole-heap analysis to clean out cruft) as a new option (something like `WholeHeapAnalysisInterval`, or something more aligned with other collectors).
I could easily imagine a use case where one would want to act fairly quickly on the detected idle case (allowing a disruptive full GC "shortly" after detecting idle), while being more relaxed about general cleanup (a long period, least disruptive, using a concurrent mark); see the sketch below.
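A purely hypothetical sketch of that split, assuming the idle option keeps its current name and semantics, and where `WholeHeapAnalysisInterval` is only the suggested name from above, not an existing flag:

```
# Hypothetical: react to idle quickly and disruptively (full GC 30 s
# after idle is detected), but run the least-disruptive whole-heap
# cleanup (concurrent mark) only once per hour.
java -XX:+UseG1GC \
     -XX:G1PeriodicGCInterval=30000 \
     -XX:-G1PeriodicGCInvokesConcurrent \
     -XX:WholeHeapAnalysisInterval=3600000 \
     MyApp
```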
Thomas
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16107#issuecomment-1801926929