RFR: 8317755: G1: Periodic GC interval should test for the last whole heap GC

Thu Oct 19 17:43:42 UTC 2023

On Wed, 18 Oct 2023 14:24:44 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> See the description in the bug. Fortunately, we already track the last whole-heap GC. The new regression test verifies the behavior.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug `tier1 tier2 tier3`
>
> Thanks for looking at it!
> 
> Re-reading JEP 346: “Promptly Return Unused Committed Memory from G1”...
> 
> In the scenario we are seeing, we do have lots of unused committed memory that would not be reclaimed promptly until a concurrent cycle executes. In the worst case, the need for that cleanup is driven by heap connectivity changes that are not readily reflected in other observable heap metrics. A poster-child example would be a cache sitting perfectly in "oldgen" that eventually drops its entries, producing a huge patch of garbage. We would only discover this after whole-heap marking. In some cases, tuning based on occupancy (e.g. soft max heap size, heap reserve, etc.) would help if we promote enough objects to actually trigger the concurrent cycle. But if we keep churning through very efficient young collections, we would get there very slowly.
> 
> Therefore, I’d argue the current behavior is against _the spirit_ of JEP 346, even though _the letter_ says that we track “any” GC for periodic GC. There is no explicit rationale for why young GCs should count as recent GCs, even though, with the benefit of hindsight, counting them throws away the promptness guarantees. Aside: Shenandoah periodic GC makes no claim that it runs only in idle phases, although the story is simpler without young GCs. But Generational Shenandoah would follow the same route: the periodic whole-heap GC would start periodically regardless of whether young collections run in between.
> 
> The current use of “idle” is also awkward in JEP 346. If a user enables `G1PeriodicGCInterval` without doing anything else, they would get a GC even when the application is churning at 100% CPU, as long as there is no recent GC in sight. I guess we can think about “idle” as “GC is idle”, but then arguably not figuring out the whole-heap situation _for hours_ can also be described as “GC is idle”. I think the much larger point of the JEP is to reclaim memory promptly, which in turn requires a whole-heap GC. It looks like the JEP somewhat painted itself into a corner by considering all GCs, including young ones.
> 
> I doubt that users would mind if we change the behavior of `G1PeriodicGCInterval` like this: the option is explicitly opt-in, the configurations I see in prod run with huge intervals, etc. So we are asking for a relatively rare concurrent GC even when the application is doing young GCs. But I agree that departing from the current behavior might still have undesired consequences, for which we need to plan an escape route. There is also a need to ...

> @shipilev : I have not made up my mind about the other parts of your proposal, but:
> 
> > The current use of “idle” is also awkward in JEP 346. If user enables G1PeriodicGCInterval without doing anything else, they would get a GC even when application is churning at 100% CPU, but without any recent GC in sight.
> 
> This is the reason for the `G1PeriodicGCSystemLoadThreshold` option and is handled by the feature/JEP.

Yes, that is why I said "without doing anything else". With that example, I wanted to point out that the definition of "idle" is already quite murky even with the current JEP, which has an additional option that decides whether "idle" accounts for the actual system load. In this view, another option that decides whether "idle" accounts for young GCs fits well, I think.
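The two notions of "idle" can be made concrete with a small model of the periodic-GC trigger decision. This is a hypothetical sketch, not HotSpot code (G1's actual check is C++ inside the VM). The names `PeriodicGCInputs`, `Basis`, and `should_start_periodic_gc` are invented for illustration. The `interval_ms` and `load_threshold` fields only model `G1PeriodicGCInterval` and `G1PeriodicGCSystemLoadThreshold`, and the load comparison is an assumed simplification. The `Basis` switch models the extra option floated above; no such option exists today.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    """Which GC resets the periodic-GC clock (hypothetical switch)."""
    ANY_GC = "any"           # current behavior: young GCs reset the clock too
    WHOLE_HEAP_GC = "whole"  # proposed: only whole-heap GCs reset it


@dataclass(frozen=True)
class PeriodicGCInputs:
    now_ms: int
    last_any_gc_ms: int         # end of the most recent GC of any kind
    last_whole_heap_gc_ms: int  # end of the most recent whole-heap GC
    interval_ms: int            # models G1PeriodicGCInterval; 0 disables
    system_load: float          # models a recent system load average
    load_threshold: float       # models G1PeriodicGCSystemLoadThreshold; 0 disables


def should_start_periodic_gc(inp: PeriodicGCInputs, basis: Basis) -> bool:
    if inp.interval_ms == 0:
        return False  # periodic GC not enabled
    last = inp.last_any_gc_ms if basis is Basis.ANY_GC else inp.last_whole_heap_gc_ms
    elapsed = max(0, inp.now_ms - last)
    if elapsed < inp.interval_ms:
        return False  # a qualifying GC happened recently enough
    if inp.load_threshold > 0 and inp.system_load > inp.load_threshold:
        return False  # system is busy, so not "idle" under the load rule
    return True


HOUR_MS = 3_600_000
# Young GCs every few seconds, last whole-heap GC 3 hours ago, hourly interval.
churning = PeriodicGCInputs(
    now_ms=10 * HOUR_MS,
    last_any_gc_ms=10 * HOUR_MS - 5_000,
    last_whole_heap_gc_ms=7 * HOUR_MS,
    interval_ms=HOUR_MS,
    system_load=0.5,
    load_threshold=0.0,
)
print(should_start_periodic_gc(churning, Basis.ANY_GC))         # False
print(should_start_periodic_gc(churning, Basis.WHOLE_HEAP_GC))  # True
```

With frequent young GCs, the `ANY_GC` basis never fires. This is the lost-promptness scenario described in the quoted argument. The `WHOLE_HEAP_GC` basis fires once the last whole-heap GC is older than the interval. The load check stays an independent veto in both modes.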

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16107#issuecomment-1771405208