RFR: 8317755: G1: Periodic GC interval should test for the last whole heap GC

Wed Oct 18 14:27:28 UTC 2023

On Mon, 9 Oct 2023 20:46:44 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See the description in the bug. Fortunately, we already track the last whole-heap GC. The new regression test verifies the behavior.
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1 tier2 tier3`

Thanks for looking at it!

Re-reading JEP 346: “Promptly Return Unused Committed Memory from G1”...

In the scenario we are seeing, we do have lots of unused committed memory that would not be reclaimed promptly until a concurrent cycle executes. The need for that cleanup is, in the worst case, driven by heap connectivity changes that are not readily reflected in other observable heap metrics. A poster example would be a cache sitting perfectly in "oldgen", eventually dropping the entries, producing a huge garbage patch. We would only discover this after whole-heap marking. In some cases, the tuning based on occupancy (e.g. soft max heap size, heap reserve, etc.) would help if we promote enough stuff to actually trigger the concurrent cycle. But if we keep churning through very efficient young collections, we would get there very slowly.

Therefore, I’d argue the current behavior is against _the spirit_ of JEP 346, even though _the letter_ says that we track “any” GC for periodic GC. There is no explicit mention of why young GCs should actually be treated as recent GCs, even though, with the benefit of hindsight, they throw away promptness guarantees. Aside: Shenandoah periodic GC does not make any claims that it would run only in idle phases, although the story there is simpler without young GCs. But Generational Shenandoah would follow the same route: the periodic whole-heap GC would start periodically regardless of whether young collections ran in between.

The current use of “idle” is also awkward in JEP 346. If a user enables `G1PeriodicGCInterval` without doing anything else, they would get a GC even when the application is churning at 100% CPU, as long as there is no recent GC in sight. I guess we can think about “idle” as “GC is idle”, but then arguably not figuring out the whole-heap situation _for hours_ can also be described as “GC is idle”. I think the much larger point of the JEP is to reclaim memory promptly, which in turn requires a whole-heap GC. It looks like the JEP somewhat painted itself into a corner by considering all GCs, including young ones.

I doubt that users would mind if we change the behavior of `G1PeriodicGCInterval` like this: the option is explicitly opt-in, the configurations I see in prod are running with huge intervals, etc. So we are asking for a relatively rare concurrent GC even when the application is doing young GCs. But I agree that departing from the current behavior might still have undesired consequences, for which we need to plan an escape route. There is also a need to have this in JDK 17u and 21u.

So, as a compromise, I am leaning towards introducing a flag that defines whether periodic GC checks for the last whole-heap collection or not. The default can still be the current behavior, but the behavior we want here is also easily achievable. This is implemented as a new commit. (I realize this still requires a CSR, being a product flag and all.)
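
To make the difference between the two modes concrete, here is a rough model of the decision, written in Python for brevity rather than as HotSpot code. The names `GcTimestamps`, `should_start_periodic_gc`, and `check_whole_heap` are made up for illustration; `check_whole_heap` merely stands in for the proposed flag, whose real name is not given here. The model also ignores any other conditions G1 may apply before starting a periodic GC.

```python
from dataclasses import dataclass


@dataclass
class GcTimestamps:
    """Times (in ms) at which the most recent collections finished."""
    last_any_gc_ms: int         # any GC, including young collections
    last_whole_heap_gc_ms: int  # last GC that examined the whole heap


def should_start_periodic_gc(now_ms, stamps, interval_ms, check_whole_heap):
    """Return True if a periodic GC should be requested.

    check_whole_heap is a hypothetical stand-in for the proposed flag:
    False models the current behavior (any GC resets the timer),
    True measures the interval from the last whole-heap GC instead.
    """
    if interval_ms == 0:
        return False  # an interval of 0 means periodic GC is disabled
    if check_whole_heap:
        last_ms = stamps.last_whole_heap_gc_ms
    else:
        last_ms = stamps.last_any_gc_ms
    return now_ms - last_ms >= interval_ms


# A young GC ran 100 ms ago, but the last whole-heap GC was 9 s ago.
stamps = GcTimestamps(last_any_gc_ms=9_900, last_whole_heap_gc_ms=1_000)
current_mode = should_start_periodic_gc(10_000, stamps, 5_000, check_whole_heap=False)
proposed_mode = should_start_periodic_gc(10_000, stamps, 5_000, check_whole_heap=True)
```

With a 5 s interval, `current_mode` is `False` because the recent young GC keeps resetting the timer, while `proposed_mode` is `True` because no whole-heap GC has run within the interval. This is the "efficient young collections" case described above.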

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16107#issuecomment-1768580421