RFR: 8357445: G1: Time-Based Heap Uncommit During Idle Periods [v4]

Thu Jul 31 21:15:59 UTC 2025

On Thu, 17 Jul 2025 07:32:10 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Release-note howto: https://openjdk.org/guide/index.html#release-notes. For now I tagged the issue with `release-note=yes`. Note that it can be written later.
>
> One reason why I do not see the need for these flags to be manageable is that this heap shrinking is a supportive function if the application is "idle". Main heap sizing should be done by the existing heap sizing/AHS at GC events.
> 
> From the CR text:
>>  Doing heap sizing based on garbage collections has a big disadvantage: if there are no garbage collections (due to no application activity) there is a risk that a large amount of heap is kept committed unnecessarily *for a long time*. 
> (emphasis mine)
> 
> If we are talking about "a long time", there does not seem to be a need for changing it during runtime (or changing it at all). It should not matter if that "long time" is "long time +- small epsilon", and so, allowing dynamic change of "a long time" to another "long time" seems unnecessary without a very good use case.
> 
> Please consider not necessarily the current situation, but with "full" AHS.
> 
> Another question is whether you had thoughts about the interaction with [JDK-8213198](https://bugs.openjdk.org/browse/JDK-8213198), as this change seems to be a subset of the other. (That's just curiosity from me, I think this feature is useful as is, and if the other ever materializes, we can always reconsider).
> 
> Otoh decreasing the heap by this mechanism will eventually trigger a marking.

@tschatzl Thanks for the detailed feedback!

## Flag Changes (per your feedback)

- G1UseTimeBasedHeapSizing is now diagnostic and enabled by default (was experimental/disabled).
- G1MinRegionsToUncommit is now diagnostic (was experimental).
- Timing flags (G1UncommitDelayMillis, G1TimeBasedEvaluationIntervalMillis) remain manageable to support operational use cases.

## Manageable Flag Use Cases
I see three scenarios where runtime adjustment is valuable:

**High-Availability Services**
- 24/7 operations cannot restart for tuning adjustments  
- Memory pressure events require immediate response
- Cost optimization demands dynamic resource adaptation

**Cloud & Container Platforms**  
- Resource limits change dynamically (auto-scaling)
- Multi-tenancy requires per-workload optimization
- Cost efficiency drives aggressive memory reclaim

**DevOps & SRE Teams**
- Incident response needs immediate memory reclaim
- Performance testing requires runtime comparison of settings  
- Capacity planning benefits from live tuning experiments
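
For illustration, here is a minimal sketch of how an operator-side agent could adjust one of the timing flags in a running JVM, using the standard `HotSpotDiagnosticMXBean`. The class and method names (`ManageableFlagTuner`, `trySet`) and the value `30000` are hypothetical; the sketch assumes `G1UncommitDelayMillis` exists and is manageable, which is only true on a build that includes this PR. On any other JVM the call simply reports failure.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the PR: changes a manageable VM flag at runtime.
final class ManageableFlagTuner {

    /**
     * Attempts to set a VM option in the running JVM.
     * Returns true if the JVM accepted the new value, false if the option
     * does not exist, is not writeable (not manageable), or the value is rejected.
     */
    static boolean trySet(String name, String value) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return false; // management interface not available in this JVM
        }
        try {
            VMOption option = bean.getVMOption(name); // throws if the flag is unknown
            if (!option.isWriteable()) {
                return false; // flag exists but is not manageable
            }
            bean.setVMOption(name, value);
            return true;
        } catch (IllegalArgumentException | UnsupportedOperationException e) {
            return false; // unknown flag, invalid value, or unsupported operation
        }
    }

    public static void main(String[] args) {
        // Assumption: flag name taken from this PR; 30000 ms is an arbitrary example value.
        boolean updated = trySet("G1UncommitDelayMillis", "30000");
        System.out.println("G1UncommitDelayMillis updated: " + updated);
    }
}
```

The same adjustment should also be possible from outside the process with `jcmd <pid> VM.set_flag <name> <value>`, again assuming the flag stays manageable.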

## "Long Time" Consideration
While the feature targets long idle periods, production workloads vary, and the difference between a 5-minute and a 30-second delay can be critical, especially in container environments where exceeding a memory limit means process termination, not just performance degradation.

## JDK-8213198 Interaction  
After reviewing that issue, I see they address **orthogonal problems**:

- **JDK-8213198**: Active application, young GCs happening, needs mixed GCs for string table cleanup
- **JDK-8357445**: Idle application, no GCs happening, needs memory uncommit

**Operational States:**
- String table issue: Active allocation + insufficient mixed GCs  
- Time-based uncommit: Complete inactivity + no allocations

**Complementary Solutions:**
- JDK-8213198: Triggers concurrent cycles when string table grows
- JDK-8357445: Uncommits memory during idle periods
- Future full AHS: Would orchestrate both mechanisms

The manageable flags become critical in the idle scenario where container memory limits create immediate pressure, unlike the string table scenario where growth can be tolerated for longer periods.

Would this clarification help with the flag classification decision?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2246389289