RFR: 8357445: G1: Time-Based Heap Uncommit During Idle Periods [v7]
Thomas Schatzl
tschatzl at openjdk.org
Tue Sep 2 10:09:58 UTC 2025
On Fri, 1 Aug 2025 05:53:48 GMT, Monica Beckwith <mbeckwit at openjdk.org> wrote:
>> **Implements:** https://bugs.openjdk.org/browse/JDK-8357445
>>
>> Implement time-based heap uncommit for G1 during idle periods.
>>
>> Key changes:
>> - Added G1HeapEvaluationTask for periodic heap evaluation
>> - Switch from G1ServiceTask to PeriodicTask for improved scheduling
>> - Implemented time-based heap sizing policy with configurable uncommit delay
>> - Added region activity tracking with last access timestamps
>> - Integrated VM_G1ShrinkHeap operation for safe heap shrinking
>> - Added new G1 flags: G1UseTimeBasedHeapSizing, G1TimeBasedEvaluationIntervalMillis, G1UncommitDelayMillis, G1MinRegionsToUncommit
>> - Added 'sizing' log tag for heap sizing operations
>>
>> Comprehensive Test Coverage:
>> - Enhanced TestG1RegionUncommit: minimum heap boundaries, concurrent allocation/uncommit scenarios
>> - Enhanced TestTimeBasedHeapSizing: humongous object handling, rapid allocation cycles, edge cases
>> - Enhanced TestTimeBasedRegionTracking: concurrent region access, lifecycle transition validation
>> - Enhanced TestTimeBasedHeapConfig: parameter boundary values, small heap configurations
>>
>> This ensures time-based heap uncommit works correctly while maintaining all safety guarantees and test expectations.
>
> Monica Beckwith has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove unused static _uncommit_delay member and accessor
I did not look through the tests yet, there are quite a few areas that need fixes first.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1075:
> 1073:
> 1074: log_debug(gc, ergo, heap)("Heap resize. Requested shrinking amount: %zuB actual shrinking amount: %zuB (%u regions)",
> 1075: shrink_bytes, shrunk_bytes, num_regions_removed);
Suggestion:
shrink_bytes, shrunk_bytes, num_regions_removed);
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1138:
> 1136: return true; // We did something.
> 1137: }
> 1138:
Where is that path actually taken? This seems very dangerous because the code apparently does not synchronize with GC operations, and this may be true if this thread is somehow activated during a GC in progress.
A VM GC operation might start just before above check.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1510:
> 1508: } else {
> 1509: assert(_heap_evaluation_task == nullptr, "pre-condition");
> 1510: }
If the flag is manageable, it changes, so we can't not initialize the task at startup.
src/hotspot/share/gc/g1/g1HeapEvaluationTask.cpp line 37:
> 35:
> 36: G1HeapEvaluationTask::G1HeapEvaluationTask(G1CollectedHeap* g1h, G1HeapSizingPolicy* heap_sizing_policy) :
> 37: PeriodicTask(G1TimeBasedEvaluationIntervalMillis), // Use PeriodicTask with interval
Periodic tasks do not have a flexible interval. So making `G1TimeBasedEvaluationIntervalMillis` manageable and changing it will have no effect (and `PeriodicTask` does not support changing intervals).
src/hotspot/share/gc/g1/g1HeapEvaluationTask.cpp line 48:
> 46: if (!G1UseTimeBasedHeapSizing) {
> 47: return;
> 48: }
The task should not start if disabled. (Assuming not manageable)
src/hotspot/share/gc/g1/g1HeapEvaluationTask.cpp line 54:
> 52: log_trace(gc, sizing)("GC active, skipping uncommit evaluation");
> 53: return;
> 54: }
This is sketchy. Unless you manually sync, there is no guarantee this is valid. Also since the code is running in a separate thread that might not be stopped during GC, the GC VM operation might just start after this check.
src/hotspot/share/gc/g1/g1HeapEvaluationTask.cpp line 56:
> 54: }
> 55:
> 56: ResourceMark rm; // Ensure temporary resources are released
`evaluate_heap_resize()` already does that internally. Is this one necessary?
src/hotspot/share/gc/g1/g1HeapEvaluationTask.cpp line 66:
> 64: log_warning(gc, sizing)("Uncommit evaluation: unexpected expansion request ignored (resize_amount=%zuB)", resize_amount);
> 65: // This should not happen since uncommit-based policy only handles shrinking
> 66: assert(false, "Uncommit-based heap sizing should never request expansion");
Just fail? Not sure if `evaluate_heap_resize` should have that parameter at all.
src/hotspot/share/gc/g1/g1HeapEvaluationTask.cpp line 68:
> 66: assert(false, "Uncommit-based heap sizing should never request expansion");
> 67: } else {
> 68: log_info(gc, sizing)("Uncommit evaluation: shrinking heap by %zuMB", resize_amount / M);
I am not convinced about all these `log_info` level messages. With `-Xlog:gc*` which is kind of one would use by default, one may get an awful lot of these unnecessarily (below limit to do that only every 10th attempt indicates that to me at least).
src/hotspot/share/gc/g1/g1HeapRegion.cpp line 253:
> 251: _age_index(G1SurvRateGroup::InvalidAgeIndex),
> 252: _node_index(G1NUMA::UnknownNodeIndex),
> 253: _last_access_timestamp(Ticks::now()), // Initialize timestamp with current time
Since `G1HeapRegion::initialize()` is called on this region just a bit later, I recommend not wasting the time to call `Ticks::now()` for no gain. Just initialize to a high value(?) and things should work out I think if the code ever comes across such a badly initialized method.
src/hotspot/share/gc/g1/g1HeapRegion.hpp line 76:
> 74: class G1HeapRegion : public CHeapObj<mtGC> {
> 75: friend class VMStructs;
> 76: friend class G1Allocator; // For access to record_activity()
`G1Allocator` does not access `record_activity()` at all?
src/hotspot/share/gc/g1/g1HeapRegion.hpp line 257:
> 255: uint _node_index;
> 256:
> 257: // Time-based heap sizing: tracks last allocation/access time
Suggestion:
// Time-based heap sizing: tracks last allocation/access time.
src/hotspot/share/gc/g1/g1HeapRegion.hpp line 561:
> 559: void set_node_index(uint node_index) { _node_index = node_index; }
> 560:
> 561: // Time-based heap sizing methods
Suggestion:
// Time-based heap sizing methods.
src/hotspot/share/gc/g1/g1HeapRegion.hpp line 564:
> 562: void record_activity() {
> 563: _last_access_timestamp = Ticks::now();
> 564: }
Does not seem to be used outside `G1HeapRegion`. Remove, or better just inline since it only seems to be used once. Or put the implementation into the cpp file (and make `private`).
src/hotspot/share/gc/g1/g1HeapRegion.hpp line 578:
> 576: Tickspan elapsed = current_time - _last_access_timestamp;
> 577: return elapsed > delay;
> 578: }
Does not seem to be used? In general, I would tend to avoid putting policy stuff into `G1HeapRegion` which should just be a container for region state members. Also, this method is too complex imho to put into the `.hpp` file directly (should be in `.inline.hpp`), but since this method should be removed anyway....
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 454:
> 452: class UncommitCandidatesClosure : public G1HeapRegionClosure {
> 453: GrowableArray<G1HeapRegion*>* _candidates;
> 454: uint* _inactive_regions;
Taking a reference here (`uint& _inactive_regions`) might avoid the manual dereferences. But this value seems to be the same as the current number of elements in the `_candidates` array? Seems superfluous.
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 456:
> 454: uint* _inactive_regions;
> 455: const G1HeapSizingPolicy* _policy;
> 456: public:
Suggestion:
public:
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 465:
> 463:
> 464: virtual bool do_heap_region(G1HeapRegion* r) {
> 465: if (r->is_empty() && _policy->should_uncommit_region(r)) {
This should be `r->is_free()`. Any region may be empty (like a pinned old region that only has references from native code), but not `Free`.
I think all uses of `is_empty()` are wrong in this change.
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 483:
> 481: log_debug(gc, sizing)("Full region scan: found %u inactive regions out of %u total regions",
> 482: inactive_regions,
> 483: _g1h->max_num_regions());
Note that "inactive" is already used in region management - regions that G1 only queued for actual uncommit. I do not think it is necessary to use this adjective at all in the messages, so no need to think of something else here.
Maybe something like "idle-free"/"idle" or so. Should also integrate with the `G1HeapRegionPrinter` (otoh since we are not guaranteed to remove exactly these regions, probably it's better to not do that).
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 488:
> 486: bool G1HeapSizingPolicy::should_uncommit_region(G1HeapRegion* hr) const {
> 487: // Note: Caller already guarantees hr->is_empty() is true
> 488: // Empty regions should always be free and not in collection set in normal operation
Sentences should end with a full stop in comments (I am stopping commenting on this here, but there are quite a few of those).
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 507:
> 505: }
> 506:
> 507: size_t G1HeapSizingPolicy::evaluate_heap_resize(bool& expand) {
Should be renamed to better indicate that this is about getting idle free regions.
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 517:
> 515: if (_g1h->is_stw_gc_active()) {
> 516: return 0;
> 517: }
This check seems superfluous. The code takes the heap lock anyway which the GC also needs to take. The only caller also already does that. And as mentioned there, this check is not sufficient to guarantee what you want.
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 521:
> 519: // Must hold Heap_lock during heap resizing
> 520: MutexLocker ml(Heap_lock);
> 521:
Since this can be entered right after a garbage collection, maybe the evaluation should be immediately exited in that case? Garbage collection already found the "right" heap size, no need for idle calculation.
Otoh, `get_uncommit_candidates()` should find nothing in this case.
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 526:
> 524: // Find regions eligible for uncommit
> 525: GrowableArray<G1HeapRegion*> candidates;
> 526: get_uncommit_candidates(&candidates);
There does not seem to be a point in collecting the actual regions in that array at all as far as I can tell. It's not used further down the code.
If we do not do that, it should be removed.
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 531:
> 529: uint total_regions = _g1h->max_num_regions();
> 530:
> 531: // Need minimum number of inactive regions to proceed
Please call them "idle-free"/"idle" or something else due to the concerns mentioned above. This needs to be changed throughout.
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 535:
> 533: size_t region_size = G1HeapRegion::GrainBytes;
> 534: size_t current_heap = _g1h->capacity();
> 535: size_t min_heap = MAX2((size_t)InitialHeapSize, MinHeapSize); // Never go below initial size
`InitialHeapSize` is already a `size_t`.
src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 550:
> 548: // 1. No more than 25% of inactive regions
> 549: // 2. No more than 10% of total committed regions
> 550: // 3. No more than max_shrink_bytes worth of regions
This policy sounds like `Min/MaxHeapFreeRatio` that are known to be not useful. Do not have a better idea right now either. Maybe look at what ZGC did at some point.
src/hotspot/share/gc/g1/g1VMOperations.cpp line 175:
> 173:
> 174: void VM_G1ShrinkHeap::doit() {
> 175: _g1h->shrink(_bytes);
I strongly believe that the shrink amount needs to be re-evaluated in the shrink pause again (ideally it would not need a VM operation, but I think this is not necessary for now). Between posting the VM operation and actual execution there can be any number of more (GC) VM operations that can change the shape of the heap, so the amount of necessary shrinking can change substantially.
Also, there needs to be some synchronization between the selection of regions made during determining that we should do this and the actual shrinking.
There is the risk that the region selection bases its decision based on e.g. region A, B, C being too old, and then uncommits regions D, E, and F here. Next time the idleness will be evaluated, regions A, B, and C will be found again as too old, so keeping on shrinking until eventually A, B, and C are uncommitted.
However the original goal of the method would have been to _only_ uncommit A, B and C. Which means this mechanism uncommitted way too many regions.
-------------
Changes requested by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26240#pullrequestreview-3159175723
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303486703
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303843005
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303644365
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2315588705
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303631395
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303635137
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303737631
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303633302
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303689194
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303677389
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303712006
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303678087
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303678493
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303711237
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303705218
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303742062
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303742530
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303747516
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303755891
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303724705
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303781225
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303736334
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303789544
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303775808
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303790596
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2303798443
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2315512135
PR Review Comment: https://git.openjdk.org/jdk/pull/26240#discussion_r2315522764
More information about the hotspot-gc-dev
mailing list