RFR: 8236926: Concurrently uncommit memory in G1 [v7]
Thomas Schatzl
tschatzl at openjdk.java.net
Wed Nov 18 10:46:12 UTC 2020
On Mon, 16 Nov 2020 19:39:13 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:
>> Please review this change that implements concurrent uncommit for G1.
>>
>> **Summary**
>> G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks.
>>
>> The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS).
>>
>> Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added.
>>
>> One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization.
>>
>> **Logging**
>> To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`:
>> [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873
>> ...
>> [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms
>> [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms
>> ...
>> [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms
>>
>> The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated.
>>
>> On `gc+heap+region` there are new logs to see how ranges of regions transition between different states:
>> [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024)
>> [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280)
>> [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536)
>> [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792)
>> [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651)
>> [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96)
>> [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099)
>>
>> **Testing**
>> Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot.
>>
>> I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes.
>
> Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:
>
> - Merge branch 'master' into 8236926-ccu
> - Zoom feedback
> - Albert review 2
> - Albert review
> - Merge branch 'master' into 8236926-ccu
> - Lock for small mapper and use BitMap parallel operations.
> - Self review
> - Simplified task
> - Improved logging
> - Test improvement
> - ... and 5 more: https://git.openjdk.java.net/jdk/compare/3675653c...c354b1d8
Changes requested by tschatzl (Reviewer).
src/hotspot/share/gc/g1/g1CommittedRegionMap.hpp line 98:
> 96: // Finds the next range of committable regions starting at offset.
> 97: // This function must only be called when no inactive regions are
> 98: // present and can be used to active more regions.
s/active/activate
src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 567:
> 565:
> 566: // Check if there is memory to uncommit and if so schedule a task to do it.
> 567: void uncommit_heap_if_necessary();
I would prefer if the method were called `uncommit_regions_if_necessary()` as this method does not uncommit the *heap* but just uncomittable regions.
src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 568:
> 566: // Check if there is memory to uncommit and if so schedule a task to do it.
> 567: void uncommit_heap_if_necessary();
> 568: uint uncommit_regions(uint region_limit);
Please add a comment like "// Immediately uncommits uncommittable regions.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 806:
> 804: HeapRegion::GrainWords * HeapWordSize * shrink_count);
> 805: // Explicit uncommit.
> 806: _hrm.uncommit_inactive_regions((uint) shrink_count);
Please let the code `G1CollectedHeap::uncommit_regions()` helper here to limit the references to `_hrm`.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 747:
> 745: HeapRegion* prev_last_region = NULL;
> 746: size_t size_used = 0;
> 747: size_t shrink_count = 0;
The code may as well define `shrink_count` as `uint` as the only use seems to cast it to `uint` anyway.
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1312:
> 1310: shrink_bytes, aligned_shrink_bytes, shrunk_bytes);
> 1311: if (num_regions_removed > 0) {
> 1312: log_debug(gc, heap)("Regions ready for uncommit: %u", num_regions_removed);
Maybe keep with existing terminology here i.e. `Uncommittable regions after shrink: %u`
src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp line 58:
> 56:
> 57: public:
> 58: static void run();
I'd prefer if the method were named `schedule` (or something like `schedule_for_later` or maybe `enqueue` to not clash with the protected `schedule(jlong)` method) instead of `run` since imho `run` implies that it is actually executed.
src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 128:
> 126: // No delay, reason to reschedule rather then to loop is to allow
> 127: // other tasks to run without waiting for a full uncommit cycle.
> 128: schedule(0);
Why not notify like in `run()`? Maybe refactor the code a bit to allow calling the same method to schedule the task and notify the service thread for immediate processing here.
src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp line 42:
> 40: // while running on the service thread joined with the suspendible
> 41: // thread set.
> 42: bool _active;
The information in the comment is missing context on how scheduling and suspension works and how the `_active` flag interacts with `run()` requests. The MT safety seems an implementation detail. I.e. it would be nice to know that:
* this task does its work in chunks, i.e. does not uncommit everything at once to avoid long stalls or issues with GCs interrupting it
* this flag prevents rescheduling the task when it has not uncommitted everything yet but another request (`run` call) comes in.
src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 83:
> 81: _summary_duration += time;
> 82:
> 83: log_trace(gc, heap)("Concurrent Uncommit: " SIZE_FORMAT "%s, %u regions, %1.3fms",
Maybe use gc+heap+region here (and below) like other code; imho after all this task is kind of an extension of the `HeapRegionManager`. I did not look at the output though.
src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 1172:
> 1170: reset_at_marking_complete();
> 1171:
> 1172: _g1h->uncommit_heap_if_necessary();
I'd prefer it this call were placed below the `resize_heap_if_necessary` call unless there is a reason not to.
src/hotspot/share/gc/g1/g1FullCollector.cpp line 218:
> 216: _heap->print_heap_after_full_collection(scope()->heap_transition());
> 217:
> 218: _heap->uncommit_heap_if_necessary();
Maybe move this to the `gc_epilogue` or actually into `prepare_heap_for_mutators` where the shrinking occurs.
src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp line 81:
> 79: }
> 80:
> 81: bool committed_range(uint start_idx, size_t num_regions) {
I would prefer if these two methods were named more like a condition, i.e. `is_range_committed` and `is_range_uncommitted` (optionally exchanging `range` and `[un]committed`) as this makes the verification they are used in read better.
src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp line 92:
> 90:
> 91: virtual void commit_regions(uint start_idx, size_t num_regions, WorkGang* pretouch_gang) {
> 92: guarantee(uncommitted_range(start_idx, num_regions),
Not sure this (and the one in `uncommit_regions`) should be guarantees.
src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp line 138:
> 136: // - G1RegionToSpaceMapper::_region_commit_map;
> 137: // - G1PageBasedVirtualSpace::_committed (_storage.commit())
> 138: Mutex _lock;
Good comment! :)
src/hotspot/share/gc/g1/g1ServiceThread.hpp line 138:
> 136: // Notify a change to the service thread. Used to stop either
> 137: // stop the service or to force check for new tasks.
> 138: void notify();
The first "stop" is too much
src/hotspot/share/gc/g1/g1CommittedRegionMap.hpp line 46:
> 44: };
> 45:
> 46: class G1CommittedRegionMap : public CHeapObj<mtGC> {
It would be nice to have a diagram of the region states and their transitions here.
src/hotspot/share/gc/g1/heapRegionManager.cpp line 181:
> 179: G1CollectedHeap::heap()->hr_printer()->commit(hr);
> 180: }
> 181: activate_regions(start, num_regions);
In this place there will be two messages from the HRPrinter for every region:
1) a COMMIT message and
2) an ACTIVATE message
This is a bit confusing as in my understanding (that's why I asked for a region state diagram in the `G1CommittedRegionMap` which may as well be put in `HeapRegionManager`) the (typical) flow of states are Uncommitted->Committed/Active->Committed/Inactive->Uncommitted.
As mentioned, I'm not sure if it is a good idea to send two separate messages here; better rename the "Active" message to "Commit-Active" (and "Inactive" to "Commit-Inactive") instead imho, even if it's quite long (and drop `HRPrinter::commit()` completely)
-------------
PR: https://git.openjdk.java.net/jdk/pull/1141
More information about the hotspot-dev
mailing list