RFR: 8236926: Concurrently uncommit memory in G1

Albert Mingkun Yang ayang at openjdk.java.net
Thu Nov 12 22:04:02 UTC 2020


On Tue, 10 Nov 2020 10:43:20 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

> Please review this change that implements concurrent uncommit for G1.
> 
> **Summary**
> G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks.
> 
> The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). 
> 
> Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added.
> 
> One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. 
> 
> **Logging**
> To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`:
> [7,468s][debug][gc,heap        ] GC(32) Regions ready for uncommit: 1873
> ...
> [7,509s][trace][gc,heap        ] Concurrent Uncommit: 256M, 32 regions, 11,173ms
> [7,522s][trace][gc,heap        ] Concurrent Uncommit: 256M, 32 regions, 12,599ms
> ...
> [9,691s][debug][gc,heap        ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms
> 
> The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated.
> 
> On `gc+heap+region` there are new logs to see how ranges of regions transition between different states:
> [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024)
> [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280)
> [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536)
> [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792)
> [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651)
> [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96)
> [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099)
> 
> **Testing**
> Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot.
> 
> I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes.

The PR description provides a very good summary. Furthermore, it would be nice to have an overview of the algorithm: how the additional concurrency managed, what resources are protected by the lock, etc. Ideally, this should be in the comments, making references to corresponding classes/variables.

> I've also done a performance run and as expected there are not significant changes.

How about pauses? One of the motivations of this PR is to remove pauses introduced by synchronous uncommit, right?

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2668:

> 2666: void G1CollectedHeap::uncommit_heap_if_necessary() {
> 2667:   if (hrm()->has_inactive_regions()) {
> 2668:     G1UncommitRegionTask::activate();

Since `activate` is already used for regions, I would suggest another word; simple `run` is enough. Additionally, I don't think using an enum is necessary, a plain `bool is_running` should do.

enum class TaskState { active, inactive };
TaskState _state;

src/hotspot/share/gc/g1/heapRegionManager.cpp line 178:

> 176: 
> 177: void HeapRegionManager::expand(uint start, uint num_regions, WorkGang* pretouch_gang) {
> 178:   guarantee(num_regions > 0, "No point in calling this for zero regions");

The same `guarantee` is already in `commit_regions`; I think an assert is fine.

src/hotspot/share/gc/g1/heapRegionManager.cpp line 189:

> 187:     }
> 188: 
> 189:     if (G1CollectedHeap::heap()->hr_printer()->is_active()) {

This `if` is not needed, right? `commit(hr)` does such check already. There are a few other cases of the same kind.

src/hotspot/share/gc/g1/heapRegionManager.cpp line 272:

> 270: void HeapRegionManager::deactivate_regions(uint start, size_t num_regions) {
> 271:   guarantee(num_regions >= 1, "Need to specify at least one region to uncommit, tried to uncommit zero regions at %u", start);
> 272:   guarantee(length() >= num_regions, "pre-condition");

I am not really sure why `guarantee` here but `assert` in `reactivate_regions`. I don't see a strong reason to use `guarantee` here.

src/hotspot/share/gc/g1/heapRegionManager.cpp line 264:

> 262:   assert(num_regions > 0, "No point in calling this for zero regions");
> 263: 
> 264:   clear_auxiliary_data_structures(start, num_regions);

IMO, this name is too general. Even after reading its body and the associated comments, I don't get what "data structures" are cleared.

src/hotspot/share/gc/g1/heapRegionManager.cpp line 330:

> 328:     // No more regions available for uncommit
> 329:     if (range.length() == 0) {
> 330:       return uncommitted;

Is `assert(uncommitted > 0)` true here? Why or why not? Please add some comments; I think it's helpful for the readers.

src/hotspot/share/gc/g1/heapRegionManager.hpp line 97:

> 95: 
> 96:   // Notify other data structures about change in the heap layout.
> 97:   void update_committed_space(HeapWord* old_end, HeapWord* new_end);

`update_committed_space` is not implemented, right?

src/hotspot/share/gc/g1/heapRegionManager.hpp line 81:

> 79:   G1RegionToSpaceMapper* _card_counts_mapper;
> 80: 
> 81:   // Map to keep track of which regions are in use.

The name of the variable suggests it's tracking what regions are committed and that's all. After reading `G1CommittedRegionMap`, I believe the class name and the var name are quite misleading; it tracks the state of mem backing up all regions, `committed+mapped`, `committed`, `to_be_uncommitted`, `committed`. I don't have any good alternatives, but the comments could surely be expanded.

src/hotspot/share/gc/g1/heapRegionManager.hpp line 137:

> 135:   void deactivate_regions(uint start, size_t num_regions);
> 136:   void reactivate_regions(uint start, uint num_regions);
> 137:   void uncommit_regions(uint start, uint num_regions);

Why `uint` vs `size_t`? Why `= 1` for some but not others? If such inconsistency is intentional, it should be documented.

src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 107:

> 105: 
> 106:   // Each execution is limited to uncommit at most 256M worth of regions.
> 107:   static const uint region_limit = (uint) (256 * M / G1HeapRegionSize);

Why 256M? Better include some motivation in the comments.

src/hotspot/share/gc/g1/g1CommittedRegionMap.hpp line 67:

> 65:   void active_clear_range(uint start, uint end);
> 66:   void inactive_set_range(uint start, uint end);
> 67:   void inactive_clear_range(uint start, uint end);

After reading their implementation, I see that `Uncommit_lock` must be held on call them. I think it's best to mention this precondition in the comments, and explain why this lock is needed.

-------------

Changes requested by ayang (Author).

PR: https://git.openjdk.java.net/jdk/pull/1141


More information about the hotspot-dev mailing list