RFR: JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio [v8]

Thomas Schatzl tschatzl at openjdk.java.net
Thu Mar 11 15:40:14 UTC 2021


On Wed, 10 Mar 2021 11:53:28 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Summary
>> -----------
>> 
>> Improve G1 Full GC by skip compaction for regions with high survival ratio.
>> 
>> Backgroud
>> -----------
>> 
>> There are 4 steps in full gc of G1 GC.
>> - mark live objects
>> - prepare forwardee
>> - adjust pointers
>> - compact
>> 
>> When full gc occurs, there may be very high percentage of live bytes in some regions. For these regions, it's not efficient to compact them and better to skip them, as there are little space to save but many objects to copy.
>> 
>> Description
>> -----------
>> 
>> We enhance the full gc implementation for the above situation through following steps:
>> - accumulate live bytes of every hr in mark phase;
>> - add hr's with high percentage of live bytes into a "no moving" set rather the normal compaction set in prepare phase, and fill dummy objects in the places of dead objects in these hr's;
>> - nothing special is done in adjust phase;
>> - just compact the regions in compaction set;
>> 
>> VM options added
>> -----------
>> 
>> - G1SkipCompactionLiveBytesLowerThreshold: The lower threshold of heap region live bytes percent in G1 full GC
>> 
>> Test
>> -----------
>> 
>> - specjbb2015: no regression
>> - dacapo: 3%-11% improvement of full gc pause. Attachment is the dacapo h2 full gc pause.
>> 
>> $ java -Xmx1g -Xms1g -XX:G1SkipCompactionLiveBytesLowerThreshold=98 -XX:ParallelGCThreads=4 -Xlog:gc*=info:file=gc.log -jar dacapo-9.12-bach.jar --iterations 5 --size huge --no-pre-iteration-gc h2
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix bot crash.

The change should support `MarkSweepAlwaysCompactCount` (just adapt the threshold).

The "last ditch" collection should also fully compact, as well as probably a System.gc() call. I do not know the exact rules for the other collectors right now.

Please update the summary with these (and the earlier mentioned) requirements and approach.

src/hotspot/share/gc/g1/g1FullGCMarker.cpp line 50:

> 48:     // cache size is big enough that not increase pause during marking
> 49:     // by avoiding hit misses as so as possible
> 50:     _mark_region_cache(mark_stats, round_up_power_of_2(G1CollectedHeap::heap()->max_regions())) {

The suggested cache size is way too huge for large heaps (10k's of regions) for no gain. Unless you have good numbers, please use the same size as for the concurrent mark stats cache (fixed 1024).
Feel free to move `G1CMTask::RegionMarkStatsCacheSize` to `G1RegionMarkStatsCache` and make it a public static constant there.

src/hotspot/share/gc/g1/g1FullGCMarker.hpp line 66:

> 64:   CLDToOopClosure      _cld_closure;
> 65: 
> 66:   G1RegionMarkStatsCache _mark_region_cache;

Could you call this instance `_mark_stats_cache` too like in the other use, and adapt the various method names? `_mark_region_cache` does not indicate that this has something to do with (liveness) "statistics" cache. I do not know what a "region cache" is.

src/hotspot/share/gc/g1/g1FullCollector.hpp line 72:

> 70: 
> 71:   GrowableArray<HeapRegion*>**   _skipping_compaction_sets;
> 72:   G1RegionMarkStats*             _live_stats;

I think `_live_words` like the other instance is a better name.

src/hotspot/share/gc/g1/g1FullCollector.hpp line 95:

> 93:   GrowableArray<HeapRegion*>* skipping_compaction_set(uint id) { return _skipping_compaction_sets[id]; }
> 94:   size_t live_bytes_after_full_gc_mark(uint region_idx) {
> 95:     return MarkSweepDeadRatio > 0 ? _live_stats[region_idx]._live_words * HeapWordSize : 0;

Not sure if the additional `MarkSweepDeadRatio > 0` check is necessary or useful at all. In case `MarkSweepDeadRatio` is zero, `_hr_live_bytes_threshold` is larger than whatever is stored in here anyway.

Also, please drop the `* HeapWordSize` and compare to a `_hr_live_word_threshold`. This multiplication is completely unnecessary from what I can tell. (It does not really hurt either, but why would I want to multiply both sides of your comparison by the same value before the comparison?)

Also just `liveness()` (or name this and the equivalent in `G1ConcurrentMark` something like `live_words`) is sufficient.

src/hotspot/share/gc/g1/g1FullGCCompactTask.cpp line 102:

> 100:   }
> 101: 
> 102:   if (MarkSweepDeadRatio > 0) {

This check is useless. If `MarkSweepDeadRatio == 0`, the `skipping_compaction_queue` is empty anyway. You could certainly assert that if `MarkSweepDeadRatio == 0`, then `skipping_compaction_queue` must be empty too.

src/hotspot/share/gc/g1/g1FullGCMarker.cpp line 51:

> 49:     // by avoiding hit misses as so as possible
> 50:     _mark_region_cache(mark_stats, round_up_power_of_2(G1CollectedHeap::heap()->max_regions())) {
> 51:   if (MarkSweepDeadRatio > 0) {

Please drop this check and just unconditionally initialize the data. If the patch didn't make it too huge (like suggested), this is a drop in the bucket.

src/hotspot/share/gc/g1/g1FullGCMarker.hpp line 104:

> 102: 
> 103:   void flush_mark_region_cache() {
> 104:     if (MarkSweepDeadRatio > 0) {

Drop this check to make the code more straightforward. Other code like in PR #2579 might find this information useful too. Maybe this could even be factored out in a separate CR.

src/hotspot/share/gc/g1/g1FullGCMarker.inline.hpp line 70:

> 68:   // Collect live bytes, which is used to tell
> 69:   // whether to skip high live bytes heap regions.
> 70:   if (MarkSweepDeadRatio > 0) {

Same here.

src/hotspot/share/gc/g1/g1FullGCPrepareTask.cpp line 66:

> 64:     assert(!hr->is_humongous(), "humongous objects not supported.");
> 65:     size_t live_bytes = _collector->live_bytes_after_full_gc_mark(hr->hrm_index());
> 66:     if(live_bytes <= _hr_live_bytes_threshold) {

Missing space after the `if`.

src/hotspot/share/gc/g1/g1FullGCPrepareTask.cpp line 251:

> 249:   assert(next_addr == limit, "Should stop the scan at the limit.");
> 250: }
> 251: 

I think all this code is not required/duplicate of the existing code that handles pinned regions (added in https://bugs.openjdk.java.net/browse/JDK-8253600). Please have a look and see if that code can be reused/repurposed.

I mean, these "skipped regions" should be equivalent to "pinned" regions wrt to required object handling. I admit I haven't actually looked how to do this, but this seems awfully similar with the only difference that skipped regions are determined only after marking and pinned regions sometimes before (full) gc.

The other handling should be the same. So *at worst* temporarily marking these regions as "pinned" should be sufficient for this functionality to be able to reuse the mechanism.

-------------

Changes requested by tschatzl (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2760



More information about the hotspot-gc-dev mailing list