RFR: JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio

Fri Mar 5 02:59:38 UTC 2021

On Wed, 3 Mar 2021 08:06:37 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi Hamlin,
>> 
>> First of all, thanks for contributing. 
>> 
>> Before doing a more in depth review of this change I have a few questions/suggestions:
>> 1. This feature if quite similar to the "dead wood" feature already present for Serial/Parallel, and I agree with what's written in the JBS issue about trying to reuse the already present options. This way the feature would be enabled by default as well (since `MarkSweepDeadRatio` has a default value of 5) and I think that makes sense to make sure it gets proper testing. We can of course do something similar to what Parallel does, and tweak the value for G1 but I think it should be on by default. 
>> 2. G1 does liveness accounting during concurrent mark as well and there is already code doing more or less the same thing as the new `G1FullGCMarkRegionCache` class. Have you looked at `G1RegionMarkStatsCache` and is there any reason we can't reuse this?
>> 3. Benchmarking the performance of the Full GC is often a bit problematic, because G1 actively tries to avoid Full GCs. When I implemented the parallel FullGC I wrote some small benchmarks. Simple uses-cases that I wanted to make sure the FullGC handled well. I dug those up and took them for a spin with your changes. The results are more or less as expected. For the use-cases with "full" regions we see a clear improvement and for almost all the other use-cases the results are in line with the baseline. But this one test showed a quite significant regression in the marking times. The test looks like this: 
>> public class SystemGCLargeArray {
>>   public static Object[] holder;
>>   public static void main(String args[]) {
>>     holder = new Object[128 * 1024 *1024];
>>     System.gc();
>>   }
>> }
>> So we have a large array that the workers will scan for reference to other objects, the results on my workstation look like this:
>> Baseline: GC(0) Phase 1: Mark live objects 59,039ms
>> 8262068:  GC(0) Phase 1: Mark live objects 136,231ms (G1SkipCompactionLiveBytesLowerThreshold=100)
>> 8262068:  GC(0) Phase 1: Mark live objects 137,956ms (G1SkipCompactionLiveBytesLowerThreshold=95)
>> The results are quite surprising, because in the other benchmarks there is not a big diff in marking times. But looking a bit at the code and how the inlining decisions are made for `G1FullGCMarker::mark_and_push(T* p)` it looks like the function grows a bit to much which prevents some important inlining. To avoid this I moved the accounting to the end of `mark_object()` instead and this removed more or less the whole regression. Getting the compiler to inline exactly the right things is hard, but this is something to keep in mind when getting strange regressions in the GC. 
>> 
>> So before looking closer at the code I would like to see if we can reuse `G1RegionMarkStatsCache` and change to use the existing flag. What do you think about that?
>> 
>> Thanks,
>> Stefan
>
> Hi Stefan , Thanks a lot for detailed review and benchmark!
> Sure, I will update the patch as you suggested later.

Hi Stefan,

There is jdk/tier1 test failure on MaxOS, it passed on other platforms: https://github.com/openjdk/jdk/pull/2760/checks?check_run_id=2033359984. 
Rerun passed: https://github.com/Hamlin-Li/jdk/actions/runs/622899539

-------------

PR: https://git.openjdk.java.net/jdk/pull/2760