G1's parallel full GC significantly increases wasted space in Old regions

Stefan Johansson stefan.johansson at oracle.com
Mon Feb 19 13:48:42 UTC 2018


Hi,

Thomas summarized this well, just some additional information below.

On 2018-02-17 13:18, Thomas Schatzl wrote:
> Hi,
>
> On Fri, 2018-02-16 at 17:54 -0800, Man Cao wrote:
>> Hi,
>>
>> We (Java platform team at Google) are comparing G1's performance in
>> JDK9u and JDK10. We expect JDK10's G1 to perform better because of
>> JEP 307 (Parallel Full GC).
>> However, we found a performance regression in JDK10 with DaCapo
>> benchmarks. We set the heap size small (about 2-4 times the minimum
>> heap) so that they trigger interesting GC activity.
> This is really small - most of these benchmarks run fine with heaps in
> the low tens of MB iirc...
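For concreteness, an invocation of the kind being discussed might look
like the line below (the benchmark choice, flags and sizes are purely
illustrative, not the exact setup used):

    java -Xms64m -Xmx64m -XX:+UseG1GC -jar dacapo-9.12-bach.jar -n 10 h2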
>
>> We found that JDK10's full GC results in significantly more wasted
>> space in Old regions, which leads to a more fragmented heap and fewer
>> Eden regions. We also found that the amount of wasted space after a
>> full GC is proportional to the number of ParallelGCThreads. As a
>> result, several benchmarks trigger more Young, Mixed and concurrent
>> collections, leading to increased CPU usage and pause times. One
>> reason these benchmarks are sensitive to full GC is that the DaCapo
>> harness performs a System.gc() between iterations of the benchmarks,
>> so a more fragmented heap hurts the benchmark from the beginning of
>> every iteration.
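A back-of-the-envelope calculation shows why the waste scales with
ParallelGCThreads. Assuming the 1 MB region size G1 typically chooses
for heaps this small, and that each parallel worker compacts into its
own set of regions, each worker can leave one partially filled tail
region behind:

    worst-case waste ~= ParallelGCThreads * region size
    e.g. 8 threads * 1 MB regions ~= up to 8 MB unusable after full GC

On a heap of a few tens of MB that is a noticeable fraction of the
whole heap, which matches the observations above. (The numbers here are
illustrative only.)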
>>
>> We are aware this is probably a known issue as described in JEP 307:
>> "Risks and Assumptions: The fact that G1 uses regions will most
>> likely lead to more wasted space after a parallel full GC than for a
>> single threaded one."
>> However, it is not impossible to optimize the full GC to reduce
>> wasted space. After all, a stop-the-world parallel mark-sweep-compact
>> algorithm should be able to efficiently compact the heap.
> The problem is that compacting these "tail regions" needs a
> significant amount of synchronization if you want to do this in
> parallel.
>
> The current (compaction part of the) algorithm is basically serial GC
> on completely distinct sets of regions ("compaction queues"), i.e. it
> does no synchronization at all, which makes it fairly fast.
>
> I think there is some mechanism to have one thread at the end
> compacting through the "tail regions" or so, but it is only used in
> specific circumstances. Maybe others want to chime in here. :)
Correct, the only time we do a single-threaded run over the "tail/waste
regions" is if no regions were freed by the parallel run. Always doing
this is not as simple as one might imagine. It is of course not
impossible, but it's a trade-off between code complexity, pause time and
waste, and for the first version the current solution was chosen.
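To make the trade-off concrete, here is a simplified model of the
compaction-queue scheme described above. This is not the HotSpot code;
all names and numbers are invented for illustration. Each worker
compacts only within its own disjoint queue of regions, which requires
no synchronization, but the last destination region of every queue is
typically left partially filled:

    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    // Toy model of parallel full-GC compaction over disjoint
    // "compaction queues" (illustrative only, not HotSpot code).
    public class CompactionWasteSketch {
        static final long REGION_SIZE = 1L << 20; // assume 1 MB regions

        // Compacts one queue of live-object sizes into regions and
        // returns the bytes left unused in its last (tail) region.
        static long tailWaste(List<Long> liveObjectSizes) {
            long usedInRegion = 0;
            for (long size : liveObjectSizes) {
                if (usedInRegion + size > REGION_SIZE) {
                    usedInRegion = 0; // start a new destination region
                }
                usedInRegion += size;
            }
            return REGION_SIZE - usedInRegion; // per-queue tail waste
        }

        public static void main(String[] args) {
            int threads = 8; // stands in for -XX:ParallelGCThreads
            // One queue of random live-object sizes per worker thread.
            List<List<Long>> queues = IntStream.range(0, threads)
                    .mapToObj(t -> IntStream.range(0, 10_000)
                            .mapToObj(i -> (long) ThreadLocalRandom
                                    .current().nextInt(16, 4096))
                            .collect(Collectors.toList()))
                    .collect(Collectors.toList());
            // Each queue is compacted independently, with no
            // synchronization; the tail waste adds up across workers.
            long waste = queues.parallelStream()
                    .mapToLong(CompactionWasteSketch::tailWaste)
                    .sum();
            System.out.printf("tail waste: %d KB across %d queues%n",
                    waste / 1024, threads);
        }
    }

Merging those per-queue tails afterwards would need either an extra
serial pass or synchronization between the workers, which is exactly
the trade-off described above.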
>
> We found that it is most of the time more efficient (and actually
> improves overall throughput) to simply use fewer threads on small heaps
> instead of having a slow serial phase or adding costly synchronization
> to the parallel compaction. This also causes less fragmentation, and
> reduces the issue and its effects significantly.
>
> There is no way for Java programs to get a guaranteed "100% maximally
> compacting GC" at this time (unless you run it with a single thread).
> Note that other parallel full GCs (e.g. Parallel GC) have the same
> issue afaik, although Parallel GC uses a smaller "region size".
>
>> We did not find any RFE or discussion on JBS regarding this. Is there
>> any ongoing effort to reduce wasted space in parallel full GC?
> See JDK-8194316 [0], which reports the same issue, and JDK-8196071 [1]
> for an RFE with a potential fix.
I have tested some fixes for this, but none that are ready for review
yet. We have had some discussions around how best to scale the number of
threads and whether it should be guarded by
-XX:+UseDynamicNumberOfGCThreads. Currently the full collection uses the
same number of threads as the last young collection, so if you run with
UseDynamicNumberOfGCThreads enabled, you will most likely use fewer
threads when the heap is small, and that way get less waste. This is of
course not a real solution, but it should mitigate the problems you are
seeing.
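In the meantime, running with something like the following should
reduce the waste on small heaps (the flags are real, but whether they
help in your setup is something you would need to verify):

    java -XX:+UseG1GC -XX:+UseDynamicNumberOfGCThreads ...

or, trading full-GC pause time for a maximally compacted heap:

    java -XX:+UseG1GC -XX:ParallelGCThreads=1 ...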

Thanks,
Stefan

>
> There are btw a few more parallel full GC related issues open; I added
> a "gc-g1-fullgc" label to them just now [4]. This list may not be
> exhaustive, and as usual, when it's done it's done - we welcome
> contributions :)
>
> If you want to work on any of these, it might be useful to start a
> discussion here first to get further help/thoughts.
>
> JDK 11 may also contain more changes to ergonomics to better support
> small heaps; see JDK-8172792 [2]. Note that the JEP does not cover
> Full GC.
>
> (Shameless plug: recent FOSDEM presentation about G1 changes [3])
>
> Thanks,
>    Thomas
>
> [0] https://bugs.openjdk.java.net/browse/JDK-8194316
> [1] https://bugs.openjdk.java.net/browse/JDK-8196071
> [2] https://bugs.openjdk.java.net/browse/JDK-8172792
> [3] https://fosdem.org/2018/schedule/event/g1/
> [4] https://bugs.openjdk.java.net/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+gc-g1-fullgc
>
