RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics

Thomas Schatzl thomas.schatzl at oracle.com
Tue Jan 14 11:36:22 UTC 2020


Hi,

On 14.01.20 10:07, Liang Mao wrote:
> Hi Thomas,
> 
> Thank you for the detailed comments!
> I will follow most of the suggestions when making the modifications.
> And I still have some questions:
> 
>>> 1. Does the SoftMaxHeapSize limitation need to consider the GC time 
>>> ratio as in
>>> expand_heap_after_young_collection? Now we haven't put the logic in yet.
> 
>> I am not completely clear what you are asking about, but the gc time 
>> ratio only affects the current "optimal" heap size which is bounded by 
>> SoftMaxHeapSize/max_capacity.
> 
> The decision to shrink to SoftMaxHeapSize in this patch is based
> on the method "G1HeapSizingPolicy::can_shrink_heap_size_to" which
> counts "used" + "reserve" + "young". We will change it to
> _heap_sizing_policy->shrink_amount(); as you commented.
> I'm not considering the GC time ratio as a factor in determining
> whether the heap can be shrunk to SoftMaxHeapSize.

Going back to the "spec":

"When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap 
size beyond the specified size, unless the GC decides it's necessary to 
do so. The soft max heap size should not be allowed to be set to a value 
smaller than min heap size (-Xms) or greater than max heap size (-Xmx). 
When not set on the command-line, this flag should default to the max 
heap size." (https://bugs.openjdk.java.net/browse/JDK-8222181)

This is a very loose definition, and "unless the GC decides it's 
necessary" may mean anything.

Looking at ZGC code, it mostly uses it to drive the GCs (and determine 
expansion amount), and lets regular expansion/shrinking do the work 
without new rules.
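
For illustration only (this is not actual ZGC or G1 code, just a 
standalone model of the pattern described above; the 0.9 start threshold 
is an arbitrary placeholder): the soft limit feeds into the "when to 
start a GC" and "how far may we expand" decisions, while the hard max 
only bounds expansion when the collector really must grow.

#include <algorithm>
#include <cstddef>

// Standalone sketch (not HotSpot code): treat the soft max as the capacity
// that the GC heuristics reason against; the hard max only bounds expansion.
struct HeapSizes {
  size_t used;       // bytes currently in use
  size_t committed;  // bytes currently committed
  size_t soft_max;   // SoftMaxHeapSize
  size_t hard_max;   // -Xmx
};

// Start a (concurrent) collection when usage approaches the soft limit.
static bool should_start_gc(const HeapSizes& h, double start_threshold = 0.9) {
  return h.used >= static_cast<size_t>(start_threshold * h.soft_max);
}

// Expansion is normally capped by the soft limit; exceed it only if the
// collector decides it really has to (e.g. to avoid an allocation failure).
static size_t expansion_limit(const HeapSizes& h, bool must_expand) {
  return must_expand ? h.hard_max : std::max(h.committed, h.soft_max);
}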

I would at first tend to do the same: if the existing heap policy 
indicates that an expansion at young gc (which takes GCTimeRatio into 
account) is needed for whatever reason, I would let G1 keep doing it; 
conversely, I would also take GCTimeRatio into account when trying to 
shrink, to keep the policy symmetric.
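
To make "symmetric" concrete, here is a standalone sketch (illustrative 
assumptions, not the actual G1HeapSizingPolicy code): the same 
GC-time-ratio signal that can justify expansion also has to agree before 
shrinking towards SoftMaxHeapSize.

#include <cstddef>

// Illustrative sketch, not HotSpot code. 'gc_time_ratio_pct' is the recent
// GC overhead in percent; 'target_pct' roughly corresponds to
// 100 / (GCTimeRatio + 1).
enum class ResizeDecision { Expand, Shrink, Keep };

static ResizeDecision decide(double gc_time_ratio_pct,
                             double target_pct,
                             size_t committed,
                             size_t soft_max) {
  if (gc_time_ratio_pct > target_pct) {
    // Spending too much time in GC: expand, even if that exceeds the soft max.
    return ResizeDecision::Expand;
  }
  if (gc_time_ratio_pct < target_pct && committed > soft_max) {
    // GC overhead is comfortably low *and* we are above the soft max:
    // only then shrink towards it.
    return ResizeDecision::Shrink;
  }
  return ResizeDecision::Keep;
}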

The current implementation probably preferentially shrinks towards 
SoftMaxHeapSize, correct? (This shrinking also seems to be limited to 
exactly SoftMaxHeapSize - why not go below that if the collector thinks 
it would be okay?)

Do you have measurements with and without GCTimeRatio in the shrinking 
decision? Can you show, with over-time heap occupancy graphs, that this 
does not work at all in your workloads?

Measurements of committed heap over time of the current solution would 
be appreciated too.

(I haven't had the time yet to set up some useful testing application 
that can be used to simulate phases of such a workload to show heap 
shrinking but I assume you have some data.)

>> - changes in G1CollectedHeap::resize_heap_if_necessary: please make the 
>> method always use the soft max heap size, as I do not understand why 
>> you would not do that.
> 
> Do you think we need to apply the logic "can_shrink_heap_size_to"
> inside resize_heap_if_necessary to determine whether to use the soft
> max size as the limit?

resize_heap_if_necessary() is the equivalent, after a full gc, of 
adjust_heap_after_young_collection() (the naming could be unified at 
some point).

What the change implements right now is to respect SoftMaxHeapSize only 
on an explicit gc in resize_heap_if_necessary(), while always respecting 
it during young gcs.

Do you have a reason for this decision? This seems like an inconsistency 
I cannot find a reason for.

As mentioned above, I would try to keep SoftMaxHeapSize only a goal for 
starting (concurrent) garbage collections, with "natural" sizing trying 
to keep the SoftMaxHeapSize goal.

Particularly, if a (compacting) full gc won't meet the SoftMaxHeapSize, 
what else will? It is indeed unfortunate that you might need to tweak 
Min/MaxFreeRatio to achieve a higher uncommit ratio at full gc...
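
To make that remark concrete, a rough standalone model (my assumptions, 
not the actual resize_heap_if_necessary() code) of a full-gc resize 
target that prefers to stay at or below SoftMaxHeapSize; min_free_pct 
and max_free_pct play the role of MinHeapFreeRatio/MaxHeapFreeRatio.

#include <algorithm>
#include <cstddef>

// Standalone model, not HotSpot code. The result is clamped so the heap
// does not stay above the soft max unless the free-ratio bounds force it.
static size_t full_gc_resize_target(size_t used,
                                    size_t min_free_pct,  // < 100
                                    size_t max_free_pct,  // < 100
                                    size_t soft_max,
                                    size_t hard_max) {
  // Capacity that keeps at least min_free_pct percent free...
  size_t min_desired = used * 100 / (100 - min_free_pct);
  // ...and at most max_free_pct percent free.
  size_t max_desired = used * 100 / (100 - max_free_pct);
  // Prefer the soft max, but never go below what the free ratio requires.
  size_t target = std::min(max_desired, std::max(soft_max, min_desired));
  return std::min(target, hard_max);
}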

This change (system.gc() specifically trying to meet SoftMaxHeapSize) 
also seems to be an artifact of how you use this feature - maybe you 
happen to always issue a System.gc() after changing SoftMaxHeapSize?
It would probably be better if a concurrent cycle were triggered 
automatically, similar to periodic gcs.
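
A minimal sketch of the kind of trigger meant here (standalone and 
hypothetical; the periodic-gc machinery itself is not shown): once 
committed memory exceeds the new SoftMaxHeapSize, request a concurrent 
cycle instead of relying on the user to call System.gc().

#include <cstddef>

// Hypothetical standalone sketch: a check that could run from a service
// thread (similar to the periodic GC check) after SoftMaxHeapSize changes.
struct HeapState {
  size_t committed;
  size_t soft_max;
  bool   concurrent_cycle_in_progress;
};

static bool should_request_concurrent_cycle(const HeapState& s) {
  // Only bother if we are above the soft max and no cycle is running yet;
  // the cycle (plus the normal shrinking policy afterwards) is what actually
  // brings the committed size back towards the soft max.
  return s.committed > s.soft_max && !s.concurrent_cycle_in_progress;
}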

>>  - there is a bug in the synchronization of the concurrent uncommit 
>> thread: it seems possible that the uncommit thread is still working 
>> (iterating over the list of regions to uncommit) while a completed 
>> (young) GC may add new regions to that list as young gcs do not wait for 
>> completion of the uncommit thread.
> 
> The uncommit thread could be working in parallel with the VMThread, but
> the VMThread will not add regions to the concurrent_resizing_list
> if the concurrent resizing thread is in the "working" state.

Okay, I now see the check in 
G1CollectedHeap::adjust_heap_after_young_collection(), but that also 
prohibits expansion during young GC, which seems relatively disruptive. 
I think I mentioned earlier that in this case (if we want to expand) it 
might be better to wait for completion of the parallel uncommit _if_ the 
other remaining regions are not enough.

(Alternatively one could add another uncommit/commit request to a 
hypothetical uncommit task queue for that resize thread).

My thinking is that, at worst, waiting like this would result in the 
same behavior as before (i.e. blocking because a commit/uncommit is in 
progress) instead of changing the overall behavior by denying expansion 
requests (silently, which is pretty bad). Denying them could result in 
the heap sizing policy getting a few samples with a high gc time ratio, 
and then expanding even more once G1 is finally allowed to expand. 
(Also, fortunately UseGCOverheadLimit is not implemented in G1, as it 
might otherwise kill the VM spuriously because of that...)

Or do you have some reason for this particular implementation?

(The reason I am asking so much about details is to get a common 
understanding on the solution, i.e. what should happen when, and 
hopefully including why for the next guy ;))

Maybe it would be good to refactor the code in 
G1CollectedHeap::adjust_heap_after_young_collection(), e.g. calculate 
some heap_size_change() value from the heap sizing policy, and then use 
that value, depending on whether it is positive or negative, to expand 
or shrink.
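
A rough standalone sketch of that refactoring (hypothetical names such 
as heap_size_change(); not the patch's code): one signed value from the 
sizing policy, and a single resize step acting on its sign.

#include <cstddef>
#include <cstdint>

// Hypothetical standalone sketch: the sizing policy returns one signed
// value and a single resize step acts on it, instead of separate
// expand/shrink paths deciding independently.
using ssize_bytes = int64_t;

struct SizingPolicy {
  // > 0: expand by that many bytes, < 0: shrink, 0: leave the heap alone.
  ssize_bytes heap_size_change() const { return _change; }
  ssize_bytes _change = 0;
};

static void adjust_heap_after_young_collection(const SizingPolicy& policy,
                                               void (*expand)(size_t),
                                               void (*shrink)(size_t)) {
  ssize_bytes change = policy.heap_size_change();
  if (change > 0) {
    expand(static_cast<size_t>(change));
  } else if (change < 0) {
    shrink(static_cast<size_t>(-change));
  }
  // change == 0: nothing to do.
}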

> 
>> Related to that is the use of par_set/clear_bit in e.g. the available 
>> bitmap: since all par/clear actions are asserted to be in the vm thread 
>> at a safepoint, there does not seem to be a need for using the parallel 
>> variants of set/clear bit (if keeping the current mechanism).
> 
> For the above reason, that concurrent uncommit can run in parallel with
> the VMThread, the bit set/clear in the VM thread at a safepoint has to
> use the parallel variants.

If concurrent uncommit is working, both heap shrinking and expansion in 
the safepoint is disabled as far as I can tell. I.e.

void G1CollectedHeap::adjust_heap_after_young_collection() {
  if (concurrent_heap_resize()->resize_thread()->during_cycle()) {
    // Do not adjust if concurrent resizing is in progress
    return;
  }

  double start_time_ms = os::elapsedTime();
  shrink_heap_after_young_collection();
  phase_times()->recor...[..]

  // Shrinking might have started resizing
  if (!concurrent_heap_resize()->resize_thread()->during_cycle()) {
    expand_heap_after_young_collection();
  }
}

This method seems to be the only caller to 
expand/shrink_heap_after_young_collection().

Another question I had but forgot to ask: the thread, and other places, 
use "resize" in the method and class names instead of e.g. "uncommit" or 
"shrink". Do you plan to add concurrent commit too?

>>  - I have a feeling that if the concurrent uncommit thread worked on 
>> pages, not regions, the code would be easier to understand. It would 
>> also solve the issue you asked about with the 
>> G1RegionsSmallerThanCommitSizeMapper. You may still need to pass region 
>> numbers anyway for logging, but otoh the logging could be done 
>> asynchronously.
> 
> I don't quite understand this part... For
> the G1RegionsSmallerThanCommitSizeMapper,
> a page can be simultaneously requested to commit in the VMThread to expand
> the heap and to uncommit in the concurrent thread to shrink the heap. It
> looks like lowering the uncommit work to page level wouldn't help with this...

I see the problem, but as above I think expansion (commit) and shrinking 
(uncommit) can't happen at the same time at the moment, and it might 
actually be premature to allow expansion and shrinking to occur at the 
same time.

One more option is serializing these requests; I described one variant 
above: if a request is pending, wait for its completion before starting 
a new one. This might be the default option for other reasons anyway.

Another is certainly to simply not uncommit such regions concurrently; 
maybe we can still uncommit them immediately?

> For the features you listed below,
>     1) moving the memory uncommit into the concurrent phase
>     2) uncommit at the end of (almost) every GC
>     3) SoftMaxHeapSize
> 
> Since most of the code is for the concurrent framework,
> do you think 2) and 3) can be done together and implemented first?
> (The uncommit would happen immediately)

I think that would be fine.


Thanks,
   Thomas


