Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
Liang Mao
maoliang.ml at alibaba-inc.com
Wed Jan 15 03:52:02 UTC 2020
Hi Thomas,
I summarize the issues as follows:
1. Criterion of SoftMaxHeapSize
I agree to keep the SoftMaxHeapSize policy similar to ZGC's so the behavior is
unified. So "expand_heap_after_young_collection" is used to meet the basic
GCTimeRatio and expands the heap immediately; it cannot be blocked for any
reason. "adjust_heap_after_young_collection" will not change that logic, and
there I will take both expansion and shrinking into consideration. Is my
understanding correct here?
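The split described above can be sketched as two paths: an urgent expansion that is capped only by -Xmx, and a soft adjustment bounded by SoftMaxHeapSize. This is a minimal illustrative model, not the actual HotSpot code; all names and the sizing rule are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <algorithm>

// Hypothetical sketch of the policy above: expansion needed to meet the
// basic GCTimeRatio is never blocked (only -Xmx caps it), while the soft
// adjustment moves committed memory toward SoftMaxHeapSize in either
// direction. Names are illustrative, not the actual G1 API.
struct HeapSizer {
  size_t committed;  // currently committed bytes
  size_t soft_max;   // SoftMaxHeapSize
  size_t hard_max;   // -Xmx

  // Urgent expansion: may exceed soft_max, capped only by hard_max.
  // Returns the number of bytes actually expanded.
  size_t expand_for_gc_time_ratio(size_t wanted) {
    size_t target = std::min(committed + wanted, hard_max);
    size_t expanded = target - committed;
    committed = target;
    return expanded;
  }

  // Soft adjustment: move toward soft_max, but never below what the
  // collector currently needs. Positive result = expand, negative = shrink.
  long soft_adjust(size_t minimum_needed) {
    size_t target = std::max(soft_max, minimum_needed);
    long delta = (long)target - (long)committed;
    committed = target;
    return delta;
  }
};
```

The key property of the sketch is that the urgent path ignores soft_max entirely, which matches the "cannot be blocked for any reason" requirement.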
2. Full GC with SoftMaxHeapSize
In my view a non-explicit Full GC probably indicates insufficient heap
capacity, so we may not want to keep shrinking toward SoftMaxHeapSize, whereas
an explicit FGC doesn't have that issue. That's the only reason I checked
whether it is explicit. But since we will use the same logic to determine
whether the heap can be shrunk, the "explicit" check could be meaningless, and I will remove it.
3. SoftMaxHeapSizeConstraintFunc doesn't check Xms
The constraint function does not check SoftMaxHeapSize against -Xms. Do we
need to add that check? It will not only affect G1...
4. commit/uncommit parallelism
The concurrent uncommit thread can run while the VMThread is doing a GC, and the GC may request
to expand the heap if there are not enough empty regions. So such parallelism is possible, and
immediate uncommit is one solution.
5. Further heap expansion/shrink heuristics
We have some data and experience with dynamic heap adjustment in our workloads.
The default GCTimeRatio of 12 is a carefully tuned number: we found that applications
show obvious timeout errors once it drops below ~12. So it is a kind of *hard*
limit, and we need to expand immediately if GCTimeRatio drops below 12. The
difference in our workloads is that we keep GCTimeRatio near the original
value of 99 to keep GC in a healthy state, because the allocation rate and outside input
can vary so violently that we don't want frequent adjustment. As you know, in our 8u
implementation we just keep a conservative GC interval to achieve that. Compared to the
current code in JDK15, keeping GCTimeRatio at 99 is a different behavior which might
have a larger memory footprint. I propose that we could still use the original option
"-XX:+G1ElasticHeap" to keep the GCTimeRatio around 99 or a specified number.
The default flow will make sure the GCTimeRatio stays above the threshold of 12, and concurrent
commit/uncommit will adjust the heap to keep GCTimeRatio at a proper value, so
the adjustment is not urgent.
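The two-level policy above can be modeled as a small decision function: an urgent expansion when the ratio falls below the hard floor (~12), and gradual concurrent resizing to hold the ratio near the soft target (~99). The thresholds, slack band, and action enum here are assumptions for illustration only.

```cpp
#include <cassert>

// Toy model of the two-level GCTimeRatio policy described above. A low
// GCTimeRatio means high GC overhead, so: below the hard floor we must
// expand immediately at the safepoint; between floor and target we expand
// concurrently; well above the target the heap can shrink concurrently.
enum class Action { ExpandNow, ExpandConcurrent, ShrinkConcurrent, None };

Action decide(double gc_time_ratio,
              double hard_floor = 12.0,   // hypothetical hard limit
              double soft_target = 99.0,  // hypothetical soft target
              double slack = 0.5) {       // dead band to avoid churn
  if (gc_time_ratio < hard_floor)          return Action::ExpandNow;
  if (gc_time_ratio < soft_target - slack) return Action::ExpandConcurrent;
  if (gc_time_ratio > soft_target + slack) return Action::ShrinkConcurrent;
  return Action::None;
}
```

The slack band reflects the "adjustment is not urgent" point: small fluctuations around the soft target trigger no resizing at all.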
Thanks,
Liang
------------------------------------------------------------------
From: Thomas Schatzl <thomas.schatzl at oracle.com>
Send Time: 2020 Jan. 14 (Tue.) 19:36
To: "MAO, Liang" <maoliang.ml at alibaba-inc.com>; hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
Hi,
On 14.01.20 10:07, Liang Mao wrote:
> Hi Thomas,
>
> Thank you for the detailed comments!
> Most of suggestions I will follow to do the modification.
> And I still have some questions:
>
>>> 1. Does the SoftMaxHeapSize limitation need to consider the GC time
>>> ratio as in
>>> expand_heap_after_young_collection? Now we haven't put the logic in yet.
>
>> I am not completely clear what you are asking about, but the gc time
>> ratio only affects the current "optimal" heap size which is bounded by
>> SoftMaxHeapsize/max_capacity.
>
> The decision to shrink to SoftMaxHeapSize in this patch is based
> on the method "G1HeapSizingPolicy::can_shrink_heap_size_to", which
> counts "used" + "reserve" + "young". We will change it to
> _heap_sizing_policy->shrink_amount(); as you commented.
> I'm not considering the GC time ratio as a factor to determine
> whether the heap can be shrunk to SoftMaxHeapSize.
Going back to the "spec":
"When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap
size beyond the specified size, unless the GC decides it's necessary to
do so. The soft max heap size should not be allowed to be set to a value
smaller than min heap size (-Xms) or greater than max heap size (-Xmx).
When not set on the command-line, this flag should default to the max
heap size." (https://bugs.openjdk.java.net/browse/JDK-8222181)
This is a very loose definition, and "unless the GC decides it's
necessary" may mean anything.
Looking at ZGC code, it mostly uses it to drive the GCs (and determine
expansion amount), and let regular expansion/shrinking do the work
without new rules.
I would at first tend to do the same: if existing heap policy indicates
that an expansion at young gc (which takes GCTimeRatio into account) is
needed for whatever reason, I would actually let G1 keep doing it;
conversely I would also take GCTimeRatio into account when trying to
shrink to keep the policy symmetric.
The current implementation probably preferentially shrinks towards
SoftMaxHeapSize, correct? (this shrinking also seems to be limited to
exactly SoftMaxHeapSize - why not below that if the collector thinks it
would be okay?)
Do you have measurements with/without GCTimeRatio in the shrinking? Can
you describe, with over-time heap occupancy graphs that this does not
work at all in your workloads?
Measurements of committed heap over time of the current solution would
be appreciated too.
(I haven't had the time yet to set up some useful testing application
that can be used to simulate phases of such a workload to show heap
shrinking but I assume you have some data.)
>> - changes in G1CollectedHeap::resize_heap_if_necessary: please make the
>> method always use the soft max heap size, as I do not understand why
>> you would not do that.
>
> Do you think we need to apply the logic "can_shrink_heap_size_to"
> inside resize_heap_if_necessary to determine whether to make soft
> max size as limit?
Resize_heap_if_necessary() is the equivalent of
adjust_heap_after_young_collection() for after full gc (the naming could
be unified at some point).
What the change implements right now is to respect SoftMaxHeapSize only
on an explicit gc in resize_heap_if_necessary(), while always respecting
it during young gcs.
Do you have a reason for this decision? This seems like an inconsistency
I can not find a reason for.
As mentioned above, I would try to keep SoftMaxHeapSize only a goal for
starting (concurrent) garbage collections, with "natural" sizing trying
to keep the SoftMaxHeapSize goal.
Particularly, if a (compacting) full gc won't meet the SoftMaxHeapSize,
what else will? It is indeed unfortunate that you might need to tweak
Min/MaxFreeRatio to achieve higher uncommit ratio at full gc...
This change (system.gc specifically trying to meet SoftMaxHeapSize) also
seems to be an artifact of your usage of this feature - maybe you happen
to always issue a system.gc() after you changed SoftMaxHeapSize?
It would probably be better if a concurrent cycle were triggered
automatically, similar to periodic gcs.
>> - there is a bug in the synchronization of the concurrent uncommit
>> thread: it seems possible that the uncommit thread is still working
>> (iterating over the list of regions to uncommit) while a completed
>> (young) GC may add new regions to that list as young gcs do not wait for
>> completion of the uncommit thread.
>
> The uncommit thread could be working in parallel with the VMThread, but
> the VMThread will not add regions to the concurrent_resizing_list
> if the concurrent resizing thread is in the "working" state.
Okay, I now see the check in
G1CollectedHeap::adjust_heap_after_young_collection(), but that also
prohibits expansion during young GC, which seems relatively disruptive. I
think I mentioned earlier that in this case (if we want to expand) it
might be better to wait on completion of the parallel uncommit _if_ the
other remaining regions are not enough.
(Alternatively one could add another uncommit/commit request to a
hypothetical uncommit task queue for that resize thread).
My thinking is that at worst, this would result in the same behavior as
before (i.e. blocking because of commit/uncommit in progress) instead of
changing the overall behavior by denying expansion requests (silently,
which is pretty bad). Denying them could cause the heap sizing
policy to get a few samples with high gc time ratio, expanding even more
when G1 is finally allowed to expand. (Also, fortunately
UseGCOverheadLimit is not implemented in G1, as that might kill the VM
spuriously because of this...)
Or do you have some reason for this particular implementation?
(The reason I am asking so much about details is to get a common
understanding on the solution, i.e. what should happen when, and
hopefully including why for the next guy ;))
Maybe it would be good to refactor the code in
G1CollectedHeap::adjust_heap_after_young_collection() (and the shrink/expand
helpers it calls), e.g.
calculate some heap_size_change() value from the heap sizing policy, and
then use that value, depending on whether it is positive or negative, to
expand or shrink.
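The suggested shape could look roughly like the following. This is only a sketch of the refactoring idea; heap_size_change(), adjust_heap(), and the inline expand/shrink stand-ins are hypothetical names, not existing G1 methods.

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// One signed delta from the sizing policy; the caller branches on its sign.
// Here "desired" stands in for whatever target the policy computes.
int64_t heap_size_change(size_t desired, size_t committed) {
  return (int64_t)desired - (int64_t)committed;
}

// Single adjustment path: positive delta expands, negative shrinks.
void adjust_heap(size_t desired, size_t& committed) {
  int64_t delta = heap_size_change(desired, committed);
  if (delta > 0) {
    committed += (size_t)delta;      // would call expand_by(delta)
  } else if (delta < 0) {
    committed -= (size_t)(-delta);   // would call shrink_by(-delta)
  }                                  // delta == 0: nothing to do
}
```

One benefit of this shape is that the expand and shrink decisions can never disagree, since both derive from the same computed value.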
>
>> Related to that is the use of par_set/clear_bit in e.g. the available
>> bitmap: since all par/clear actions are asserted to be in the vm thread
>> at a safepoint, there does not seem to be a need for using the parallel
>> variants of set/clear bit (if keeping the current mechanism).
>
> For the above reason, that concurrent uncommit can run in parallel with
> the VMThread, the bit set/clear in the VM thread at a safepoint has
> to use the parallel variants.
If concurrent uncommit is working, both heap shrinking and expansion in
the safepoint is disabled as far as I can tell. I.e.
2996 void G1CollectedHeap::adjust_heap_after_young_collection() {
2997   if (concurrent_heap_resize()->resize_thread()->during_cycle()) {
2998     // Do not adjust if concurrent resizing is in progress
2999     return;
3000   }
3001
3002   double start_time_ms = os::elapsedTime();
3003   shrink_heap_after_young_collection();
3004   phase_times()->recor...[..]
3005
       // Shrinking might have started resizing
3006   if (!concurrent_heap_resize()->resize_thread()->during_cycle()) {
3007     expand_heap_after_young_collection();
3008   }
3009 }
This method seems to be the only caller to
expand/shrink_heap_after_young_collection().
Another question I had but forgot to ask: in the thread name and in other
places, "resize" is used in method and class names instead of e.g.
"uncommit" or "shrink". Do you plan to add concurrent commit too?
>> - I have a feeling that if the concurrent uncommit thread worked on
>> pages, not regions, the code would be easier to understand. It would
>> also solve the issue you asked about with the
>> G1RegionsSmallerThanCommitSizeMapper. You may still need to pass region
>> numbers anyway for logging, but otoh the logging could be done
>> asynchronously.
>
> I don't quite understand this part... For
> the G1RegionsSmallerThanCommitSizeMapper,
> a page can be simultaneously requested to commit in VMThread to expand
> heap and uncommit in concurrent thread to shrink heap. Looks like lowering
> uncommit work to page level couldn't help this...
I see the problem, but as above I think expansion (commit) and shrinking
(uncommit) can't happen at the same time at the moment, and it actually
might be premature to allow expansion and shrinking to occur at the same time.
One more option is to serialize these requests: as I described above, if a
request is pending, wait for its completion before starting a new one. This
might be the default option for other reasons anyway.
Another is certainly to simply not uncommit such pages concurrently;
maybe we can still uncommit them immediately?
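The "wait for the pending request" option could be sketched as below: the VM thread blocks until the concurrent uncommit worker finishes, so a page is never committed and uncommitted at the same time. The class and its methods are hypothetical, not part of the proposed patch.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Illustrative serializer for commit/uncommit requests (hypothetical):
// the VM thread waits for an in-flight concurrent uncommit to finish
// instead of silently denying the expansion request.
class ResizeSerializer {
  std::mutex _lock;
  std::condition_variable _cv;
  bool _uncommit_in_progress = false;

public:
  // Called by the concurrent resize thread around its uncommit work.
  void begin_uncommit() {
    std::lock_guard<std::mutex> g(_lock);
    _uncommit_in_progress = true;
  }
  void end_uncommit() {
    {
      std::lock_guard<std::mutex> g(_lock);
      _uncommit_in_progress = false;
    }
    _cv.notify_all();
  }
  bool in_progress() {
    std::lock_guard<std::mutex> g(_lock);
    return _uncommit_in_progress;
  }
  // Called by the VM thread before expanding: block until no uncommit
  // is in flight, rather than dropping the expansion request.
  void wait_for_uncommit() {
    std::unique_lock<std::mutex> l(_lock);
    _cv.wait(l, [this] { return !_uncommit_in_progress; });
  }
};
```

At worst this degrades to the old blocking behavior, which is the point made earlier in the thread: blocking is preferable to silently denying expansion.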
> For the features you listed below,
> 1) moving the memory uncommit into the concurrent phase
> 2) uncommit at the end of (almost) every GC
> 3) SoftMaxHeapSize
>
> Since most of the code is for the concurrent framework,
> do you think 2) and 3) can be combined and implemented first?
> (The uncommit would happen immediately.)
I think that would be fine.
Thanks,
Thomas