Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
maoliang.ml at alibaba-inc.com
Thu Feb 6 12:27:09 UTC 2020
Thanks for the testing and evaluating!
I tried your test with specjbb2015 and got slightly different
results, maybe because of machine capability. The config I used is as below:
-Xmx8g -Xms2g -Xlog:gc* -XX:GCTimeRatio=4
The heap was around 6GB after running for a while (300s), and
I was able to use SoftMaxHeapSize to let it shrink to 5GB. That
should be comparable to shrinking the heap to 3GB in your scenario.
The behavior is as I expected, but I thought you might expect
a more aggressive result. In my mind, for a constant load,
the JVM might not need to shrink the heap, since the JVM is supposed
to have expanded the heap to the right capacity. The soft limit I imagine is
meant to bring the heap size down after a load spike. In Alibaba's
workload, heap shrinking is controlled by the cluster's unified
control center, which has the prediction data, and the soft limit
works more like a *hard* limit in our 8u implementation.
So I think it is acceptable that the heap failed to shrink
to 2GB in your test case. You can see that
G1HeapSizingPolicy::can_shrink_heap_size_to is a bit conservative,
and we may be able to make it more aggressive.
For an almost idle application which doesn't have a GC for a
rather long time, the shrink cannot happen. In our previous 8u
patch, we have a timer to trigger GC, and the softmx is changed by
a jcmd which will also trigger a GC (there was no SoftMaxHeapSize option
in 8u yet). Shall we introduce a timer GC as well?
Honestly, I don't think Min/MaxHeapFreeRatio is a good way to determine
heap expansion/shrinking in G1, and in our practical 8u experience we never
have a full GC, so Min/MaxHeapFreeRatio is useless. When I reproduced
your test, the only exception was that the heap expanded to 6GB after
shrinking to SoftMaxHeapSize=5g, because we resize the heap in remark.
BTW, I don't think remark is a good point to resize the heap, since in the remark
phase regions full of garbage haven't been reclaimed yet. IMHO we don't even
need to resize in remark, but can just resize after mixed GC according to GCTimeRatio.
Your change to take SoftMaxHeapSize into account in the adaptive IHOP control
seems to be a similar approach to ZGC's. ZGC is a single-generation GC whose scenario
is much simpler. Maybe we don't need SoftMaxHeapSize to guide GC decisions
in G1. Since we already have a policy to determine the shrinking of the heap
by SoftMaxHeapSize, I'm not sure if we need to adapt the adaptive IHOP
to SoftMaxHeapSize... We may encounter the situation that we cannot shrink the
heap to SoftMaxHeapSize, but concurrent marking becomes frequent because the
IHOP policy was affected.
> In the log I have, the problem seems to be that we are re-setting the
> softmaxheapsize within the space reclamation phase (i.e. mixed gc) and
> G1 sizing policies got confused, i.e. it partially keeps on using the 2g
> goal for young gen sizing until the *2 problem expands it. That's a bug
> and needs to be fixed.
I don't think it's a problem that, after mixed GC, resize_heap_after_young_collection
will evaluate whether the heap can be shrunk to the new value of SoftMaxHeapSize.
From:Thomas Schatzl <thomas.schatzl at oracle.com>
Send Time:2020 Feb. 5 (Wed.) 16:14
To:"MAO, Liang" <maoliang.ml at alibaba-inc.com>; hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
apologies for the late reply - I did look at the patch immediately
after you posted it, but initial tests showed that it does not work as
(I) expected. More about that below. So I went ahead and hacked up
something that comes closer to what I had in mind. Unfortunately other
more urgent issues came up, which caused the delay on this work. Sorry.
(And sorry for the long post).
Not having any kind of workload to work with for testing the change I
used some configuration of specjbb2015 with fixed IR (taken from a
colleague's unrelated recent internal test), simulating a constant load
the user wants to control the heap usage of.
In this situation I want to apologize for using specjbb2015 in this public
reply because it is not openly available, but I only noticed when writing
up this email. Finding a substitute and redoing measurements would
probably take more time. I will start looking into this issue.
Anyway, in my test scenario, after warmup, the user tries to first limit
the heap to 2GB, and after a while to 3GB, and then back to 8GB.
The resulting graph  shows heap metrics over time: blue ("soft") is
the current SoftMaxHeapSize, pink ("committed") represents committed
memory, yellow ("goal") shows G1's current heap size goal, turquoise
("free") the amount of free heap and purple ("used") the amount of used heap.
Ignoring the drop from ~second 30-100 where I finally managed to set
Min/MaxHeapFreeRatio ;) you can see that G1 kind of stabilizes at
around 3.8GB heap; at ~second 410 the SoftMaxHeapSize ("soft") is set to
2GB. As you can see, G1 ignores the request. This corresponds to the
code where apparently the heap is only reduced to SoftMaxHeapSize if
there is enough free space to reduce to that value (I think).
At ~second 620 I set SoftMaxHeapSize to 3GB which gives the expected
drop in memory usage. However, since the change does not modify G1 goals
it ultimately just ignores the SoftMaxHeapSize goal. It would probably work
if there were no further application activity.
I created a webrev of an alternative attempt that modifies G1's
goal/target heap size in the adaptive IHOP mechanism so that G1
automatically starts marking so that a space reclamation phase starts
before reaching softmaxheapsize. It basically changes the predictor's
reserve according to current committed heap size not only based on
G1ReservePercent, but also on the specified SoftMaxHeapSize.
One complication in a generational setting is to adapt young gen
(particularly survivor size) to that goal too, but I think the change
does okay with that.
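To make the idea above a bit more concrete, here is a minimal sketch of how a marking target could take SoftMaxHeapSize into account. This is not the code from the webrev: the class name, the method name and the exact formula (cap the committed size at the soft limit, then subtract a percentage reserve) are assumptions made purely for illustration.

```java
public class SoftMaxIhopSketch {
    /**
     * Hypothetical occupancy (in bytes) below which marking should have
     * started. All names and the formula are illustrative assumptions.
     *
     * @param committedBytes currently committed heap
     * @param softMaxBytes   value of SoftMaxHeapSize
     * @param reservePercent reserve in percent (in the spirit of G1ReservePercent)
     */
    static long markingTarget(long committedBytes, long softMaxBytes, int reservePercent) {
        // Never plan for more heap than the soft limit allows.
        long capped = Math.min(committedBytes, softMaxBytes);
        // Keep a percentage of that capped size as reserve.
        long reserve = capped * reservePercent / 100;
        return capped - reserve;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // Committed heap of 4GB, soft limit of 2GB, 10 percent reserve.
        System.out.println(markingTarget(4 * gb, 2 * gb, 10));
        // Soft limit above the committed size: only the reserve applies.
        System.out.println(markingTarget(4 * gb, 8 * gb, 10));
    }
}
```

With a soft limit below the committed size, the target drops below the soft limit, so in this sketch marking (and with it space reclamation) would start before the heap usage reaches SoftMaxHeapSize.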
However it is not finished yet, there is debugging code in it and one
FIXME that is about shuffling around code properly.
In the graph at  you can see the results, with same metrics shown. In
this case G1 fairly well follows the soft goal.
For the 2g softmaxheapsize goal it works perfectly in the example (*1),
in the 3g softmaxheapsize change we get some initial short overshoot in
committed memory. (*2/*3)
There are however some problems/differences to your solution here which
need to be discussed a bit more to see if it fits you and ultimately
make it perform better:
*0 this change uses existing sizing to uncommit memory, i.e. memory is
not uncommitted immediately but part of regular operation. This means
that the garbage collection cycle needs to advance. In case of specjbb
with fixed IR this is no issue, but completely quiescent applications
need other mechanisms like the "Promptly Return Unused Committed Memory"
(JEP 346) feature enabled. Some tuning is needed in that mechanism for
*1 the problem with only setting SoftMaxHeapSize and relying on the
regular uncommit mechanism is that due to other reasons, e.g.
GCTimeRatio, G1 won't achieve this kind of compact heap. This is the
reason why my setup includes the GCTimeRatio=4 on the command line -
otherwise in neither case G1 would achieve the 2g goal (it would settle
around 3g with my changes, didn't test the original changes; max heap
usage would be ~5.8GB without SoftMaxHeapSize fyi), and you can't modify
it during runtime (i.e. when you want to select a different
throughput/latency tradeoff to achieve lower heap usage).
*2 looking more closely at the (first) overshoot in the 3g
soft max heap size goal, I think this is a remaining issue in the heap
sizing policy in conjunction with soft max heap size, i.e. temporarily
the target gctimeratio is set to 10% for various reasons. (in
In the log I have, the problem seems to be that we are re-setting the
softmaxheapsize within the space reclamation phase (i.e. mixed gc) and
G1 sizing policies got confused, i.e. it partially keeps on using the 2g
goal for young gen sizing until the *2 problem expands it. That's a bug
and needs to be fixed.
So far the previous text only looked at the best case where everything fits
together; there are some other issues which will prevent you from
achieving a tight heap in some cases that I noticed during my testing.
Something to think about.
*4 GCTimeRatio/heap expansion during young gc has different goals than
the (un-)commit at the end of full gc. In some cases, with
SoftMaxHeapSize (but also without), the latter will undo the expansion at
young gc, which will immediately start to expand again.
*5 GCTimeRatio can't be adjusted during runtime, which means that you
won't achieve that tight of a heap as in this example. GCTimeRatio is
also a bit unwieldy to use, i.e. since it is the denominator in the
(default; nobody sets GCPauseIntervalMillis) time calculation, you get
"good" granularity of low values, but pretty bad granularity of high values.
*6 Min/MaxHeapFreeRatio default values are probably too high - with
adaptive IHOP, G1 can typically meet its current goal very well, any
excess is often just wasted committed memory. A similar issue to that
is, don't set Min/MaxHeapFreeRatio to something below G1ReservePercent,
i.e. the default reserve for the IHOP. In this case there will be
significant memory commit/uncommit pauses.
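As a small illustration of the granularity remark in *5, the following sketch tabulates the target GC overhead for a few GCTimeRatio values, assuming the usual 1 / (1 + GCTimeRatio) relation. The class and method names are made up for this example; only the formula is taken from how GCTimeRatio is commonly documented.

```java
public class GcTimeRatioTable {
    // Target GC overhead in percent, assuming overhead = 1 / (1 + GCTimeRatio).
    static double overheadPercent(int gcTimeRatio) {
        return 100.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        int[] ratios = {1, 2, 3, 4, 9, 12, 19, 49, 99};
        for (int r : ratios) {
            // Change in target overhead when going from r to r + 1.
            double step = overheadPercent(r) - overheadPercent(r + 1);
            System.out.printf("GCTimeRatio=%-3d target=%6.2f%%  step to next=%7.4f%%%n",
                              r, overheadPercent(r), step);
        }
    }
}
```

Under that assumption GCTimeRatio=4 corresponds to a 20% target and GCTimeRatio=9 to 10%. The step between adjacent integer settings is many percentage points at the small end and only a tiny fraction of a percent at the large end, so the control you get over the overhead target is very uneven across the range.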
Here is my question to you (and any readers): are you using
Min/MaxHeapFreeRatio? Using SoftMaxHeapSize to set a target heap size
seems to be much more direct and better than Min/MaxHeapFreeRatio. Given
above (and assuming that there are no reasons to keep it), it may be
useful to start the deprecation process (at least for the use in G1) when
SoftMaxHeapSize is in.
There are some more issues with heap sizing not really relevant to this
discussion, I need to think about them a bit more and file appropriately
Either way, what do you think about my suggested change? Can you try it
on your workloads to see if it could do the job? Any other comments?
More work is needed on this patch I think; also we might need to think
about how the user can detect this change of the target better in the
logs for troubleshooting.
The original patch (webrev.2) also contained some minor unrelated
cleanups (one constification of a method, one rename of the heap
resizing phase) that might be easier to address separately more quickly ;)
 specjbb2015 settings: -Dspecjbb.comm.connect.type=HTTP_Jetty
VM settings: -Xms2g -Xmx8g -XX:GCTimeRatio=4 -XX:+UseStringDeduplication
This gives ~1.5GB live set size, on my machine around 10-40ms pause
time, so rather light load at least without setting any heap size goal;
in my runs, G1 settles to around 3.8GB of committed heap. (with
Min/MaxHeapFreeRatio=10 set after startup, but you can just put it into
the VM startup options too)