RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
Thomas Schatzl
thomas.schatzl at oracle.com
Wed Feb 5 08:13:57 UTC 2020
Hi Liang,
apologies for the late reply - I did look at the patch immediately
after you posted it, but initial tests showed that it does not work as
(I) expected. More about that below. So I went ahead and hacked up
something that comes closer to what I had in mind. Unfortunately other
more urgent issues came up, which caused the delay on this work. Sorry.
(And sorry for the long post).
Not having any kind of workload to work with for testing the change I
used some configuration of specjbb2015 with fixed ir [0] (taken from a
colleague's unrelated recent internal test), simulating a constant load
the user wants to control the heap usage of.
In this situation I want to apologize for using specjbb2015 for this
public reply because it is not openly available, but I only noticed when
writing up this email. Finding a substitute and redoing measurements
would probably take more time. I will start looking into this issue.
Anyway, in my test scenario, after warmup, the user tries to first limit
the heap to 2GB, and after a while to 3GB, and then back to 8GB.
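For readers who want to reproduce such a scenario, the following is a minimal sketch of how a soft heap goal could be changed from inside a running VM. It assumes SoftMaxHeapSize is exposed as a manageable flag (as it is for ZGC); the class and method names are made up for illustration, and the call simply reports failure if the flag is unknown or not writeable in the VM at hand.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of any webrev discussed here.
final class SoftMaxHeapSizeSetter {

    /** Parses sizes such as "2g", "512m", "64k" or plain bytes into a byte count. */
    static long parseSize(String text) {
        String s = text.trim().toLowerCase();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long factor;
        switch (s.charAt(s.length() - 1)) {
            case 'k': factor = 1L << 10; break;
            case 'm': factor = 1L << 20; break;
            case 'g': factor = 1L << 30; break;
            default:  factor = 1L; break;
        }
        String digits = (factor == 1L) ? s : s.substring(0, s.length() - 1);
        return Math.multiplyExact(Long.parseLong(digits), factor);
    }

    /** Tries to set SoftMaxHeapSize at runtime; returns false if the VM rejects it. */
    static boolean trySetSoftMax(String size) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return false; // not a HotSpot-style VM
        }
        try {
            bean.setVMOption("SoftMaxHeapSize", Long.toString(parseSize(size)));
            return true;
        } catch (RuntimeException e) {
            // Assumption: flag unknown, not manageable, or value out of range.
            return false;
        }
    }

    public static void main(String[] args) {
        // Same sequence of goals as in the test scenario: 2g, then 3g, then 8g.
        for (String goal : new String[] {"2g", "3g", "8g"}) {
            System.out.println(goal + " accepted: " + trySetSoftMax(goal));
        }
    }
}
```

The same change can usually be made from outside the process with jcmd's VM.set_flag command, provided the flag is manageable.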
The resulting graph [1] shows heap metrics over time: blue ("soft") is
the current SoftMaxHeapSize, pink ("committed") represents committed
memory, yellow ("goal") shows G1's current heap size goal, turquoise
("free") the amount of free heap and purple ("used") the amount of used
memory.
Ignoring the drop from ~second 30-100 where I finally managed to set
Min/MaxHeapFreeRatio ;) you can see that G1 kind of stabilizes at
around 3.8GB heap; at ~second 410 SoftMaxHeapSize ("soft") is set to
2GB. As you can see, G1 ignores the request. This corresponds to the
code where apparently the heap is only reduced to SoftMaxHeapSize if
there is enough free space to reduce to that value (I think).
At ~second 620 I set SoftMaxHeapSize to 3GB which gives the expected
drop in memory usage. However, since the change does not modify G1 goals
it ultimately just ignores the SoftMaxHeapSize goal. It would probably
work if there were no further application activity.
I created a webrev [2] of an alternative attempt that modifies G1's
goal/target heap size in the adaptive IHOP mechanism so that G1
automatically starts marking so that a space reclamation phase starts
before reaching softmaxheapsize. It basically changes the predictor's
reserve according to current committed heap size not only based on
G1ReservePercent, but also on the specified SoftMaxHeapSize.
One complication in a generational setting is to adapt young gen
(particularly survivor size) to that goal too, but I think the change
does okay with that.
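The reserve adjustment described above might be sketched as follows. This is only one plausible reading of "reserve based on both G1ReservePercent and SoftMaxHeapSize", not the code in the webrev: the class, the method names and the exact formula (taking the larger of the percentage reserve and the committed memory above the soft limit) are assumptions made for illustration.

```java
// Hypothetical arithmetic sketch; not taken from the G1 sources or the webrev.
final class SoftGoalReserveSketch {

    /**
     * Reserve the IHOP predictor should keep free, in bytes: at least
     * reservePercent of the committed heap, and at least everything that is
     * committed above the soft limit.
     */
    static long reserveBytes(long committedBytes, int reservePercent, long softMaxBytes) {
        long percentReserve = committedBytes * reservePercent / 100;
        long softReserve = Math.max(0L, committedBytes - softMaxBytes);
        return Math.max(percentReserve, softReserve);
    }

    /** Occupancy the predictor aims to stay below before space reclamation starts. */
    static long targetBytes(long committedBytes, int reservePercent, long softMaxBytes) {
        return committedBytes - reserveBytes(committedBytes, reservePercent, softMaxBytes);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // 4GB committed, 10% reserve, soft limit at 2GB: the soft limit dominates.
        System.out.println(targetBytes(4 * gb, 10, 2 * gb));
        // Soft limit above the committed size: only the percentage reserve applies.
        System.out.println(targetBytes(4 * gb, 10, 8 * gb));
    }
}
```

With a soft limit below the committed size the target collapses to the soft limit itself, which is what would make marking start early enough to stay under it.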
However it is not finished yet, there is debugging code in it and one
FIXME that is about shuffling around code properly.
In the graph at [3] you can see the results, with same metrics shown. In
this case G1 fairly well follows the soft goal.
For the 2g softmaxheapsize goal it works perfectly in the example (*1),
in the 3g softmaxheapsize change we get some initial short overshoot in
committed memory. (*2/*3)
There are however some problems/differences to your solution here which
need to be discussed a bit more to see if it fits you and ultimately
make it perform better:
*0 this change uses existing sizing to uncommit memory, i.e. memory is
not uncommitted immediately but part of regular operation. This means
that the garbage collection cycle needs to advance. In case of specjbb
with fixed IR this is no issue, but completely quiescent applications
need other mechanisms like the "Promptly Return Unused Committed
Memory" (JEP 346) feature enabled. Some tuning is needed in that
mechanism for almost-idle applications.
*1 the problem with only setting SoftMaxHeapSize and relying on the
regular uncommit mechanism is that due to other reasons, e.g.
GCTimeRatio, G1 won't achieve this kind of compact heap. This is the
reason why my setup includes GCTimeRatio=4 on the command line -
otherwise G1 would not achieve the 2g goal in either case (it would settle
around 3g with my changes, didn't test the original changes; max heap
usage would be ~5.8GB without SoftMaxHeapSize fyi), and you can't modify
it during runtime (i.e. when you want to select a different
throughput/latency tradeoff to achieve lower heap usage).
*2 looking more closely at the (first) overshoot for the 3g soft max
heap size goal, I think this is a remaining issue in the heap sizing
policy in conjunction with soft max heap size, i.e. temporarily
the target gctimeratio is set to 10% for various reasons (in
G1HeapSizingPolicy::expansion_amount()).
*3 In the log I have, the problem seems to be that we are re-setting the
softmaxheapsize within the space reclamation phase (i.e. mixed gc) and
G1 sizing policies get confused, i.e. G1 partially keeps on using the 2g
goal for young gen sizing until the *2 problem expands it. That's a bug
and needs to be fixed.
So far the text above only looked at the best case where everything fits
together; there are some other issues which will prevent you from
achieving a tight heap in some cases that I noticed during my testing.
Something to think about.
*4 GCTimeRatio/heap expansion during young gc has different goals than
the (un-)commit at the end of full gc. In some cases, with
SoftMaxHeapSize (but also without), the latter will undo the expansion at
young gc, which will immediately start to expand again.
*5 GCTimeRatio can't be adjusted during runtime, which means that you
won't achieve that tight of a heap as in this example. GCTimeRatio is
also a bit unwieldy to use, i.e. since it is the denominator in the
(default; nobody sets GCPauseIntervalMillis) time calculation, you get
"good" granularity of low values, but pretty bad granularity of high values.
*6 Min/MaxHeapFreeRatio default values are probably too high - with
adaptive IHOP, G1 can typically meet its current goal very well, any
excess is often just wasted committed memory. A similar issue to that
is, don't set Min/MaxHeapFreeRatio to something below G1ReservePercent,
i.e. the default reserve for the IHOP. In this case there will be
significant memory commit/uncommit pauses.
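To illustrate the granularity point in *5, here is a small sketch that assumes the commonly documented interpretation of GCTimeRatio, where the GC time goal is 1 / (1 + GCTimeRatio) of total time. How G1 applies the value internally may differ in detail; the class and method names are made up for this example.

```java
// Illustration only: maps GCTimeRatio values to the implied GC time goal.
final class GcTimeRatioTable {

    /** GC time goal in percent of total time, assuming goal = 1 / (1 + ratio). */
    static double gcTimePercent(int gcTimeRatio) {
        return 100.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        for (int ratio : new int[] {1, 2, 3, 4, 9, 19, 49, 99}) {
            System.out.printf("GCTimeRatio=%d -> %.1f%% GC time%n",
                              ratio, gcTimePercent(ratio));
        }
    }
}
```

Under this assumption GCTimeRatio=4, as used in [0], corresponds to a 20% GC time goal. Adjacent small ratios are far apart (1, 2 and 3 give 50%, about 33% and 25%), while large ratios differ only by fractions of a percent, so goals with high GC overhead can only be selected in coarse steps.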
Here is my question to you (and any readers): are you using
Min/MaxHeapFreeRatio? Using SoftMaxHeapSize to set a target heap size
seems to be much more direct and better than Min/MaxHeapFreeRatio. Given
the above (and assuming that there are no reasons to keep it), it may be
useful to start the deprecation process (at least for use in G1) when
SoftMaxHeapSize is in.
There are some more issues with heap sizing not really relevant to this
discussion, I need to think about them a bit more and file appropriately
worded CRs.
Either way, what do you think about my suggested change? Can you try it
on your workloads to see if it could do the job? Any other comments?
More work is needed on this patch I think; also we might need to think
about how the user can detect this change of the target better in the
logs for troubleshooting.
The original patch (webrev.2) also contained some minor unrelated
cleanups (one constification of a method, one rename of the heap
resizing phase) that might be easier to address separately more quickly ;)
Thanks,
Thomas
[0] specjbb2015 settings: -Dspecjbb.comm.connect.type=HTTP_Jetty
-Dspecjbb.controller.type=PRESET -Dspecjbb.controller.preset.ir=5000
-Dspecjbb.controller.preset.duration=10800000
VM settings: -Xms2g -Xmx8g -XX:GCTimeRatio=4 -XX:+UseStringDeduplication
This gives ~1.5GB live set size, on my machine around 10-40ms pause
time, so rather light load at least without setting any heap size goal;
in my runs, G1 settles to around 3.8GB of committed heap. (with
Min/MaxHeapFreeRatio=10 set after startup, but you can just put it into
the VM startup options too)
[1] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize-alibaba.png
[2] http://cr.openjdk.java.net/~tschatzl/8236073/webrev/
[3] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize.png