RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics

Wed Feb 5 08:13:57 UTC 2020

Hi Liang,

   apologies for the late reply - I did look at the patch immediately 
after you posted it, but initial tests showed that it does not work as 
(I) expected. More about that below. So I went ahead and hack up 
something that comes closer to what I had in mind. Unfortunately other 
more urgent issues came up, which caused the delay on this work. Sorry. 
(And sorry for the long post).

Not having any kind of workload to work with for testing the change I 
used some configuration of specjbb2015 with fixed ir [0] (taken from a 
colleague's unrelated recent internal test), simulating a constant load 
the user wants to control the heap usage of.

In this situatio I want to apologize to use specjbb2015 for this public 
reply because it not openly available, but I only noticed when writing 
up this email. Finding a substitute and redoing measurements would 
probably take more time. I will start looking into this issue.

Anyway, in my test scenario, after warmup, the user tries to first limit 
the heap to 2GB, and after a while to 3GB, and then back to 8GB.

The resulting graph [1] shows heap metrics over time: blue ("soft") is 
the current SoftMaxHeapSize, pink ("committed") represents committed 
memory, yellow ("goal")  shows G1's current heap size goal, turquoise 
("free") the amount of free heap and purple ("used") the amount of used 
memory.

Ignoring the drop from ~second 30-100 where I finally managed to set 
Min/MaxHeapFreeRatio ;) you can see  that G1 kind of stabilizes at 
around 3.8GB heap; at ~second 410 the softmaxheapsize soft is set to 
2GB. As you can see, G1 ignores the request. This corresponds to the 
code where apparently the heap is only reduced to SoftMaxHeapSize if 
there is enough free space to reduce to that value (I think).

At ~second 620 I set SoftMaxHeapSize to 3GB which gives the expected 
drop in memory usage. However, since the change does not modify G1 goals 
it ultimately just ignores the SoftMaxHeapSize goal. It probably worked 
if there were no further application activity.

I created a webrev of an alternative attempt that modifies G1's 
goal/target heap size in the adaptive IHOP mechanism so that G1 
automatically starts marking so that a space reclamation phase starts 
before reaching softmaxheapsize. It basically changes the predictor's 
reserve according to current committed heap size not only based on 
G1ReservePercent, but also on the specified SoftMaxHeapSize.

One complication in a generational setting is to adapt young gen 
(particularly survivor size) to that goal too, but I think the change 
does okay with that.

However it is not finished yet, there is debugging code in it and one 
FIXME that is about shuffling around code properly.

In the graph at [3] you can see the results, with same metrics shown. In 
this case G1 fairly well follows the soft goal.

For the 2g softmaxheapsize goal it works perfectly in the example (*1), 
in the 3g softmaxheapsize change we get some initial short overshoot in 
committed memory. (*2/*3)

There are however some problems/differences to your solution here which 
need to be discussed a bit more to see if it fits you and ultimately 
make it perform better:

*0 this change uses existing sizing to uncommit memory, i.e. memory is 
not uncommitted immediately but part of regular operation. This means 
that the garbage collection cycle needs to advance. In case of specjbb 
with fixed IR this is no issue, but completely quiescent applications 
need other mechanisms like the "Promptly Return Unused Committed Memory 
(JEP 346) feature enabled. Some tuning is needed in that mechanism for 
almost-idle applications.

*1 the problem with only setting SoftMaxHeapSize and relying on the 
regular uncommit mechanism is that due to other reasons, e.g. 
GCTimeRatio, G1 won't achieve this kind of compact heap. This is the 
reason why my setup includes the GCTimeRatio=4 on the command line - 
otherwise in neither case G1 would achieve the 2g goal (it would settle 
around 3g with my changes, didn't test the original changes; max heap 
usage would be ~5.8GB without SoftMaxHeapSize fyi), and you can't modify 
it during runtime (i.e. when you want to select a different 
throughput/latency tradeoff to achieve lower heap usage).

*2 looking at the results more closely the (first) overshoot in the 3g 
soft max heap size goal, I think this is a remaining issue in the heap 
sizing policy in conjunction with soft max heap size, i.e. temporarily 
the target gctimeratio is set to 10% for various reasons. (in 
G1HeapSizingPolicy::expansion_amount()).

In the log I have, the problem seems to be that we are re-setting the 
softmaxheapsize within the space reclamation phase (i.e. mixed gc) and 
G1 sizing policies got confused, i.e. it partially keeps on using the 2g 
goal for young gen sizing until the *2 problem expands it. That's a bug 
and needs to be fixed.

So far previous text only looked at the best case where everything fits 
together; there are some other issues which will prevent you from 
achieving a tight heap in some cases that I noticed during my testing. 
Something to think about.

*4 GCTimeRatio/heap expansion during young gc has different goals than 
the (un-)commit at the end of full gc. In some cases, with 
SoftMaxHeapSize (but also without), the later will undo the expansion at 
young gc, which will immediately start to expand again.

*5 GCTimeRatio can't be adjusted during runtime, which means that you 
won't achieve that tight of a heap as in this example. GCTimeRatio is 
also a bit unwieldy to use, i.e since it is the denominator in the 
(default; nobody sets GCPauseIntervalMillis) time calculation, you get 
"good" granularity of low values, but pretty bad granularity of high values.

*6 Min/MaxHeapFreeRatio default values are probably too high - with 
adaptive IHOP, G1 can typically meet its current goal very well, any 
excess is often just wasted committed memory. A similar issue to that 
is, don't set Min/MaxHeapFreeRatio to something below G1ReservePercent, 
i.e. the default reserve for the IHOP. In this case there will be 
significant memory commit/uncommit pauses.

Here is my question to you (and any readers), are you using 
Min/MaxHeapFreeRatio? Using SoftMaxHeapSize to set a target heap size 
seems to be much more direct and better than Min/MaxHeapFreeRatio. Given 
above (and assuming that there are no reasons to keep it), it may be 
useful to start deprecation process (at least for the use in G1) when 
SoftMaxHeapSize is in.

There are some more issues with heap sizing not really relevant to this 
discussion, I need to think about them a bit more and file appropriately 
worded CRs.

Either way, what do you think about my suggested change? Can you try it 
on your workloads to see if it could do the job? Any other comments?

More work is needed on this patch I think; also we might need to think 
about how the user can detect this change of the target better in the 
logs for troubleshooting.

The original patch (webrev.2) also contained some minor unrelated 
cleanups (one constification of a method, one rename of the heap 
resizing phase) that might be easier to address separately more quickly ;)

Thanks,
   Thomas

[0] specjbb2015 settings: -Dspecjbb.comm.connect.type=HTTP_Jetty 
-Dspecjbb.controller.type=PRESET -Dspecjbb.controller.presett.ir=5000 
-Dspecjbb.controller.preset.duration=10800000"
VM settings: -Xms2g -Xmx8g -XX:GCTimeRatio=4 -XX:+UseStringDeduplication

This gives ~1.5GB live set size, on my machine around 10-40ms pause 
time, so rather light load at least without setting any heap size goal; 
in my runs, G1 settles to around 3.8GB of committed heap. (with 
Min/MaxHeapFreeRatio=10 set after startup, but you can just put it into 
the VM startup options too)

[1] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize-alibaba.png

[2] http://cr.openjdk.java.net/~tschatzl/8236073/webrev/

[3] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize.png