Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics

Wed Jan 15 12:53:20 UTC 2020

Hi Thomas,

So G1 doesn't need to shrink the heap below Xms even if SoftMaxHeapSize
is below Xms, does it?

Another point: whether or not we add an additional option, we had better
have two criteria. The first is for urgent expansion: when GCTimeRatio is
quite low, concurrent expansion with frequent GCs is more harmful, and
expansion should be done immediately. This is the current default flow,
and we found that 12 is a good threshold below which applications can
noticeably run into timeout errors. The second is to keep GCTimeRatio and
memory footprint in a balanced state, so any adjustments are better done
concurrently. The original value 99 fits well here. If we have only one
option, "GCTimeRatio", we might not be able to achieve both. Maybe we can
have a LowGCTimeRatio, below which GC overhead is considered unacceptable,
and a HighTimeRatio, above which it is certainly healthy.
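A minimal C++ sketch of the two-threshold idea, assuming "GCTimeRatio" here
means the observed ratio of application time to GC time, so that a ratio of
N corresponds to a GC overhead of 1/(1+N) (about 7.7% at 12, 1% at 99). The
names HeapAction, observed_time_ratio, gc_overhead_fraction and decide are
hypothetical and not HotSpot code, and treating the range at or above the
high threshold as room to shrink concurrently is an assumption about what
the "balanced state" criterion would do:

```cpp
#include <cassert>

// All names below are hypothetical; they do not exist in HotSpot.
enum class HeapAction { ExpandNow, ExpandConcurrently, MayShrinkConcurrently };

// Observed ratio of application (mutator) time to GC time, i.e. the same
// quantity that -XX:GCTimeRatio expresses as a goal.
double observed_time_ratio(double app_ms, double gc_ms) {
  assert(gc_ms > 0.0);
  return app_ms / gc_ms;
}

// Fraction of total time spent in GC implied by a time ratio N: 1 / (1 + N).
// For N = 12 this is about 7.7%, for N = 99 it is 1%.
double gc_overhead_fraction(double time_ratio) {
  return 1.0 / (1.0 + time_ratio);
}

// Two-threshold decision: below low_ratio the GC overhead is unacceptable and
// the heap is expanded right away (in the pause); between the thresholds it
// is acceptable but not healthy, so the heap is grown concurrently; at or
// above high_ratio the heap may give memory back concurrently.
HeapAction decide(double observed_ratio, double low_ratio, double high_ratio) {
  assert(low_ratio <= high_ratio);
  if (observed_ratio < low_ratio) {
    return HeapAction::ExpandNow;
  }
  if (observed_ratio < high_ratio) {
    return HeapAction::ExpandConcurrently;
  }
  return HeapAction::MayShrinkConcurrently;
}
```

With the thresholds 12 and 99, decide(5.0, 12.0, 99.0) asks for an
immediate expansion, while decide(50.0, 12.0, 99.0) only asks for a
concurrent one.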

Thanks,
Liang

------------------------------------------------------------------
From:Thomas Schatzl <thomas.schatzl at oracle.com>
Send Time:2020 Jan. 15 (Wed.) 19:44
To:"MAO, Liang" <maoliang.ml at alibaba-inc.com>; hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics

Hi,

On 15.01.20 11:58, Liang Mao wrote:
> Hi Thomas,
> 
>>> 3. SoftMaxHeapSizeConstraintFunc doesn't check Xms
>>> The constraint function doesn't make sure that SoftMaxHeapSize is
>>> not less than Xms. Do we need to add this check? It will not only
>>> affect G1...
> 
>> I will check again later, but from what I remember from yesterday it
>> does check it at VM start (-Xms sets both minimum and initial heap
>> size). The constraint func does not check when the user changes the
>> value during runtime. So code using it must still maintain this
>> invariant in behavior.
> 
> The constraint function is checked both at VM startup and at
> runtime when the value is changed via jinfo. Looking at the code,
> ZGC seems to allow SoftMaxHeapSize to be less than Xms. So should we
> start another mail thread to discuss it?

Colleagues mentioned that ZGC allows setting SoftMaxHeapSize below 
MinHeapSize, but does not uncommit memory below it.

I do not see a problem with allowing the user to set SoftMaxHeapSize
below MinHeapSize, although it may be of limited use. If jinfo prevents
this too, then it seems that the code can assume that SoftMaxHeapSize is
within Min/MaxHeapSize.
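A sketch of what that assumption amounts to, using a hypothetical helper
(effective_soft_max is not actual HotSpot code; std::clamp needs C++17): the
size G1 would shrink toward is SoftMaxHeapSize clamped into
[MinHeapSize, MaxHeapSize]. This also covers the ZGC-like behavior of
accepting a value below MinHeapSize without uncommitting below it:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not HotSpot code: the heap size G1 would shrink toward
// for a given SoftMaxHeapSize. A value below MinHeapSize (-Xms) is accepted,
// but the heap is never shrunk below MinHeapSize, similar to what ZGC does.
std::size_t effective_soft_max(std::size_t soft_max_heap,
                               std::size_t min_heap, std::size_t max_heap) {
  assert(min_heap <= max_heap);
  return std::clamp(soft_max_heap, min_heap, max_heap);
}
```

For example, with MinHeapSize 512 and MaxHeapSize 4096 (in MB), a
SoftMaxHeapSize of 256 behaves like 512, and 8192 behaves like 4096.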

> 
>>> 4. commit/uncommit parallelism
>>> The concurrent uncommit may run while the VMThread is doing a GC, and
>>> that GC may request a heap expansion if there are not enough empty
>>> regions. So this parallelism is possible, and uncommitting immediately
>>> is one solution.
> 
>> There may be others, but it actually seems easiest, as blocking such a
>> request seems harder to implement; at least it's less localized in the
>> code. Completely *dropping* the request seems to go against the
>> "SoftMaxHeapSize is a hint" guideline and may have other unforeseen
>> consequences too. Like I said, since G1 does not expand then, there
>> will be more GCs with the small heap, increasing the current
>> GCTimeRatio more than it should. This means that when the request
>> ultimately comes through, as G1 will certainly try again, the increase
>> may be huge. (The increase is proportional to the difference between
>> the actual and requested GCTimeRatio, iirc.)
> 
>> Again, if there are good reasons to do otherwise I am open to
>> discussion, but it would be nice to have numbers to base decisions on.
> 
> I'm not in favor of blocking the expand request :)
> G1RegionsLargerThanCommitSizeMapper can do uncommit/commit
> in parallel and G1RegionsSmallerThanCommitSizeMapper
> can do uncommit/commit immediately. So I think we don't have any
> issues so far?

:)

> 
>>> So it is kind of a *hard* limit, and we need to expand immediately if
>>> GCTimeRatio drops below 12. The difference in our workloads is that
>>> we will keep GCTimeRatio near the original value 99 to make GC in
> 
>>I.e. you set it to 99 at startup?
> 
> In fact we are not controlling GCTimeRatio. In a lot of applications
> running in exclusive containers we set Xms equal to Xmx to avoid
> any heap expansion at runtime, which might cause allocation
> stalls and timeouts.

Okay.

> 
>>> I propose that we still use the original option
>>> "-XX:+G1ElasticHeap" to keep GCTimeRatio around 99 or a specified
>>> value. The default flow will make sure GCTimeRatio stays above the
>>> threshold of 12, and concurrent commit/uncommit will adjust the heap
>>> to keep GCTimeRatio at a proper value when the adjustment is not
>>> urgent.
> 
>> I am not completely sure what you want to achieve here or what the
>> problem is. I probably need to understand more about the problem and
>> potentially other solutions can be found.
> 
>> As for a new -XX:+G1ElasticHeap option, it does not seem to make a
>> difference to set this or -XX:GCTimeRatio in this case (both are single
>> options). But I do not completely know the details here.
> 
> In principle, the Java heap does not return memory by default, and
> ZGC/Shenandoah have the options "ZUncommit" and "ShenandoahUncommit"
> to let users control whether memory can be uncommitted... So I think
> G1 needs the same thing as well. In my opinion, there are two aspects. The

G1 has uncommitted unused memory by default for a long time. There is no 
flag to disable this behavior except setting -Xms == -Xmx. The policies 
for when to uncommit are also different from other collectors (using 
Min/MaxHeapFreeRatio).

However, only recently (JDK 12 or 13) has it done so at the end of the
Remark pause; earlier it only did so after a full GC.

The changes provided also enable shrinking of the heap during most young 
GCs.
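A simplified sketch of Min/MaxHeapFreeRatio-style sizing, loosely modeled on
the free-ratio policy described above but not G1's actual code; the function
name free_ratio_target and the integer rounding are assumptions for
illustration. The committed size is chosen so that free space stays between
the two percentages, bounded by -Xms and -Xmx:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Simplified sketch, not the actual G1 code: pick a committed heap size so
// that free space stays between min_free_pct and max_free_pct percent of the
// capacity, bounded by [min_heap, max_heap] (i.e. -Xms and -Xmx).
std::size_t free_ratio_target(std::size_t used, std::size_t capacity,
                              unsigned min_free_pct, unsigned max_free_pct,
                              std::size_t min_heap, std::size_t max_heap) {
  assert(min_free_pct <= max_free_pct && max_free_pct < 100);
  assert(min_heap <= max_heap);
  // Smallest capacity with at least min_free_pct percent free (rounded down).
  std::size_t min_desired = used * 100 / (100 - min_free_pct);
  // Largest capacity with at most max_free_pct percent free (rounded down).
  std::size_t max_desired = used * 100 / (100 - max_free_pct);
  min_desired = std::clamp(min_desired, min_heap, max_heap);
  max_desired = std::clamp(max_desired, min_heap, max_heap);
  if (capacity < min_desired) return min_desired;  // too little free: expand
  if (capacity > max_desired) return max_desired;  // too much free: shrink
  return capacity;                                 // inside the band: keep
}
```

For example, with the HotSpot defaults MinHeapFreeRatio=40 and
MaxHeapFreeRatio=70, 600 units used and bounds of 100 and 4000, a capacity
of 800 is expanded to 1000, a capacity of 3000 is shrunk to 2000, and
anything in between is left alone. Because of the clamping, -Xms == -Xmx
pins the result, which matches the only way to disable uncommit mentioned
above.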

By the way, as occurred to me yesterday after sending the email, it may
be a problem that full GCs (including "concurrent full gc") and young GCs
use different policies. That's something to explore.

> default value of GCTimeRatio is the baseline, so we might need to
> expand immediately to avoid the frequent GCs that the concurrent flow
> could cause. G1ElasticHeap, by contrast, is an optimization to
> balance GC health against memory usage, so its policy should be more
> conservative, and it also needs to work concurrently without adding
> any noticeable pause overhead.
> 

Changing GCTimeRatio to a higher value should improve the response time
to memory needs. The changes you provided are also going to fix the
concurrent (un)commit.

Thanks,
   Thomas