RFR (M): 8212657: Implementation of JDK-8204089 Timely Reduce Unused Committed Memory

Sat Nov 17 08:25:31 UTC 2018

Hi,

Thanks for the response! We discussed this internally, and our
consensus is to gather more performance data from production
workloads and compare the tradeoffs and effectiveness of the two
approaches:
(a) this JEP: setting Xms != Xmx and periodically triggering GC;
(b) our local features: setting Xms = Xmx, calling MADV_DONTNEED
on some free regions, and using mutator utilization to trigger
additional GCs.

Fundamentally, they solve the same problem with different
approaches, and they emphasize different tradeoffs.
We are glad to see this JEP happening and trying to solve the problem,
but we need to check whether it satisfies our needs.

Currently we are working on migrating users to JDK11 and G1, so it
will take a while before we can collect such production performance
data.

Other responses and context below.

> By default, -Xms == -Xmx should just mean that, MADV_DONTNEED may
> introduce unnecessary latency that would certainly be unexpected.
> For the same reason I closed JDK-8196820 as WNF recently.
> It could certainly be useful to use madvise to "uncommit" memory within
> the usual heap sizing (with -Xms != -Xmx, see JDK-8210709).

Thanks for the pointers. I wasn't aware JDK-8196820 exists. Its link
to Hiroshi's patch in 2013 is indeed our implementation for CMS.
I can take a stab at JDK-8210709, as our local feature already
implements calling MADV_DONTNEED on an address range.
I'll also take a look at how Shenandoah calls MADV_DONTNEED.

> There will be a lot of resistence to make -Xmx==-Xms behave as you
> suggest (in the default case...), and it seems that the problem in your
> case is improper heuristics for -Xms in some (many?) cases which seems
> to be acknowledged above.

If the GC ever calls MADV_DONTNEED for Xms = Xmx, it will be guarded
by a new flag. This flag should be turned off by default. In our local
feature, the flag is called DeallocateHeapPages. I suspect this would
require another JEP.
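
To make the shape of this concrete, here is a minimal sketch of a
flag-guarded MADV_DONTNEED call on an address range, assuming Linux.
This is not our actual patch: the names release_free_range and
deallocate_heap_pages are hypothetical stand-ins, and a real
implementation would hook into the GC's free-region bookkeeping.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a VM flag such as DeallocateHeapPages.
// Off by default, so Xms == Xmx keeps its usual meaning unless enabled.
static bool deallocate_heap_pages = false;

// Advise the kernel that the whole pages inside [start, start + len) are unused.
// Returns the number of bytes advised, 0 if the flag is off or no whole page
// fits in the range, and -1 if madvise fails.
long release_free_range(void* start, size_t len) {
  if (!deallocate_heap_pages) return 0;

  const uintptr_t page = static_cast<uintptr_t>(sysconf(_SC_PAGESIZE));
  assert((page & (page - 1)) == 0);  // page sizes are powers of two

  // Shrink the range inward to page boundaries: round start up, end down.
  const uintptr_t lo = (reinterpret_cast<uintptr_t>(start) + page - 1) & ~(page - 1);
  const uintptr_t hi = (reinterpret_cast<uintptr_t>(start) + len) & ~(page - 1);
  if (hi <= lo) return 0;

  // The mapping stays reserved and accessible; on Linux, private anonymous
  // pages are zero-filled again on the next touch, which is the reuse cost.
  if (madvise(reinterpret_cast<void*>(lo), hi - lo, MADV_DONTNEED) != 0) return -1;
  return static_cast<long>(hi - lo);
}
```

Rounding inward means a partially used page is never discarded, at the
cost of leaving small unaligned tails resident.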

> I am still not sure what the problem is with -Xms != -Xmx, or what
> -Xmx==-Xms with following uncommit solves. It is hard to believe for me
> that setting -Xms to -Xmx is easiest for an end user - I would consider
> not setting -Xms easiest...
> Maybe doing so improves startup time where often it is advantageous to
> have a large eden to get the long-lived working set into old gen
> quickly? Maybe some "startup boost" for heap sizing/some options would
> help here much better?

Almost all GC tuning guidelines for server applications recommend
setting Xms = Xmx.
For example:
https://docs.oracle.com/en/java/javase/11/gctuning/factors-affecting-garbage-collection-performance.html#GUID-B0BFEFCB-F045-4105-BFA4-C97DE81DAC5B
https://docs.oracle.com/middleware/12213/wls/PERFM/jvm_tuning.htm#PERFM160
Thus most production services have set them to the same value.

From experience with CMS, setting a smaller Xms would increase startup
time and GC overhead after startup.
CMS could shrink and re-expand the heap over and over, causing
unnecessary GC overhead.
Basically the extra memory saving hardly justifies the extra GC overhead.
The DeallocateHeapPages feature strikes a better balance between
memory saving and overhead for reusing pages marked as MADV_DONTNEED.

Perhaps the situation is better in G1 than in CMS, and Xms != Xmx
does not cost much more GC overhead?
Would we also need JDK-6490394 backported to JDK11 to get more memory
savings for production services running JDK11?
If Xms != Xmx with this JEP strikes the right balance between memory
savings and GC overhead, we are happy to advise users to set a smaller
Xms, or not set it at all, for G1, and to deprecate DeallocateHeapPages.

> That is a very non-standard way of defining mutator utilization, but
> some of the terms are not clearly defined :)
> From what I understand, the formula in the end just reduces to periodic
> old gen collections regardless of other activity (e.g. it does not take
> minor gc into account apparently).
> ...

Discussion of mutator utilization will probably be started in a
separate thread after we collect more production performance data.
Wessam will be a better contact for mutator utilization.

> > It is orthogonal to G1UseAdaptiveIHOP to control when to start a
> > concurrent cycle. We also found it is useful to reduce GC cost in
> > production workload by setting a higher minimum bound to prevent
> > concurrent cycles.
> I did not get that paragraph, you need to explain this in more detail
> :)

Mutator utilization considers the frequency of concurrent collections
rather than heap occupancy.
The second sentence is basically an application of this earlier sentence:
"If mutator utilization is too low (e.g., <40%), it can be used to
prevent concurrent collection from happening."
Concurrent collections could be too frequent or wasteful, for example
JDK-8163579, and mutator utilization can prevent such cases.
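
As a rough illustration only, the minimum-bound check could look like
the sketch below. It assumes the textbook definition of mutator
utilization (the share of a time window not spent on GC work), which is
not necessarily the formula our local feature uses. The function names
are hypothetical, and the 0.40 default just mirrors the "<40%" example
above.

```cpp
#include <cassert>

// Assumed, textbook-style definition: fraction of the window that the
// application (mutator) ran, i.e. 1 - gc_time / window_time.
double mutator_utilization(double gc_seconds, double window_seconds) {
  assert(window_seconds > 0.0);
  assert(gc_seconds >= 0.0 && gc_seconds <= window_seconds);
  return 1.0 - gc_seconds / window_seconds;
}

// Hypothetical gate: veto a new concurrent cycle while utilization is
// below the minimum bound, independent of heap occupancy.
bool may_start_concurrent_cycle(double utilization,
                                double min_utilization = 0.40) {
  return utilization >= min_utilization;
}
```

With these definitions, a window in which GC work took 65 of 100
seconds gives a utilization of about 0.35, so the gate would hold back
another concurrent cycle.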