RFR(L): 8196989: Revamp G1 JMX MemoryPoolMXBean, GarbageCollectorMXBean, and jstat counter definitions

Fri Aug 3 11:37:13 UTC 2018

Hi Paul,

On Thu, 2018-08-02 at 18:14 +0000, Hohensee, Paul wrote:
> Inline.
> 
> On 8/1/18, 12:03 PM, "Thomas Schatzl" <thomas.schatzl at oracle.com>
> wrote:
> 
>     Hi,
>     
>     On Mon, 2018-07-30 at 19:18 +0000, Hohensee, Paul wrote:
> > At JVMLS, so can't look in depth this instant, but I'm fine 
> > with your approach, except I'd get the new JMX and jstat 
> > structure in place before fixing the data that gets reported. 
> > Imo it'll be easier to fit correct data into the new JMX/jstat 
> > setup than into the old one, and doing it the new way will 
> > give us a good idea of exactly what we should do for the 
> > legacy ones.
>     
>     What I want to first do is polishing the existing code a bit so
> that changes to the jmx/jstat structures does not touch
> G1CollectedHeap code as much as possible, and remove cruft.
> 
> Good.
>     
>     > Some of these refactorings are interesting for other somewhat 
>     > unrelated improvements too :)
> 
> What are those? :)

Most of them bugfixes, the mentioned JDK-8208498 and others that have
been in the problem list for ages due to G1 hacks that only "mostly"
work.

I am thinking of polishing up some eden region allocation path
improvements that have been lying around for a while here too as I am
likely going to touch that code anyway.

> > Before that is finished, we can meanwhile figure out what values we
> > want in the MXBeans too...
>     
> Which brings me back to the question of which pools will there be in
> the "new" mxbeans, what their contents could be, and the relations
> between the returned values (i.e. consistency).
>     
> So my question has been, what do you think about the following
> additions to the scheme proposed in the CSR for the "new" set of
> pools:
>     
>      - remove the "to" survivor space pool
>      - add a "free regions" pool
> 
> There's only one survivor space MemoryPool right now, so that doesn't
> change. Adding a free regions pool seems good to me, but I'm not sure
> how to define its content. Call it "G1 Free Space". 'committed' and
> 'used' would be the same value, just as for humongous and archive,
> and got from the free region set. 'init' would be (total committed
> for all regions - all the other memory pool 'init' values), 'max'
> would be -1 (undefined).

One could set "used" in the "G1 Free Region Pool" to always zero, which
is kind of true :)

I do not see a reason to set "used" and "committed" equal in archive
and humongous regions either. Why would you want to lie to the user
about that?

> > (from the other email):
> A possible problem with having a free space memory pool is that 
> there's no equivalent for other collectors. If you do the usual use 
> pattern of adding up all 'used' fields to get a 'total used' number, 
> if you have a free space memory pool, you'll get a non-comparable-to-
> other-GCs answer, and break existing collector-generic code.

Unless "used" for this pool is always zero, which is true. There are no
objects in it.

More about summing up values below.

> There's a 'to space' jstat counter which is explicitly disabled in G1
> (always reads zero and is never updated). I assume this is for
> compatibility with existing jstat usage, if so I'd leave it alone,
> which I've done in my proposed patch.
>      
> > There should probably be a third change to the pools in the CSR: in
> > G1 it is possible that there are no regions for memory areas that
> > are actually committed iirc. This is because region size can be
> > differentto page size - just consider 1G pages. G1 at 
> > G1CollectedHeap level may ask the HeapRegionManager to uncommit a 
> > certain amount of memory due to shrinking, but the 
> > HeapRegionManager may not find a contiguous set of regions that 
> > fits full 1G pages.
> >
> > What do you think about a "logically uncommited" pool?
> > (I need to check how HeapRegionManager works again, but I think
> > there is the possibility to have such)
> 
> This seems like it'd be a bit confusing for the average user. If the
> HeapRegionManager can't actually uncommit the memory, then from the
> user's pov it really is committed (i.e., not available to other
> processes).

That's true. But it is not available for other memory pools at the
moment either.

> > About returned value consistency:
> >   
> > - it seems that the best we can achieve for MemoryPools that the
> > values within a single MemoryUsage after a call to getUsage() are
> > consistent.
> 
> Yes.
>     
> > - values of further calls to get the MemoryUsage of a different 
> > pool may contradict the first result.
> 
> Yes, if there's been a call to recalculate_sizes() in between. Given
> that there's usually a few seconds between gc events that cause that,
> if the JMX user grabs all the memory pool values at once, they're
> effectively always consistent.
>     
> >  - a corollary of that is that the sum of values of a particular 
> > members (basically all of them) of MemoryUsage may not add up to
> > particular values. E.g. the sum of the "max" fields will not sum 
> > up to -Xmx.
> 
> Yes. That's why the 'max' values are all -1 (undefined), except for
> the old space, which is -Xmx. So they do add up to -Xmx by
> definition.

This is buggy code, plain and simple. "Undefined" does not mean "zero";
and simply adding up these values does not give the correct value
anyway (I understand that then people start just ignoring "undefined"
in their loops).

The documentation of max in MemoryUsage actually has the sentence "The
maximum amount of memory may change over time if defined" in it.

If there isn't another defined way in the Java API (I think there may
actually be) it is possible to query the current "MaxHeapSize" option
value (which is also available since forever afaik).

> > - only some values of MemoryUsage of the same pool are guaranteed 
> > to always be <= or >= a previous value between two GC induced
> > safepoints (that includes Remark and Cleanup as we may free memory 
> > there). E.g. used() does, but others may not.
> >
> > - no consistency guarantees for other ways of getting memory sizes
> >
> 
> Yes and yes, but see above. They’re 'almost always consistent'.

No, they are not. Recalculate_sizes() (and the equivalent in other
collectors) may be called at any time, eg. humongous object (or old gen
allocation in other collectors) allocation. That might change used and
capacities of the pools at any time (and max capacities).

So adding together any of these values to get totals is not the way
these values were intended to be used. The only way to get
consistent(*) totals is the public getMemoryUsage() call to the
MemoryMXBean (via
java.lang.management.ManagementFactory.getMemoryMXBean() call) afaics.

Ignore that comment indicating that in recalculate_sizes(). That
comment is just wrong, and has never been true.

I would not like to perpetuate these wrong assumptions given a
perfectly fine API call available since at least Java 7 (and I assume
it had been introduced much earlier).

(*) Actually the current implementation does not guarantee consistency
of that MemoryUsage either...

>     About expected values:
>     
> > - the only pool with "init" size > 0 will be the "Free region" 
> > pool (and potentially the "logically uncommitted" one).
> 
> Depends on when you create the G1MemoryPool objects relative to when
> you call recalculate_sizes() for the first time. I put the latter as
> the first thing in initialize_serviceability(), so some 'init' fields
> may be non-zero, e.g., the archive pool.

You are right about the archive pool potentially having an initial
size.

> > - "used" values will not include filler objects like in the other
> > collectors.
> 
> Right now, all values are multiples of the region size, because it'd
> be a lot of overhead to calculate partial use.

It isn't actually, from what I can see just an additional atomic add
during TLAB allocation. It does not significant restructuring of the
region allocation code though :)

> You can also argue that from the point of view of a user worrying
> about tracking memory pool size, region size granularity is good
> enough.

However code snippets on the web (and our own tests!) think otherwise,
so it would be nice to have them fixed...

> > - "max" is basically always the maximum heap capacity for all 
> > pools.
> > (Probably it is more interesting to use "current" maximum heap
> > capacity, not absolute maximum, i.e. the -Xmx value?)
> 
> The way people use the memory pools is typically to just add up the
> max values to get -Xmx, because they have code that works with all
> collectors. We don't want to break that invariant.

See above. That's just wrong usage. Such code was not correct even with
serial gc ever in the general case.

Obviously because between these readings there could be a gc (and I
have seen code working around this using loops to check if gc happened
inbetween).
Depending on memory ordering capacity/used values might not be
consistent in the java thread after commit of new memory too, which is
probably the more overlooked one.

>     The values in the compatibility pools will be some sum of the new
> ones.
> 
> Yes. The legacy value of the old pool 'used' is the sum of the new
> 'old', 'humongous' and 'archive' pools.
>     
> > > Your archive region set webrev looks pretty much the same as what
> > > I wrote, but I got a trace trap when I tried to execute the
> > > resulting JVM. Not a clue why, so I abandoned it.
> >
> >    Can you describe this trace trap a bit more? On OSX, in debug?
> 
> It happened during the build when the new JVM tried to execute. I've
> merged my old repo with jdk tip and am running a build to try again.
> The webrev (some cruft in it) is at http://cr.openjdk.java.net/~phh/8
> 196989/webrev.01/.
>  

Thanks,
  Thomas