RFR (S): 8195115: G1 Old Gen MemoryPool CollectionUsage.used values don't reflect mixed GC results

Hohensee, Paul hohensee at amazon.com
Fri Feb 16 20:25:40 UTC 2018


The CSR https://bugs.openjdk.java.net/browse/JDK-8196719 for the original fix has been approved, so I’m back to requesting a code review, please.

Bug: https://bugs.openjdk.java.net/browse/JDK-8195115
Webrev: http://cr.openjdk.java.net/~phh/8195115/webrev.hs.01/

Passed a submit repo run, passes its jtreg test, and a JDK8 version is in production use at Amazon.

From the original RR:

    > The bug is that from the JMX point of view, G1’s incremental collector
    > (misnamed as the “G1 Young Generation” collector) only affects G1’s
    > survivor and eden spaces. In fact, mixed collections run by this
    > collector also affect the G1 old generation.
    >
    > This proposed fix is to record, for each of a JMX garbage collector's
    > memory pools, whether that memory pool is affected by all collections
    > using that collector. And, for each collection, record whether or not
    > all the collector's memory pools are affected. After each collection,
    > for each memory pool, if either all the collector's memory pools were
    > affected or the memory pool is affected for all collections, record
    > CollectionUsage for that pool.
    >
    > For collectors other than G1 Young Generation, all pools are recorded as
    > affected by all collections and every collection is recorded as
    > affecting all the collector’s memory pools. For the G1 Young Generation
    > collector, the G1 Old Gen pool is recorded as not being affected by all
    > collections, and non-mixed collections are recorded as not affecting all
    > memory pools. The result is that for non-mixed collections,
    > CollectionUsage is recorded after a collection only the G1 Eden Space
    > and G1 Survivor Space pools, while for mixed collections CollectionUsage
    > is recorded for G1 Old Gen as well.
    >
    > Other than the effect of the fix on G1 Old Gen MemoryPool.
    > CollectionUsage, the only external behavior change is that
    > GarbageCollectorMXBean.getMemoryPoolNames will now return 3 pool names
    > rather than 2.
    >
    > With this fix, a collector’s memory pools can be divided into two
    > disjoint subsets, one that participates in all collections and one that
    > doesn’t. This is a bit more general than the minimum necessary to fix
    > G1, but not by much. Because I expect it to apply to other incremental
    > region-based collectors, I went with the more general solution. I
    > minimized the amount of code I had to touch by using default parameters
    > for GCMemoryManager::add_pool and the TraceMemoryManagerStats constructors.

Thanks,

Paul

On 2/12/18, 2:30 PM, "serviceability-dev on behalf of Hohensee, Paul" <serviceability-dev-bounces at openjdk.java.net on behalf of hohensee at amazon.com> wrote:

    Thanks, very much appreciated.
    
    Paul
    
    On 2/12/18, 7:42 AM, "Erik Helin" <erik.helin at oracle.com> wrote:
    
        Hi Paul,
        
        sorry, I'm just back from a really busy week with two conferences. I 
        will take a look tomorrow.
        
        Thanks,
        Erik
        
        On 02/07/2018 11:17 PM, Hohensee, Paul wrote:
        > I’ve filed https://bugs.openjdk.java.net/browse/JDK-8196989 : Revamp G1 JMX MemoryPool and GarbageCollector MXBean definitions and the corresponding CSR https://bugs.openjdk.java.net/browse/JDK-8196991.
        > 
        > Would you please comment on the CSR, and on the original CSR https://bugs.openjdk.java.net/browse/JDK-8196719?
        > 
        > Thanks,
        > 
        > Paul
        > 
        > On 2/2/18, 1:20 PM, "hotspot-gc-dev on behalf of Hohensee, Paul" <hotspot-gc-dev-bounces at openjdk.java.net on behalf of hohensee at amazon.com> wrote:
        > 
        >      And, can a get a reviewer or reviewers for the CSR?
        >      
        >      Thanks,
        >      
        >      Paul
        >      
        >      On 2/2/18, 1:14 PM, "Hohensee, Paul" <hohensee at amazon.com> wrote:
        >      
        >          +hotspot-gc-use.
        >          
        >          I’ve filed a CSR for the current patch, see https://bugs.openjdk.java.net/browse/JDK-8196719. Here’s the argument in favor.
        >          
        >          
        >          It’s possible that there are JDK8 users that rely on current G1 old gen CollectionUsage behavior, but imo it’s unlikely because it’s of little use. Perhaps Kirk and others with operational experience can weigh in.
        >          
        >          Let’s think about use cases. G1 full GC’s happen rarely and only under severe pressure, so when they do external reaction is pretty much limited to reducing load so the JVM can get back to a usable steady state, or just restarting the JVM. Old gen CollectionUsage is zero until a full GC occurs, after which its value includes both long-lived objects and any transient data that was in eden and the survivor space. That value doesn’t tell you anything about long term old gen occupancy or survivor size because it lumps them all together. So, it isn’t a useful metric, nor will it be after any subsequent full GCs. The only information it provides is on the first zero to non-zero transition, which just tells you that the JVM is/was in trouble. Further, the effect of the runup to a full GC is SLA violations, which are noticed before the full GC happens, so detecting the first full GC is confirmation, not prediction.
        >          
        >          Conclusion: G1 old gen CollectionUsage is unlikely to be in use in its current form, so changing its definition to something usable is low risk.
        >          
        >          
        >          “G1 Old Space” is fine, as is “G1 Archive Space”. Are you ok with the G1 archive space overlapping the G1 old space? Should we add an archive space to the other collectors? If so, how would it be defined and would having it overlap with the old generation as a live prefix be ok?
        >          
        >          "G1 Young Generation" is the currently young+mixed collector.
        >          
        >          You’re right, if one is iterating over all collectors, there will be redundancy if we keep the old ones. I’m usually leery of introducing a flag such as UseG1LegacyMXBeans (I changed the name, since all the interfaces are MXBeans, hope that’s ok) which must be either indefinitely maintained, or go through a deprecation cycle. I don’t see a way out of the ‘iterate over all collectors’ problem without it though.
        >          
        >          Paul
        >          
        >          On 1/31/18, 3:42 AM, "Erik Helin" <erik.helin at oracle.com> wrote:
        >          
        >              On 01/31/2018 02:30 AM, Hohensee, Paul wrote:
        >              > It’s true that my patch doesn’t completely solve the larger problem, but it fixes the most immediately important part of it, particularly for JDK8 where current expected behavior is entrenched.
        >              
        >              Yes, your patch fixes part of the problem, but as I said, can
        >              potentially lead to more confusion. I'm not sure that doing this
        >              behavioral change for a public API in an JDK 8 update release is the
        >              right thing. There are likely users that rely on the memory pool "G1 Old
        >              Gen" only being updated by a full collection (even though that behavior
        >              is not correct), those uses will encounter a new behavior in an update
        >              release with your patch.
        >              
        >              The good thing is that we have very experienced engineers participating
        >              in the CSR process that have much more experience than I have in
        >              evaluating the impact of behavioral changes such as this one. Would you
        >              please file a CSR request for your patch to get their opinion?
        >              
        >              See https://wiki.openjdk.java.net/display/csr/Main for more details
        >              about CSR.
        >              
        >              On 01/31/2018 02:30 AM, Hohensee, Paul wrote:
        >              If we’re going to fix the larger problem, imo we should file another
        >              bug/rfe to do it. I’d be happy to fix that one too, once we have a spec.
        >              >
        >              > What do you think of my suggestions? To summarize:
        >              >
        >              > - Keep the existing memory pools and add humongous and archive pools.
        >              > - Make the archive pool part of the old gen, and generalize it to all collectors.
        >              > - Split humongous regions off from the old gen pool into their own pool. The old gen and humongous pools are disjoint region sets.
        >              > - Keep the existing “G1 Young Generation” and “G1 Old Generation” collectors and their associated memory pools (net of this patch). We add the humongous pool to them.
        >              > - Add “G1 Full” as an alias of the existing “G1 Old Generation” collector.
        >              > - Add the “G1 Young”, “G1 Mixed” and “G1 Concurrent Cycle” collectors.
        >              > - Set the G1 old gen memory pool max size to –Xmx, the archive space max size to whatever it is, and the rest of the G1 memory pool max sizes to -1 == undefined, as now.
        >              >
        >              > The resulting memory pools:
        >              >
        >              > “G1 Eden Space”
        >              > “G1 Survivor Space”
        >              > “G1 Old Gen”
        >              > “G1 Humongous Space”
        >              > “Archive Space”
        >              
        >              The "Space" suffix is unfortunate, but acceptable. I'm least happy about
        >              the "Gen" suffix for the "G1 Old Gen", since G1's old regions differ
        >              from a generation in the traditional sense as applied to e.g. Serial,
        >              Parallel and CMS. I would be more happy to use a consistent naming
        >              scheme in the form of "G1 Old Space" (having only one pool ending "Gen"
        >              begs the question how it differs from the others ending in "Space").
        >              Again, we could introduce a flag such as -XX:+UseG1LegacyPoolsAndBeans
        >              for those that really wants the old names.
        >              
        >              "Archive Space" should be named "G1 Archive Space" since it differs in
        >              implementation from the other collectors. It would be unfortunate if
        >              users thought they could change collector and the "Archive Space" memory
        >              pool would keep the same behavior.
        >              
        >              > The resulting collectors and their memory pools:
        >              >
        >              > “G1 Young Generation” (the existing young/mixed collector), “G1 Old Generation”/”G1 Full”, “G1 Mixed”
        >              > - “G1 Eden Space”
        >              > - “G1 Survivor Space”
        >              > - “G1 Old Gen”
        >              > - “G1 Humongous Space”
        >              > “G1 Young”
        >              > - “G1 Eden Space”
        >              > - “G1 Survivor Space”
        >              > - “G1 Humongous Space”
        >              > “G1 Concurrent Cycle”
        >              > - “G1 Old Gen”
        >              > - “G1 Humongous Space”
        >              >
        >              > I’m not religious about the names, but I like my suggestions. :)
        >              I think it will be confusing for users to have both "G1 Old Generation"
        >              and "G1 Full", particularly for tools iterating over all
        >              GarbageCollectorMXBeans. There is no way to indicate that a
        >              GarbageCollectorMXBeans is an alias of another GarbageCollectorMXBean (I
        >              thought about such a solution as well).
        >              
        >              I guess I don't follow what the GarbageCollectorMXBean "G1 Young
        >              Generation" is meant to represent?
        >              
        >              > The significant addition to my previous email, and an incompatible change, is splitting humongous regions off from the old gen pool. This means that apps that specifically monitor old gen occupancy will no longer see humongous regions. Monitoring apps that just add up info about all a collector’s pools won’t see a difference. I may be corrected (by Kirk, perhaps), but imo it’s not as bad a compatibility issue as one might think, because the type of app that uses a lot of humongous regions isn’t all that common. E.g., apps that cache data in the heap in the form of large compressed arrays, and apps with large hashmap bucket list arrays. The heaps such apps use are very often large enough to use 32mb regions, hence need really big objects to actually allocate humongous regions.
        >              
        >              So why not enable backwards compatibility by allowing a user to set the
        >              flag -XX:+UseG1LegacyPoolsAndBeans? It is not that cumbersome for us to
        >              maintain the current definition of memory pools and collectors. Such a
        >              flag allows us to start over and do this right and a user who relies on
        >              the current behavior can get that by just setting a flag. Doing such a
        >              change in a major release also allows us to clearly highlight the change
        >              in the release notes (users are more prepared for larger changes in a
        >              major release and that they might have to add flags to keep old behavior).
        >              
        >              It is not uncommon for memory pools to change in major releases. The
        >              perm gen pool was removed in JDK 8, the default pools changed when
        >              Parallel Old became default old collector way back in JDK 7 and changed
        >              again when G1 became the default collector in JDK 9.
        >              
        >              Thanks,
        >              Erik
        >              
        >              > Thanks,
        >              >
        >              > Paul
        >              >
        >              > On 1/30/18, 5:51 AM, "Erik Helin" <erik.helin at oracle.com> wrote:
        >              >
        >              >      On 01/30/2018 03:07 AM, Hohensee, Paul wrote:
        >              >      > That’s one reviewer who’s ok with a short term patch. Anyone else? And,
        >              >      > any reviewers for said short term patch? :)
        >              >
        >              >      Well, the patch is not really complete as it is. The problem is the
        >              >      definitions of the MemoryPoolMXBeans and GarbageCollectorMXBeans, which,
        >              >      as I tried to hint at in my first email, is a mess for G1. The names and
        >              >      implementations of these MemoryPoolMXBeans and GarbageCollectionMXBeans
        >              >      for G1 are very old, G1 has changed a lot since those were implemented
        >              >      (hence my suggestion for finally fixing this).
        >              >
        >              >      The issue with your patch is that the MemoryPoolMXBean named "G1 Old
        >              >      Gen" consists of both old and humongous regions (it will also include
        >              >      archive regions). Old regions can be collected by mixed, concurrent and
        >              >      full collections. Humongous regions can be collected by young, mixed or
        >              >      full collections and the concurrent cycle. Archive regions will never be
        >              >      collected. Your patch will update the pool in the case of a mixed
        >              >      collection collecting old regions or humongous regions, but misses the
        >              >      following cases:
        >              >      - a young collection collecting humongous regions
        >              >      - a concurrent cycle collecting humongous regions
        >              >      - a concurrent cycle collecting old regions
        >              >
        >              >      Unfortunately I could not come up with a good way to solve the above
        >              >      without re-designing the pools. I'm not sure about accepting your patch
        >              >      as is, since it might cause even more confusion for users compared to
        >              >      the current (already confusing) situation.
        >              >
        >              >      One idea we have discussed is to implement the re-design but also add a
        >              >      flag, -XX:+UseG1LegacyPoolsAndBeans (false by default), to allow for a
        >              >      smoother transition. Would that solution work for you?
        >              >
        >              >      Thanks,
        >              >      Erik
        >              >
        >              >      > Thanks,
        >              >      >
        >              >      > Paul
        >              >      >
        >              >      > *From: *mandy chung <mandy.chung at oracle.com>
        >              >      > *Organization: *Oracle Corporation
        >              >      > *Date: *Monday, January 29, 2018 at 1:41 PM
        >              >      > *To: *"Hohensee, Paul" <hohensee at amazon.com>
        >              >      > *Cc: *"serviceability-dev at openjdk.java.net"
        >              >      > <serviceability-dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net"
        >              >      > <hotspot-gc-dev at openjdk.java.net>
        >              >      > *Subject: *Re: RFR (S): 8195115: G1 Old Gen MemoryPool
        >              >      > CollectionUsage.used values don't reflect mixed GC results
        >              >      >
        >              >      > I created  JDK-8196362 to look into whether it makes sense to provide
        >              >      > some categorization to differentiate eden space vs the heap space for
        >              >      > long-lived objects.
        >              >      >
        >              >      > W.r.t. to JDK-8195115, I have to defer to GC team to comment on the
        >              >      > mixed collection update.  If they are okay, I have no objection to
        >              >      > implement a short-term fix and do the proper G1 memory pools as a
        >              >      > separate patch.
        >              >      >
        >              >      > Mandy
        >              >      >
        >              >      > On 1/29/18 1:02 PM, Hohensee, Paul wrote:
        >              >      >
        >              >      >     We don’t use getType, and you guessed correctly: we use the memory
        >              >      >     pool name as an indicator of the specific characteristics of a
        >              >      >     memory pool, in particular eden.
        >              >      >
        >              >      >     What we want is an indication of long term heap occupancy. We
        >              >      >     calculate it using CollectionUsage for non-eden heap memory pools,
        >              >      >     regardless of collector. We don’t use JMX notification, rather we
        >              >      >     periodically poll CollectionUsage for memory pools with names that
        >              >      >     contain “Old”, “Tenured”, or “Survivor”. We get the memory pools
        >              >      >     from the GarbageCollectorMXBeans (we don’t care what the collector
        >              >      >     names are). For the named memory pools, we sum CollectionUsage.used
        >              >      >     and divide by the sum of CollectionUsage.max to get a long term heap
        >              >      >     occupancy percentage. We don’t want to include eden because it’s
        >              >      >     really just an allocation buffer and not part of the storage for
        >              >      >     long-lived objects. I suppose we could use a negative test instead
        >              >      >     by using all memory pools with names that don’t include “Eden”.
        >              >      >
        >              >      >     The bug is that the “G1 Old Gen” memory pool isn’t being updated
        >              >      >     when the “G1 Young Generation” collector runs a mixed collection. As
        >              >      >     far as JMX is concerned, that collector only knows about eden and
        >              >      >     the survivor space. The patch adds the old gen to the memory pools
        >              >      >     it knows about and has mixed collections update the old gen’s
        >              >      >     CollectionUsage.
        >              >      >
        >              >      >     I managed to get a submit repo run to succeed last week and it found
        >              >      >     a problem. I’ve uploaded a new webrev that fixes the failure of the
        >              >      >     jtreg test TestMemoryMXBeansAndPoolsPresence.java due to the young
        >              >      >     gen collector being expected to know only about eden and the
        >              >      >     survivor space.
        >              >      >
        >              >      >     http://cr.openjdk.java.net/~phh/8195115/webrev.hs.01/
        >              >      >     <http://cr.openjdk.java.net/%7Ephh/8195115/webrev.hs.01/>
        >              >      >
        >              >      >     Waiting on the submit repo to come back with a result on it.
        >              >      >
        >              >      >     Thanks,
        >              >      >
        >              >      >     Paul
        >              >      >
        >              >      >     *From: *mandy chung <mandy.chung at oracle.com>
        >              >      >     <mailto:mandy.chung at oracle.com>
        >              >      >     *Organization: *Oracle Corporation
        >              >      >     *Date: *Monday, January 29, 2018 at 10:52 AM
        >              >      >     *To: *"Hohensee, Paul" <hohensee at amazon.com>
        >              >      >     <mailto:hohensee at amazon.com>, Erik Helin <erik.helin at oracle.com>
        >              >      >     <mailto:erik.helin at oracle.com>, David Holmes
        >              >      >     <david.holmes at oracle.com> <mailto:david.holmes at oracle.com>
        >              >      >     *Cc: *"serviceability-dev at openjdk.java.net"
        >              >      >     <mailto:serviceability-dev at openjdk.java.net>
        >              >      >     <serviceability-dev at openjdk.java.net>
        >              >      >     <mailto:serviceability-dev at openjdk.java.net>,
        >              >      >     "hotspot-gc-dev at openjdk.java.net"
        >              >      >     <mailto:hotspot-gc-dev at openjdk.java.net>
        >              >      >     <hotspot-gc-dev at openjdk.java.net>
        >              >      >     <mailto:hotspot-gc-dev at openjdk.java.net>
        >              >      >     *Subject: *Re: RFR (S): 8195115: G1 Old Gen MemoryPool
        >              >      >     CollectionUsage.used values don't reflect mixed GC results
        >              >      >
        >              >      >     On 1/29/18 10:35 AM, mandy chung wrote:
        >              >      >
        >              >      >         Thanks for the reply Paul.   Try to understand a little more on
        >              >      >         the specific from gc-specific memory pool you depend on.
        >              >      >
        >              >      >         On 1/29/18 8:27 AM, Hohensee, Paul wrote:
        >              >      >
        >              >      >             A name change would affect Amazon’s heap monitoring, and
        >              >      >             thus I expect it would affect other users as well.
        >              >      >
        >              >      >             As long as there are gc-specific memory pools, we’re going
        >              >      >             to need to be able to identify them, and right now that’s
        >              >      >             done via name.
        >              >      >
        >              >      >
        >              >      >         MemoryPoolMXBean::getType returns "heap" memory type for
        >              >      >         GC-specific memory pools.  Are you using this method?  Do you
        >              >      >         use the name to build in specific characteristic of a memory
        >              >      >         pool (e.g. eden vs old gen)?
        >              >      >
        >              >      >
        >              >      >
        >              >      >
        >              >      >             All the mxbeans are identified by name, so that’s a general
        >              >      >             design principle. The only way I can think of to get rid of
        >              >      >             name dependency would be to figure out what abstract metrics
        >              >      >             users want to monitor and implement them for all collectors.
        >              >      >             HeapUsage (instantaneous occupancy) is one, CollectionUsage
        >              >      >             (long-lived occupancy) is another, both of these for the
        >              >      >             entire heap, not just particular memory pools.
        >              >      >
        >              >      >
        >              >      >         The sum of HeapUsage and CollectionUsage of all heap memory
        >              >      >         pools was expected to give an incorrect approximation for the
        >              >      >         entire heap usage.  Are you seeing issue/bug with the sum result?
        >              >      >
        >              >      >
        >              >      >     typo: s/an incorrect approximation/an approximation.
        >              >      >
        >              >      >     Mandy
        >              >      >
        >              >      >
        >              >      >
        >              >      >         Mandy
        >              >      >
        >              >      >
        >              >      >
        >              >      >             That said, imo there will always be a demand for the ability
        >              >      >             to get collector and memory pool specific details, so I
        >              >      >             don’t see a way to get around providing named entities.
        >              >      >
        >              >      >             Paul
        >              >      >
        >              >      >             *From: *mandy chung <mandy.chung at oracle.com>
        >              >      >             <mailto:mandy.chung at oracle.com>
        >              >      >             *Organization: *Oracle Corporation
        >              >      >             *Date: *Friday, January 26, 2018 at 2:38 PM
        >              >      >             *To: *"Hohensee, Paul" <hohensee at amazon.com>
        >              >      >             <mailto:hohensee at amazon.com>, Erik Helin
        >              >      >             <erik.helin at oracle.com> <mailto:erik.helin at oracle.com>,
        >              >      >             David Holmes <david.holmes at oracle.com>
        >              >      >             <mailto:david.holmes at oracle.com>
        >              >      >             *Cc: *"serviceability-dev at openjdk.java.net"
        >              >      >             <mailto:serviceability-dev at openjdk.java.net>
        >              >      >             <serviceability-dev at openjdk.java.net>
        >              >      >             <mailto:serviceability-dev at openjdk.java.net>,
        >              >      >             "hotspot-gc-dev at openjdk.java.net"
        >              >      >             <mailto:hotspot-gc-dev at openjdk.java.net>
        >              >      >             <hotspot-gc-dev at openjdk.java.net>
        >              >      >             <mailto:hotspot-gc-dev at openjdk.java.net>
        >              >      >             *Subject: *Re: RFR (S): 8195115: G1 Old Gen MemoryPool
        >              >      >             CollectionUsage.used values don't reflect mixed GC results
        >              >      >
        >              >      >             On 1/25/18 1:04 PM, Hohensee, Paul wrote:
        >              >      >
        >              >      >
        >              >      >              > The JMX API spec doesn’t specify what the memory pool or
        >              >      >             garbage > collector names are, but the current names are
        >              >      >             de-facto part of the > API, so if we change the existing
        >              >      >             ones, imo a CSR should be filed.
        >              >      >
        >              >      >             The names are implementation details but I can see how an application
        >              >      >
        >              >      >             might be impacted if they depend on it.  CSR approval is not strictly
        >              >      >
        >              >      >             necessary while I think filing one to document the change would be
        >              >      >
        >              >      >             good.
        >              >      >
        >              >      >             Does the name change impact any application you know of?  I'm trying to
        >              >      >
        >              >      >             understand if any improvement to API is needed so that applications
        >              >      >
        >              >      >             don't need to depend on the names.
        >              >      >
        >              >      >
        >              >      >             Mandy
        >              >      >
        >              >      >
        >              >      >
        >              >      >
        >              >      >
        >              >      >
        >              >      >
        >              >      >
        >              >
        >              >
        >              
        >          
        >          
        >      
        >      
        > 
        
    
    



More information about the hotspot-gc-dev mailing list