ZGC Allocation stall metrics via MXBean

Thu Oct 24 17:34:26 UTC 2019

Thanks Per, that's helpful to understand.

On exposing allocation stall information via an MXBean, it would be super nice if it was exposed via a bean that implements NotificationEmitter. We're currently using notifications from the GarbageCollectorMXBean to subscribe to GC events and record data on pause duration, cause, etc., as well as which in-flight operations may have been impacted by the pause. If we could use a similar approach to watch for allocation stalls, instead of polling for stalls via the ThreadMXBean, that would be awesome.
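
For context, here is a minimal sketch of the notification-based subscription described above. It assumes a HotSpot JVM where the GarbageCollectorMXBean instances also implement NotificationEmitter. The class name GcNotificationWatcher and the log format are made up for illustration, and what "duration" covers for a concurrent collector may vary by JDK version.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical helper, not part of the JDK.
final class GcNotificationWatcher {

    // Registers the listener on every collector bean that can emit
    // notifications and returns how many beans accepted it.
    static int install(NotificationListener listener) {
        int registered = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
                registered++;
            }
        }
        return registered;
    }

    public static void main(String[] args) throws Exception {
        NotificationListener listener = (Notification n, Object handback) -> {
            // Ignore anything that is not a GC notification.
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info =
                    GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
            System.out.printf("%s (%s) cause=%s duration=%dms%n",
                    info.getGcName(), info.getGcAction(), info.getGcCause(),
                    info.getGcInfo().getDuration());
        };

        System.out.println("listeners registered: " + install(listener));
        System.gc();        // request a collection so there is something to report
        Thread.sleep(500);  // notifications are delivered asynchronously
    }
}
```

An allocation stall notification delivered through the same mechanism would let us reuse this listener plumbing rather than add a polling thread.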

On 10/24/19, 00:47, "Per Liden" <per.liden at oracle.com> wrote:

    Hi,

    When allocating a small object (object size <= 256K), if the thread 
    already has a TLAB it will continue to allocate from it without being 
    stalled. If the TLAB is exhausted, the thread will try to allocate a new 
    TLAB from a CPU-local ZPage (this can be seen as a "CPU-LAB" for 
    allocating TLABs), again without being stalled. Only if that CPU-local
    ZPage is also exhausted will the thread try to allocate a new ZPage, in 
    which case it will be stalled if we're currently out of memory.

    The allocation path is slightly different when allocating medium objects 
    (object size <= 4M). In this case, the first attempt is to allocate the
    object into a global/shared medium ZPage. If that page is exhausted, it 
    will try to allocate a new medium ZPage, and is subject to allocation 
    stall if we're out of memory.

    For large objects (object size > 4M), we always allocate a new large 
    ZPage, so we'll have an allocation stall if we're out of memory.

    In summary, if we're out of memory, a thread might still be able to
    allocate objects without being stalled, if the circumstances are right.
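
The three allocation paths above can be restated as a small decision function. This is only an illustrative sketch built from the thresholds quoted in this message (256K and 4M), not HotSpot code. The names ZSizeClassSketch, classify and mayStallWhenOutOfMemory are hypothetical, and the two boolean parameters stand in for state that only the VM actually knows.

```java
// Hypothetical model of the allocation paths described above; not HotSpot code.
final class ZSizeClassSketch {

    enum SizeClass { SMALL, MEDIUM, LARGE }

    // Thresholds as quoted in this thread.
    static final long SMALL_MAX_BYTES = 256L * 1024;
    static final long MEDIUM_MAX_BYTES = 4L * 1024 * 1024;

    static SizeClass classify(long bytes) {
        if (bytes <= SMALL_MAX_BYTES) {
            return SizeClass.SMALL;
        }
        if (bytes <= MEDIUM_MAX_BYTES) {
            return SizeClass.MEDIUM;
        }
        return SizeClass.LARGE;
    }

    // Returns true if, while the heap is out of memory, the allocation would
    // need a new ZPage and could therefore stall.
    //   tlabHasRoom:        the thread's TLAB can fit the object (small only)
    //   currentPageHasRoom: the CPU-local small page (for small objects) or the
    //                       shared medium page (for medium objects) has room
    static boolean mayStallWhenOutOfMemory(long bytes, boolean tlabHasRoom,
                                           boolean currentPageHasRoom) {
        switch (classify(bytes)) {
            case SMALL:
                // TLAB first, then a new TLAB from the CPU-local page.
                return !tlabHasRoom && !currentPageHasRoom;
            case MEDIUM:
                // No TLAB involved; only the shared medium page can help.
                return !currentPageHasRoom;
            default:
                // Large objects always need a new large ZPage.
                return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(100));
        System.out.println(mayStallWhenOutOfMemory(100, true, false));
        System.out.println(mayStallWhenOutOfMemory(8L * 1024 * 1024, true, true));
    }
}
```

Read this way, a jHiccup-style thread that allocates a small amount on each wake-up would only observe a stall once its TLAB and the CPU-local page are both exhausted.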

    Exposing allocation stall information via an MXBean might be useful. We 
    certainly have the information, so it's mostly a question about if and 
    how we want to expose it. Just thinking out loud, one could imagine 
    adding something to c.s.m.GarbageCollectorMXBean or c.s.m.ThreadMXBean, 
    or maybe even introducing a c.s.m.ZGarbageCollectorMXBean.

    cheers,
    Per

    On 10/24/19 6:08 AM, Connaughton, Niall wrote:
    > I was going to ask the same question.
    > 
    > In addition - is there any documentation on how the allocation stalls work? I'm looking to understand things like whether the stall happens to any thread that attempts to allocate a new object, or only threads that need a new TLAB, or some other mechanism. Put another way - if we do something like jHiccup and have a thread constantly sleeping and allocating a small amount, would it detect allocation stalls? Or would it not be stalled until it exhausts its TLAB?
    > 
    > Thanks,
    > Niall
    > 
    > On 10/22/19, 11:19, "zgc-dev on behalf of Sundara Mohan M" <zgc-dev-bounces at openjdk.java.net on behalf of m.sundar85 at gmail.com> wrote:
    > 
    >      Hi,
    >          I was trying to get GC metrics via GarbageCollectorMXBean but only see
    >      CollectionCount and CollectionTime.
    >      Even though I can get the Allocation Stall event from the GC log, I have to do
    >      some special setup to get that collected and reported properly.
    >      Since a ZGC allocation stall is an important event for identifying whether the
    >      application is having issues, can we expose it via any other MXBean?
    >      
    >      Thanks
    >      Sundar
    >      
    >