ZGC Allocation stall metrics via MXBean

Fri Oct 25 12:14:20 UTC 2019

On 10/24/19 7:34 PM, Connaughton, Niall wrote:
> Thanks Per, that's helpful to understand.
> 
> On exposing allocation stall information via an MXBean, it would be super nice if it was exposed via a bean that implements NotificationEmitter. We're currently using notifications from the GarbageCollectorMXBean to subscribe to GC events and record data on pause duration, cause, etc, as well as what in-flight operations may have been impacted by the pause. If we could use a similar approach to watch for allocation stalls, instead of polling for stalls via the ThreadMXBean, that would be awesome.
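For reference, the notification-based approach described above can be sketched roughly as follows. This is a minimal sketch, assuming a HotSpot JVM where the GarbageCollectorMXBean instances also implement NotificationEmitter; the class name GcNotificationSketch and the helper subscribe() are made up for illustration. It covers GC events only, since no comparable notification exists for allocation stalls.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

public class GcNotificationSketch {

    // Registers the listener on every GC bean that can emit notifications
    // and returns how many beans it was registered on.
    static int subscribe(NotificationListener listener) {
        int registered = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
                registered++;
            }
        }
        return registered;
    }

    public static void main(String[] args) throws InterruptedException {
        NotificationListener listener = (Notification n, Object handback) -> {
            // Only GC notifications carry a GarbageCollectionNotificationInfo payload.
            if (GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType())) {
                GarbageCollectionNotificationInfo info =
                        GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
                System.out.printf("%s: action=%s, cause=%s, duration=%d ms%n",
                        info.getGcName(), info.getGcAction(), info.getGcCause(),
                        info.getGcInfo().getDuration());
            }
        };
        System.out.println("Listening on " + subscribe(listener) + " GC bean(s)");
        System.gc();        // request a collection so there is likely something to report
        Thread.sleep(500);  // notifications are delivered asynchronously
    }
}
```

The listener runs on a JVM-internal notification thread, so it should do as little work as possible.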

Good to know, thanks.

/Per

> 
> On 10/24/19, 00:47, "Per Liden" <per.liden at oracle.com> wrote:
> 
>      Hi,
>      
>      When allocating a small object (object size <= 256K), if the thread
>      already has a TLAB it will continue to allocate from it without being
>      stalled. If the TLAB is exhausted, the thread will try to allocate a new
>      TLAB from a CPU-local ZPage (this can be seen as a "CPU-LAB" for
>      allocating TLABs), again without being stalled. Only if that CPU-local
>      ZPage is also exhausted will the thread try to allocate a new ZPage, in
>      which case it will be stalled if we're currently out of memory.
>      
>      The allocation path is slightly different when allocating medium objects
>      (object size <= 4M). In this case, the first attempt is to allocate the
>      object into a global/shared medium ZPage. If that page is exhausted, it
>      will try to allocate a new medium ZPage, and is subject to allocation
>      stall if we're out of memory.
>      
>      For large objects (object size > 4M), we always allocate a new large
>      ZPage, so we'll have an allocation stall if we're out of memory.
>      
>      In summary, if we're out of memory, a thread might still be able to
>      allocate objects without being stalled, if the circumstances are right.
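The allocation paths described above can be modeled as a small decision function. This is only an illustrative model of the description in this message, not HotSpot code: the class ZAllocPathSketch, its method names, and the boolean "has room" parameters are all hypothetical. Only the 256K and 4M thresholds come from the text.

```java
public class ZAllocPathSketch {

    enum SizeClass { SMALL, MEDIUM, LARGE }

    // Size thresholds as stated in the message above.
    static final long SMALL_LIMIT = 256L * 1024;          // 256K
    static final long MEDIUM_LIMIT = 4L * 1024 * 1024;    // 4M

    static SizeClass classify(long sizeInBytes) {
        if (sizeInBytes <= SMALL_LIMIT) return SizeClass.SMALL;
        if (sizeInBytes <= MEDIUM_LIMIT) return SizeClass.MEDIUM;
        return SizeClass.LARGE;
    }

    // True if the allocation has to fall through to allocating a new ZPage,
    // which is the only step where a stall can occur in this model.
    static boolean needsNewPage(long sizeInBytes, boolean tlabHasRoom,
                                boolean cpuLocalPageHasRoom, boolean sharedMediumPageHasRoom) {
        switch (classify(sizeInBytes)) {
            case SMALL:
                // TLAB first, then a new TLAB from the CPU-local page.
                return !tlabHasRoom && !cpuLocalPageHasRoom;
            case MEDIUM:
                // Shared medium page first.
                return !sharedMediumPageHasRoom;
            default:
                // Large objects always get a new large page.
                return true;
        }
    }

    // A thread can only stall if a new page is needed while out of memory.
    static boolean mayStall(long sizeInBytes, boolean tlabHasRoom, boolean cpuLocalPageHasRoom,
                            boolean sharedMediumPageHasRoom, boolean outOfMemory) {
        return outOfMemory
                && needsNewPage(sizeInBytes, tlabHasRoom, cpuLocalPageHasRoom, sharedMediumPageHasRoom);
    }

    public static void main(String[] args) {
        // Small object, TLAB still has room: no stall even when out of memory.
        System.out.println(mayStall(64, true, false, false, true));
        // Large object while out of memory: stall.
        System.out.println(mayStall(8L * 1024 * 1024, true, true, true, true));
    }
}
```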
>      
>      Exposing allocation stall information via an MXBean might be useful. We
>      certainly have the information, so it's mostly a question about if and
>      how we want to expose it. Just thinking out loud, one could imagine
>      adding something to c.s.m.GarbageCollectorMXBean or c.s.m.ThreadMXBean,
>      or maybe even introduce a c.s.m.ZGarbageCollectorMXBean.
>      
>      cheers,
>      Per
>      
>      On 10/24/19 6:08 AM, Connaughton, Niall wrote:
>      > I was going to ask the same question.
>      >
>      > In addition - is there any documentation on how the allocation stalls work? I'm looking to understand things like whether the stall happens to any thread that attempts to allocate a new object, or only threads that need a new TLAB, or some other mechanism. Put another way - if we do something like jHiccup and have a thread constantly sleeping and allocating a small amount, would it detect allocation stalls? Or would it not be stalled until it exhausts its TLAB?
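A jHiccup-style probe of the kind asked about here could look roughly like the sketch below. It is not jHiccup itself; the class AllocationHiccupSketch and the method maxHiccupNanos are hypothetical names, and the sleep interval and allocation size are arbitrary choices.

```java
public class AllocationHiccupSketch {

    // Written on every iteration so the allocation is not optimized away.
    static volatile byte[] sink;

    // Sleeps, allocates a small array, and returns the largest observed delay
    // beyond the requested sleep time, in nanoseconds.
    static long maxHiccupNanos(int iterations, long sleepMillis, int allocBytes) {
        long sleepNanos = sleepMillis * 1_000_000L;
        long max = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt and stop early
                break;
            }
            sink = new byte[allocBytes];
            long excess = System.nanoTime() - start - sleepNanos;
            max = Math.max(max, excess);
        }
        return max;
    }

    public static void main(String[] args) {
        long max = maxHiccupNanos(1000, 1, 64);
        System.out.printf("max hiccup: %.3f ms%n", max / 1_000_000.0);
    }
}
```

Going by Per's reply in this thread, a small allocation like this one is normally served from the thread's existing TLAB, so a probe of this kind would probably miss many allocation stalls. The measured delay also includes scheduler jitter and GC pauses, so it is not specific to allocation stalls.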
>      >
>      > Thanks,
>      > Niall
>      >
>      > On 10/22/19, 11:19, "zgc-dev on behalf of Sundara Mohan M" <zgc-dev-bounces at openjdk.java.net on behalf of m.sundar85 at gmail.com> wrote:
>      >
>      >      Hi,
>      >          I was trying to get GC metrics via GarbageCollectorMXBean but only see
>      >      CollectionCount and CollectionTime.
>      >      Even though I can get the Allocation Stall event from the GC log, I have to do
>      >      some special setup to get it collected and reported properly.
>      >      Since a ZGC allocation stall is an important event for identifying whether the
>      >      application is having issues, can we expose it via some other MXBean?
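For context, the two counters mentioned here are read from the standard GarbageCollectorMXBean roughly as in the sketch below. The class GcCountersSketch and its snapshot() helper are hypothetical names. The output has no allocation stall data because the bean does not expose any.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcCountersSketch {

    // Maps each collector name to {collectionCount, collectionTimeMillis}.
    // Either value is -1 if the collector does not report it.
    static Map<String, long[]> snapshot() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.put(gc.getName(), new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return result;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, v) ->
                System.out.printf("%s: count=%d, time=%d ms%n", name, v[0], v[1]));
    }
}
```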
>      >
>      >      Thanks
>      >      Sundar
>      >
>      >
>      
>