ZGC Unable to reclaim memory for long time

Wed Nov 6 20:17:54 UTC 2019

Hi Per,
   As per [1] https://wiki.openjdk.java.net/display/zgc/Main, ZGC can
handle heaps from a few hundred megabytes to multi terabytes.

So my understanding was that if my application ran with an 8G heap before,
it should also run without issues with ZGC and the same heap size. So far
that has not been the case: I always have to increase the heap size to get
the same latency/RPS.

In my case (heaps ranging from 8G to 48G), I need to move to a higher value
to get the same RPS and latency. Again, this is my observation and it might
vary for different workloads.

Thanks
Sundar

On Wed, Nov 6, 2019 at 2:46 AM Per Liden <per.liden at oracle.com> wrote:

> On 11/5/19 4:48 PM, Peter Booth wrote:
> > Reading this and similar threads I am struck by the fact that ZGC users
> > are experiencing things that users of Azul's Zing JVM also go through. I
> > remember the amazement at seeing a JVM run without substantive GC pauses
> > and thinking that it was a free lunch. But the price was two parts:
> > ensuring adequate heap, and rewiring brains that are accustomed to seeing
> > CPU and memory as independent resources. The second turns out to be much
> > harder.
> >
> > From experience, I think a lot of pain can be avoided by clearly
> > communicating that an adequate heap is a prerequisite for a healthy JVM.
> > Most Java developers have absorbed the notion that large heaps are
> > bad/risky, and unlearning takes time.
>
> The documentation on the ZGC wiki [1] tries to be clear about this, but
> I'm sure it could be improved.
>
> [1] https://wiki.openjdk.java.net/display/zgc/Main
>
> cheers,
> Per
>
> >
> > Sent from my iPhone
> >
> >> On Nov 4, 2019, at 8:28 PM, Sundara Mohan M <m.sundar85 at gmail.com> wrote:
> >>
> >> Hi Per,
> >> This explains why it didn't reclaim memory. My heap was 8G and 6G was
> >> strongly reachable (when I took a heap dump). Agreed, increasing the
> >> heap size will help in this case.
> >>
> >> I am still trying to understand ZGC better:
> >> 1. Shouldn't the GC be more aggressive and put more effort into
> >> reclaiming memory without additional settings?
> >> 2. Is there a reason why it shouldn't give more CPU to the GC threads and
> >> reclaim garbage (say, after X GC runs in which it could not reclaim
> >> memory)? In this case it would be better to reclaim existing garbage
> >> than to hit an Allocation Stall and fail with a heap out-of-memory error.
> >>
> >>
> >> Thanks
> >> Sundar
> >>
> >>> On Mon, Nov 4, 2019 at 12:40 PM Per Liden <per.liden at oracle.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> When a workload produces a uniformly swiss-cheesy heap, i.e. where all
> >>> parts of the heap have roughly the same amount of garbage, then the GC
> >>> will face a situation where there are no free lunches and it will have
> >>> to work hard (compact a lot) to reclaim memory. Therefore, the GC will
> >>> tolerate a certain amount of fragmentation/waste, in the hope that more
> >>> objects will die soon, making compaction less expensive (at the expense
> >>> of using more memory for a while). How many CPU cycles to spend on
> >>> compaction vs. how much memory you can spare is of course a trade-off.
> >>>
> >>> You can use -XX:ZFragmentationLimit to control this. It currently
> >>> defaults to 25% and your workload seems to stabilize at 21%. If you want
> >>> more aggressive compaction/reclamation, then set the
> >>> -XX:ZFragmentationLimit to something below 21. This may or may not be a
> >>> good trade-off in your case. The alternative is to give the GC a larger
> >>> heap to work with.
> >>>
> >>> cheers,
> >>> Per
> >>>
> >>>> On 11/4/19 7:56 PM, Sundara Mohan M wrote:
> >>>> Hi,
> >>>>     I ran into an issue where ZGC is unable to reclaim memory for a few
> >>>> hours/days. It just keeps printing "Exception in thread "RMI TCP
> >>>> Connection(idle)" java.lang.OutOfMemoryError: Java heap space", and an
> >>>> Allocation Stall is happening on that thread.
> >>>>
> >>>>
> >>>> Here are the metrics, which show that for some reason, even though
> >>>> there is garbage, the GC is unable to reclaim it:
> >>>>
> >>>> ....
> >>>> [2019-11-04T08:39:53.986+0000][1765465.981s][info][gc,heap     ] GC(112126)      Live:         -              6366M (78%)        6366M (78%)        6366M (78%)            -                  -
> >>>> [2019-11-04T08:39:53.986+0000][1765465.981s][info][gc,heap     ] GC(112126)   Garbage:         -              1735M (21%)        1735M (21%)        1731M (21%)            -                  -
> >>>> [2019-11-04T08:39:53.986+0000][1765465.981s][info][gc,heap     ] GC(112126) Reclaimed:         -                  -                 0M (0%)            4M (0%)
> >>>> ...
> >>>>
> >>>> [2019-11-04T16:48:53.742+0000][1794805.738s][info][gc,heap     ] GC(135520)      Live:         -              6367M (78%)        6367M (78%)        6367M (78%)            -                  -
> >>>> [2019-11-04T16:48:53.742+0000][1794805.738s][info][gc,heap     ] GC(135520)   Garbage:         -              1730M (21%)        1730M (21%)        1724M (21%)            -                  -
> >>>> [2019-11-04T16:48:53.742+0000][1794805.738s][info][gc,heap     ] GC(135520) Reclaimed:         -                  -                 0M (0%)            6M (0%)
> >>>>
> >>>> It has been in this state for ~8 hours and it is still happening. The
> >>>> log reports about 1.7G (21%) of garbage, but the GC is not able to
> >>>> reclaim it; every cycle reclaims only 4-6M.
> >>>>
> >>>> Any idea what might be the issue here?
> >>>>
> >>>>
> >>>> TIA
> >>>> Sundar
> >>>>
> >>>
> >
>
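
The arithmetic behind the quoted Nov 4 reply (garbage stabilizing at 21%
of the heap against a -XX:ZFragmentationLimit default of 25%) can be
illustrated with a small sketch. It is a deliberately simplified, heap-wide
model using the figures from the quoted log lines; it is not a description
of how ZGC actually selects pages to relocate. The class and method names
are hypothetical and exist only for this illustration.

```java
public class FragmentationSketch {
    // Default of -XX:ZFragmentationLimit as stated in the thread (percent).
    public static final double DEFAULT_LIMIT_PERCENT = 25.0;

    /** Garbage expressed as a percentage of the heap capacity. */
    public static double garbagePercent(long garbageMb, long heapMb) {
        return 100.0 * garbageMb / heapMb;
    }

    /**
     * Simplified assumption: compaction is only considered worthwhile
     * once the garbage share exceeds the configured limit.
     */
    public static boolean exceedsLimit(long garbageMb, long heapMb, double limitPercent) {
        return garbagePercent(garbageMb, heapMb) > limitPercent;
    }

    public static void main(String[] args) {
        long heapMb = 8 * 1024; // 8G heap, as reported in the thread
        long garbageMb = 1735;  // "Garbage" value from the GC(112126) log lines

        System.out.printf("garbage = %.1f%% of heap%n", garbagePercent(garbageMb, heapMb));
        System.out.println("exceeds default limit of 25: "
                + exceedsLimit(garbageMb, heapMb, DEFAULT_LIMIT_PERCENT));
        System.out.println("exceeds a limit of 20:       "
                + exceedsLimit(garbageMb, heapMb, 20.0));
    }
}
```

Under this model the reported garbage stays below the default limit, which
matches the advice in the thread: either lower the limit below 21 or give
the GC a larger heap.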