ZGC Unable to reclaim memory for long time

Wed Nov 6 10:37:32 UTC 2019

Hi,

On 11/5/19 2:27 AM, Sundara Mohan M wrote:
> Hi Per,
> This explains why memory was not being reclaimed. My heap was 8G and 6G 
> was strongly reachable (when I took a heap dump). Agreed, increasing the 
> heap size will help in this case.
> 
> I am still trying to understand ZGC better:
> 1. Shouldn't the GC become more aggressive and put more effort into 
> reclaiming memory without additional settings?
> 2. Is there a reason why it shouldn't give more CPU to the GC threads to 
> reclaim garbage (say, after X GC cycles that could not reclaim memory)? In 
> this case it would be better to reclaim existing garbage than to hit an 
> Allocation Stall and fail with an out-of-memory error.

The tricky part is knowing/detecting when to be more aggressive, since 
it tends to become an exercise in trying to predict the future. Reacting 
when something bad happens (e.g. allocation stall) tends to be too late.

However, before thinking too much about heuristics, we might just want 
to reconsider the ZFragmentationLimit default value, as it is perhaps a 
bit too generous today. Most apps I've looked at tend to stabilize 
somewhere between 2-10% fragmentation/waste (i.e. way below 25%), so 
lowering the default might not hurt most apps, but help some apps.

cheers,
Per

> 
> 
> Thanks
> Sundar
> 
> On Mon, Nov 4, 2019 at 12:40 PM Per Liden <per.liden at oracle.com> wrote:
> 
>     Hi,
> 
>     When a workload produces a uniformly swiss-cheesy heap, i.e. where all
>     parts of the heap have roughly the same amount of garbage, then the GC
>     will face a situation where there are no free lunches and it will have
>     to work hard (compact a lot) to reclaim memory. Therefore, the GC will
>     tolerate a certain amount of fragmentation/waste, in the hope that more
>     objects will die soon, making compaction less expensive (at the expense
>     of using more memory for a while). How many CPU cycles to spend on
>     compaction vs. how much memory you can spare is of course a trade-off.
> 
>     You can use -XX:ZFragmentationLimit to control this. It currently
>     defaults to 25% and your workload seems to stabilize at 21%. If you want
>     more aggressive compaction/reclamation, then set the
>     -XX:ZFragmentationLimit to something below 21. This may or may not be a
>     good trade-off in your case. The alternative is to give the GC a larger
>     heap to work with.
> 
>     cheers,
>     Per
> 
>     On 11/4/19 7:56 PM, Sundara Mohan M wrote:
>      > Hi,
>      >     I ran into an issue where ZGC is unable to reclaim memory for a few
>      > hours/days. It just keeps printing "Exception in thread "RMI TCP
>      > Connection(idle)" java.lang.OutOfMemoryError: Java heap space", with
>      > Allocation Stalls happening on that thread.
>      >
>      >
>      > Here are the metrics, which show that even though there is garbage,
>      > ZGC is unable to reclaim it:
>      >
>      > ....
>      > [2019-11-04T08:39:53.986+0000][1765465.981s][info][gc,heap     ] GC(112126)      Live:         -              6366M (78%)        6366M (78%)        6366M (78%)             -                  -
>      > [2019-11-04T08:39:53.986+0000][1765465.981s][info][gc,heap     ] GC(112126)   Garbage:         -              1735M (21%)        1735M (21%)        1731M (21%)             -                  -
>      > [2019-11-04T08:39:53.986+0000][1765465.981s][info][gc,heap     ] GC(112126) Reclaimed:         -                  -                 0M (0%)            4M (0%)
>      > ...
>      >
>      > [2019-11-04T16:48:53.742+0000][1794805.738s][info][gc,heap     ] GC(135520)      Live:         -              6367M (78%)        6367M (78%)        6367M (78%)             -                  -
>      > [2019-11-04T16:48:53.742+0000][1794805.738s][info][gc,heap     ] GC(135520)   Garbage:         -              1730M (21%)        1730M (21%)        1724M (21%)             -                  -
>      > [2019-11-04T16:48:53.742+0000][1794805.738s][info][gc,heap     ] GC(135520) Reclaimed:         -                  -                 0M (0%)            6M (0%)
>      >
>      > It has been in this state for ~8 hours and it is still happening. The
>      > log reports 21% garbage (about 1.7G), but ZGC is not able to reclaim
>      > it; each cycle reclaims only 4-6M.
>      >
>      > Any idea what might be the issue here?
>      >
>      >
>      > TIA
>      > Sundar
>      >
>