ZGC unable to reclaim memory for a long time

Tue Nov 5 15:48:43 UTC 2019

Reading this and similar threads, I am struck by the fact that ZGC users are experiencing the same things that users of Azul's Zing JVM go through. I remember the amazement at seeing a JVM run without substantive GC pauses and thinking that it was a free lunch. But the price came in two parts: ensuring an adequate heap, and rewiring brains that are accustomed to seeing CPU and memory as independent resources. The second turns out to be much harder.

From experience, I think a lot of pain can be avoided by clearly communicating that an adequate heap is a prerequisite for a healthy JVM. Most Java developers have absorbed the notion that large heaps are bad or risky, and unlearning that takes time.


> On Nov 4, 2019, at 8:28 PM, Sundara Mohan M <m.sundar85 at gmail.com> wrote:
> 
> Hi Per,
> This explains why memory was not being reclaimed. My heap was 8G, and 6G
> of it was strongly reachable (when I took a heap dump). Agreed, increasing
> the heap size will help in this case.
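A rough back-of-the-envelope check of why 8G was not enough, under a simplifying assumption that is not ZGC's exact policy: suppose the GC leaves alone any page whose garbage stays below the -XX:ZFragmentationLimit percentage (default 25%, per Per's reply). Then a live set L can spread over up to L / (1 - limit) of heap without ever being compacted. With the 6366M live set from the logs and a 25% limit, that is 8488M, more than an 8G (8192M) heap. The class and method names below are hypothetical.

```java
public class HeapHeadroom {
    /**
     * Upper bound on heap occupied by a live set when no page is compacted until
     * its garbage exceeds the limit (simplified model, not ZGC's real policy).
     */
    static double worstCaseOccupiedMb(double liveMb, double fragmentationLimitPercent) {
        double tolerated = fragmentationLimitPercent / 100.0;
        return liveMb / (1.0 - tolerated);
    }

    public static void main(String[] args) {
        double liveMb = 6366;      // live set reported in the GC log
        double heapMb = 8 * 1024;  // 8G heap
        for (double limit : new double[] {25, 20, 15}) {
            double occupied = worstCaseOccupiedMb(liveMb, limit);
            System.out.printf("limit %.0f%%: up to %.0fM occupied (%s the 8G heap)%n",
                    limit, occupied, occupied > heapMb ? "exceeds" : "fits in");
        }
    }
}
```

Under this model, lowering the limit to 20% brings the bound (about 7958M) just under 8G, which is the same trade-off Per describes: less tolerated waste in exchange for more compaction work.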
> 
> I'm still trying to understand ZGC better:
> 1. Shouldn't the GC be more aggressive and put more effort into
> reclaiming memory without additional settings?
> 2. Is there a reason why it shouldn't give more CPU to the GC threads and
> reclaim garbage (say, after X GC cycles that could not reclaim memory)? In
> this case it would be better to reclaim the existing garbage instead of
> stalling allocations and failing with an OutOfMemoryError.
> 
> 
> Thanks
> Sundar
> 
>> On Mon, Nov 4, 2019 at 12:40 PM Per Liden <per.liden at oracle.com> wrote:
>> 
>> Hi,
>> 
>> When a workload produces a uniformly swiss-cheesy heap, i.e. where all
>> parts of the heap have roughly the same amount of garbage, then the GC
>> will face a situation where there are no free lunches and it will have
>> to work hard (compact a lot) to reclaim memory. Therefore, the GC will
>> tolerate a certain amount of fragmentation/waste, in the hope that more
>> objects will die soon, making compaction less expensive (at the expense
>> of using more memory for a while). How many CPU cycles to spend on
>> compaction vs. how much memory you can spare is of course a trade-off.
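To make the "uniformly swiss-cheesy" case concrete, here is a simplified model of relocation-set selection. It assumes a non-empty page is chosen for compaction only if its garbage exceeds the fragmentation limit as a share of the page size; ZGC's actual selector is more involved, so treat this as an illustration only. The 2M page size, the class, and its methods are assumptions for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class RelocationSelectionModel {
    /** Hypothetical per-page summary after marking: page size and live bytes. */
    static final class Page {
        final long sizeBytes;
        final long liveBytes;

        Page(long sizeBytes, long liveBytes) {
            this.sizeBytes = sizeBytes;
            this.liveBytes = liveBytes;
        }

        long garbageBytes() {
            return sizeBytes - liveBytes;
        }
    }

    /**
     * Simplified model (not ZGC's real selector): a non-empty page is chosen for
     * compaction only if its garbage exceeds fragmentationLimitPercent of its size.
     */
    static List<Page> selectForRelocation(List<Page> pages, double fragmentationLimitPercent) {
        List<Page> selected = new ArrayList<>();
        for (Page p : pages) {
            if (p.liveBytes == 0) {
                continue; // nothing to copy; assumed to be freed without relocation
            }
            double limitBytes = p.sizeBytes * fragmentationLimitPercent / 100.0;
            if (p.garbageBytes() > limitBytes) {
                selected.add(p);
            }
        }
        return selected;
    }

    /** Garbage that compacting the selected pages would give back. */
    static long reclaimableBytes(List<Page> selected) {
        long total = 0;
        for (Page p : selected) {
            total += p.garbageBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        long pageSize = 2L * 1024 * 1024; // assumed 2M pages
        // A uniformly "swiss-cheesy" 8G heap: every page is 79% live, 21% garbage.
        List<Page> heap = new ArrayList<>();
        for (int i = 0; i < 4096; i++) {
            heap.add(new Page(pageSize, pageSize * 79 / 100));
        }
        for (double limit : new double[] {25, 20}) {
            List<Page> selected = selectForRelocation(heap, limit);
            System.out.printf("limit %.0f%%: %d pages selected, %dM reclaimable%n",
                    limit, selected.size(), reclaimableBytes(selected) / (1024 * 1024));
        }
    }
}
```

With a 25% limit this model selects no pages at all, even though about a fifth of the heap is garbage; with 20% it selects every page. That mirrors the logs, where each cycle reclaims only a few megabytes.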
>> 
>> You can use -XX:ZFragmentationLimit to control this. It currently
>> defaults to 25% and your workload seems to stabilize at 21%. If you want
>> more aggressive compaction/reclamation, then set the
>> -XX:ZFragmentationLimit to something below 21. This may or may not be a
>> good trade-off in your case. The alternative is to give the GC a larger
>> heap to work with.
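When experimenting with -XX:ZFragmentationLimit, it helps to confirm from inside the process which value is actually in effect. A small sketch using the standard com.sun.management.HotSpotDiagnosticMXBean; the class and helper names are hypothetical, and the helper simply reports "not available" if a flag does not exist in the running JVM build.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.Optional;

public class ShowZgcFlags {
    /** Current value and origin of a HotSpot flag, or empty if this JVM lacks it. */
    static Optional<String> flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = bean.getVMOption(name);
            return Optional.of(option.getValue() + " (origin: " + option.getOrigin() + ")");
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // flag unknown to this JVM build
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"UseZGC", "ZFragmentationLimit", "MaxHeapSize"}) {
            System.out.println(flag + " = " + flagValue(flag).orElse("<not available>"));
        }
    }
}
```

For example, after compiling, run it with `java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ZFragmentationLimit=15 ShowZgcFlags` (ZGC still required -XX:+UnlockExperimentalVMOptions before JDK 15); the origin should then read VM_CREATION for the flags set on the command line.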
>> 
>> cheers,
>> Per
>> 
>>> On 11/4/19 7:56 PM, Sundara Mohan M wrote:
>>> Hi,
>>>    I ran into an issue where ZGC is unable to reclaim memory for a few
>>> hours/days. It just keeps printing "Exception in thread "RMI TCP
>>> Connection(idle)" java.lang.OutOfMemoryError: Java heap space", and
>>> allocation stalls happen on that thread.
>>> 
>>> 
>>> Here are the metrics, which show that even though there is garbage, ZGC
>>> is unable to reclaim it:
>>> 
>>> ....
>>> [2019-11-04T08:39:53.986+0000][1765465.981s][info][gc,heap     ] GC(112126)      Live:         -              6366M (78%)        6366M (78%)        6366M (78%)             -                  -
>>> [2019-11-04T08:39:53.986+0000][1765465.981s][info][gc,heap     ] GC(112126)   Garbage:         -              1735M (21%)        1735M (21%)        1731M (21%)             -                  -
>>> [2019-11-04T08:39:53.986+0000][1765465.981s][info][gc,heap     ] GC(112126) Reclaimed:         -                  -                 0M (0%)            4M (0%)              -                  -
>>> ...
>>> 
>>> [2019-11-04T16:48:53.742+0000][1794805.738s][info][gc,heap     ] GC(135520)      Live:         -              6367M (78%)        6367M (78%)        6367M (78%)             -                  -
>>> [2019-11-04T16:48:53.742+0000][1794805.738s][info][gc,heap     ] GC(135520)   Garbage:         -              1730M (21%)        1730M (21%)        1724M (21%)             -                  -
>>> [2019-11-04T16:48:53.742+0000][1794805.738s][info][gc,heap     ] GC(135520) Reclaimed:         -                  -                 0M (0%)            6M (0%)              -                  -
>>> 
>>> It has been in this state for ~8 hours and it is still happening. The log
>>> reports 21% garbage (about 1.7G), but every cycle reclaims only 4-6M.
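To detect this state automatically, one could scan the gc,heap log for cycles that report a large share of garbage yet reclaim almost nothing. The sketch below assumes the ZGC table layout shown in these logs (JDK 11-13 era) and takes the last numeric column of the Garbage and Reclaimed rows; the format may differ in other JDK versions, and the thresholds (20%, 16M) and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZgcStuckDetector {
    // Captures the cycle id, the row name, and the LAST "<n>M (<p>%)" value on the
    // line (the greedy ".*" backtracks to the final occurrence).
    private static final Pattern ROW = Pattern.compile(
            "GC\\((\\d+)\\)\\s+(Garbage|Reclaimed):.*\\s(\\d+)M \\((\\d+)%\\)");

    /** Per-cycle summary; -1 means the row was not seen. */
    static final class Cycle {
        final long id;
        int garbagePercent = -1;
        long reclaimedMb = -1;

        Cycle(long id) {
            this.id = id;
        }
    }

    static List<Cycle> parse(List<String> lines) {
        Map<Long, Cycle> cycles = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (!m.find()) {
                continue; // not a Garbage/Reclaimed row
            }
            Cycle c = cycles.computeIfAbsent(Long.parseLong(m.group(1)), id -> new Cycle(id));
            if (m.group(2).equals("Garbage")) {
                c.garbagePercent = Integer.parseInt(m.group(4));
            } else {
                c.reclaimedMb = Long.parseLong(m.group(3));
            }
        }
        return new ArrayList<>(cycles.values());
    }

    /** Hypothetical heuristic: plenty of garbage reported, almost nothing reclaimed. */
    static boolean looksStuck(Cycle c, int minGarbagePercent, long maxReclaimedMb) {
        return c.garbagePercent >= minGarbagePercent
                && c.reclaimedMb >= 0
                && c.reclaimedMb <= maxReclaimedMb;
    }

    public static void main(String[] args) {
        String prefix = "[2019-11-04T08:39:53.986+0000][1765465.981s][info][gc,heap     ] ";
        List<String> log = List.of(
                prefix + "GC(112126)   Garbage:         -              1735M (21%)        1735M (21%)        1731M (21%)             -                  -",
                prefix + "GC(112126) Reclaimed:         -                  -                 0M (0%)            4M (0%)              -                  -");
        for (Cycle c : parse(log)) {
            System.out.printf("GC(%d): garbage %d%%, reclaimed %dM, stuck=%b%n",
                    c.id, c.garbagePercent, c.reclaimedMb, looksStuck(c, 20, 16));
        }
    }
}
```

Alerting on several consecutive "stuck" cycles, rather than a single one, would avoid false alarms from a cycle that merely ran while few objects had died.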
>>> 
>>> Any idea what might be the issue here?
>>> 
>>> 
>>> TIA
>>> Sundar
>>> 
>>