Big hiccups with ZGC

Thu Nov 8 17:12:43 UTC 2018

Hi Alex,

Did a quick look at the first two GC logs. Haven't had a chance to look 
at the 3rd.

A couple of tips that may help as you continue looking at ZGC.

- If you see "Allocation Stall" in the GC log, such as "Allocation Stall 
(qtp1059634518-72) 15.108ms", this means that ZGC has stalled the named 
application thread because there was not enough free heap space to 
satisfy its allocation. In other words, the GC lost the race between 
reclaiming space and the application's allocation rate.

When you see these "Allocation Stall" messages in the GC log, there are 
a few options (one of these, or a combination, should resolve what you 
are seeing):
a.) Increase the number of concurrent GC threads. This will help ZGC win 
the race. In your first GC log, there are 8 concurrent GC threads. 
Without other changes, it probably needs 10 or 12.
b.) Increase the size of the Java heap to offer ZGC additional head room.
c.) Make changes to the application to either reduce the amount of live 
data, or reduce the allocation rate.
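
For options a.) and b.), the usual HotSpot knobs are -XX:ConcGCThreads 
and -Xmx. As a rough sketch (the class name GcSettings is made up for 
illustration, and this assumes a HotSpot JVM where the 
com.sun.management API is available), something like the following can 
confirm which values the running JVM actually picked up:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of ZGC or the JDK: prints the GC-related
// settings the running JVM ended up with.
public class GcSettings {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Current value of a HotSpot VM option, e.g. "ConcGCThreads". */
    public static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the option does not exist.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // Names of the collectors in use, as reported by the JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector:     " + gc.getName());
        }
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("ConcGCThreads:  " + vmOption("ConcGCThreads"));
    }
}
```

Run it with the same JVM options as the application so the reported 
values match what the application sees.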

If you reduce cache sizes as you mentioned, that reduces the amount of 
live data (option c.) and should help avoid the "Allocation Stalls".
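
To check whether a change actually removed the stalls, one approach is 
to count them in the new GC log. Below is a minimal sketch, assuming the 
stall entries look like the message quoted above; the class and method 
names are made up for illustration, and the pattern may need adjusting 
if your log lines differ:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans a GC log for "Allocation Stall" entries.
public class AllocationStalls {

    // Matches e.g. "Allocation Stall (qtp1059634518-72) 15.108ms".
    // Group 1 is the stalled thread's name, group 2 the duration in ms.
    private static final Pattern STALL =
        Pattern.compile("Allocation Stall \\((.+?)\\) (\\d+(?:\\.\\d+)?)ms");

    /** Stall duration in milliseconds, or -1 if the line is not a stall entry. */
    public static double stallMillis(String logLine) {
        Matcher m = STALL.matcher(logLine);
        return m.find() ? Double.parseDouble(m.group(2)) : -1.0;
    }

    /** Number of stall entries in the given lines. */
    public static int countStalls(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (stallMillis(line) >= 0) {
                count++;
            }
        }
        return count;
    }

    /** Sum of all stall durations in the given lines, in milliseconds. */
    public static double totalStallMillis(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            double ms = stallMillis(line);
            if (ms >= 0) {
                total += ms;
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java AllocationStalls <gc-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Stalls:           " + countStalls(lines));
        System.out.println("Total stall (ms): " + totalStallMillis(lines));
    }
}
```

Comparing the stall count and total stall time before and after a 
change gives a quick read on whether ZGC now has enough head room.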

hths,

charlie

On 11/8/18 9:57 AM, Alex Yakushev wrote:
> A quick follow up. I think we figured what's going on – there is not 
> enough free heap to deal with the allocation rate. You see, we have a 
> cache inside the program the size of which was tuned with G1 enabled. 
> Apparently, ZGC (and Shenandoah too, got the same problems with it 
> today) inflates the size of the cache in bytes (because of the 
> overhead) which leaves less breathing room for ZGC/Shenandoah to work. 
> Will try to reduce the cache size and come back with the results.