Big hiccups with ZGC

Thu Nov 8 17:22:26 UTC 2018

Oh, a couple other quick things I noticed in the GC logs ...

You should consider making the system configuration change suggested by 
this warning:

[2018-11-08T12:09:55.060+0000][0.006s][17][gc,init] ***** WARNING! INCORRECT SYSTEM CONFIGURATION DETECTED! *****
[2018-11-08T12:09:55.060+0000][0.006s][17][gc,init] The system limit on number of memory mappings per process might be too low for the given
[2018-11-08T12:09:55.060+0000][0.006s][17][gc,init] max Java heap size (51200M). Please adjust /proc/sys/vm/max_map_count to allow for at
[2018-11-08T12:09:55.060+0000][0.006s][17][gc,init] least 92160 mappings (current limit is 65530). Continuing execution with the current
[2018-11-08T12:09:55.060+0000][0.006s][17][gc,init] limit could lead to a fatal error, due to failure to map memory.
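
For what it's worth, the numbers in that warning are consistent with a
simple rule of thumb: one potential mapping per 2M of max heap, times
three, plus 20% slack (51200M / 2M * 3 * 1.2 = 92160). Below is a minimal
sketch that reproduces that arithmetic and compares it with the current
limit. The 2M granule, the factor of three and the 20% margin are my
assumptions, inferred from the log line above, and the class and method
names are made up for illustration. The value the JVM prints in its own
warning is the one to trust.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class ZgcMapCountCheck {
    // Assumed constants, inferred from the warning (51200M -> 92160 mappings).
    static final long GRANULE_MB = 2;   // assumed mapping granularity, in MB
    static final long MAPPINGS_PER_GRANULE = 3; // assumed worst case per granule

    // Estimate the mappings needed for a given max heap size in MB,
    // including an assumed 20% margin for everything else in the process.
    static long requiredMaxMapCount(long maxHeapMb) {
        long granules = maxHeapMb / GRANULE_MB;
        return granules * MAPPINGS_PER_GRANULE * 12 / 10;
    }

    // Read the current per-process mapping limit (Linux only).
    static long currentMaxMapCount() throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get("/proc/sys/vm/max_map_count"));
        return Long.parseLong(new String(raw).trim());
    }

    public static void main(String[] args) throws IOException {
        // Max heap in MB; defaults to the 51200M from the log above.
        long heapMb = args.length > 0 ? Long.parseLong(args[0]) : 51200;
        long required = requiredMaxMapCount(heapMb);
        long current = currentMaxMapCount();
        System.out.println("estimated required: " + required + ", current limit: " + current);
        if (current < required) {
            System.out.println("limit looks too low; consider raising vm.max_map_count");
        }
    }
}
```

Raising the limit itself is done outside the JVM, as root, for example
with "sysctl -w vm.max_map_count=92160" (use whatever value the JVM
warning asks for on your system).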

Large pages are disabled as indicated by:
[2018-11-08T12:09:55.059+0000][0.005s][17][gc,init] Large Page Support: Disabled

ZGC tends to perform better with huge pages enabled. They are not 
required to run ZGC, but they should help. To enable them, set Linux 
transparent huge pages to "madvise" for both the "enabled" and "defrag" 
settings (under /sys/kernel/mm/transparent_hugepage/), and then add the 
-XX:+UseTransparentHugePages and -XX:+AlwaysPreTouch JVM command line 
options.
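
If you want to double check the current transparent huge page settings
before adding those options, here is a minimal sketch. It assumes the
usual Linux sysfs layout, where each settings file lists all modes and
puts brackets around the active one (e.g. "always [madvise] never"). The
class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class ThpCheck {
    static final String THP_DIR = "/sys/kernel/mm/transparent_hugepage/";

    // Return the mode shown in brackets, or "unknown" if none is found.
    static String activeMode(String contents) {
        int open = contents.indexOf('[');
        int close = contents.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            return "unknown";
        }
        return contents.substring(open + 1, close);
    }

    // Read one settings file ("enabled" or "defrag") and extract its mode.
    static String readMode(String name) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(THP_DIR + name));
        return activeMode(new String(raw));
    }

    public static void main(String[] args) throws IOException {
        for (String name : new String[] {"enabled", "defrag"}) {
            String mode = readMode(name);
            String note = "madvise".equals(mode) ? "" : "  (expected madvise)";
            System.out.println(name + " = " + mode + note);
        }
    }
}
```

Changing the mode is done as root by writing "madvise" into each of the
two files, e.g. "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled".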

hths,

charlie

On 11/8/18 11:12 AM, charlie hunt wrote:
> Hi Alex,
>
> Did a quick look at the first two GC logs. Haven't had a chance to 
> look at the 3rd.
>
> A couple tips that may help you as you continue looking at ZGC.
>
> - If you see "Allocation Stall" in the GC log, such as "Allocation 
> Stall (qtp1059634518-72) 15.108ms", this means that ZGC has slowed 
> down the application thread(s) because you are running out of 
> available heap space. In other words, the GC lost the race between 
> reclaiming space and the allocation rate.
>
> When you see these "Allocation Stall" messages in the GC log, there 
> are a couple of options (one of these, or a combination, should 
> resolve what you are seeing):
> a.) Increase the number of concurrent GC threads. This will help ZGC 
> win the race. In your first GC log, there are 8 concurrent GC threads. 
> It probably needs 10 or 12 concurrent GC threads in the absence of 
> making other changes.
> b.) Increase the size of the Java heap to give ZGC additional headroom.
> c.) Make changes to the application to either reduce the amount of 
> live data, or reduce the allocation rate.
>
> If you reduce cache sizes as you mentioned, this should help avoid the 
> "Allocation Stalls".
>
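
To see how much the stalls add up to before and after trying the options
above, here is a rough sketch that scans a GC log for lines shaped like
the "Allocation Stall (qtp1059634518-72) 15.108ms" example. The pattern
is based only on that one example (I'm assuming the text in parentheses
is the thread name), and the class and method names are made up for
illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllocationStalls {
    // Matches e.g. "Allocation Stall (qtp1059634518-72) 15.108ms".
    static final Pattern STALL =
        Pattern.compile("Allocation Stall \\((.+?)\\) (\\d+(?:\\.\\d+)?)ms");

    // Number of log lines that report an allocation stall.
    static long countStalls(List<String> lines) {
        long count = 0;
        for (String line : lines) {
            if (STALL.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    // Sum of all reported stall durations, in milliseconds.
    static double totalStallMs(List<String> lines) {
        double total = 0;
        for (String line : lines) {
            Matcher m = STALL.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(2));
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AllocationStalls <gc-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(countStalls(lines) + " stalls, "
            + totalStallMs(lines) + " ms in total");
    }
}
```

If the count drops to zero after adding GC threads, growing the heap or
shrinking the cache, that change was probably enough.
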
> hths,
>
> charlie
>
> On 11/8/18 9:57 AM, Alex Yakushev wrote:
>> A quick follow up. I think we figured out what's going on – there is not 
>> enough free heap to deal with the allocation rate. You see, we have a 
>> cache inside the program the size of which was tuned with G1 enabled. 
>> Apparently, ZGC (and Shenandoah too, got the same problems with it 
>> today) inflates the size of the cache in bytes (because of the 
>> overhead) which leaves less breathing room for ZGC/Shenandoah to 
>> work. Will try to reduce the cache size and come back with the results.