Big hiccups with ZGC
per.liden at oracle.com
Fri Nov 9 09:23:44 UTC 2018
On 11/8/18 6:12 PM, charlie hunt wrote:
> Hi Alex,
> Did a quick look at the first two GC logs. Haven't had a chance to look
> at the 3rd.
> A couple tips that may help you as you continue your looking at ZGC.
> - If you see "Allocation Stall" in the GC log, such as "Allocation Stall
> (qtp1059634518-72) 15.108ms", this means that ZGC has slowed down the
> application thread(s) because you are running out of available heap
> space. In other words, GC lost the race of reclaiming space with the
> allocation rate.
> When you see these "Allocation Stall" messages in the GC log, there are
> a couple options, (one of these or a combination should resolve what you
> are seeing):
> a.) Increase the number of concurrent GC threads. This will help ZGC win
> the race. In your first GC log, there are 8 concurrent GC threads. It
> probably needs 10 or 12 concurrent GC threads in the absence of making
> other changes.
> b.) Increase the size of the Java heap to offer ZGC additional head room.
> c.) Make changes to the application to either reduce the amount of live
> data, or reduce the allocation rate.
> If you reduce cache sizes as you mentioned, this should help avoid the
> "Allocation Stalls".
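For reference, a combination of (a) and (b) might look like the command
line below. The thread count, heap size and jar name are made-up
examples, not recommendations; note that ZGC in JDK 11 still has to be
unlocked as an experimental option:

```shell
# Illustrative only: raise concurrent GC threads and give ZGC more
# head room. Tune the actual values against your own GC logs.
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -XX:ConcGCThreads=12 \
     -Xmx32g \
     -Xlog:gc* \
     -jar app.jar
```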
I think Charlie summarized it very well and I don't have much to add,
other than I noticed that the live set seems to grow and grow throughout
the run (see the "Live:" column in the heap stats). Maybe this is the
"cache" you mentioned that is growing?
The only other thing that sticks out from the logs is this:
[2018-11-07T16:28:14.753+0000][0.007s][gc,init] CPUs: 36 total, 1
I.e. HotSpot thinks it only has a single core to play with (at least when
the VM is starting up). Is this workload running in a container or in
some other constrained environment (e.g. numactl)?
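A quick way to check what the JVM actually sees is to print
Runtime.availableProcessors(); in a CPU-constrained container this can
be much lower than the host's core count, and it can be overridden with
-XX:ActiveProcessorCount=&lt;n&gt;. A minimal sketch (not from the original
logs):

```java
// Prints how many CPUs the JVM believes it may use. Inside a container
// with CPU limits (or under numactl) this can differ from the number of
// physical cores on the host.
public class CpuCheck {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("Available processors: " + cpus);
    }
}
```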
> On 11/8/18 9:57 AM, Alex Yakushev wrote:
>> A quick follow-up. I think we figured out what's going on – there is not
>> enough free heap to deal with the allocation rate. You see, we have a
>> cache inside the program the size of which was tuned with G1 enabled.
>> Apparently, ZGC (and Shenandoah too, got the same problems with it
>> today) inflates the size of the cache in bytes (because of the
>> overhead), which leaves less breathing room for ZGC/Shenandoah to work.
>> Will try to reduce the cache size and come back with the results.