Very long allocation stall

Thu Sep 27 08:38:58 UTC 2018

Hi,

On 09/26/2018 10:43 PM, Uberto Barbini wrote:
> Hi Per,
> 
> thanks for your reply.
> My comments inline
> 
>      > Do you have any idea of what can be the possible cause?
> 
>     If you see allocation stalls it's an indication that ZGC can't keep up
>     with the allocation rate of your application. I.e. the headroom in the
>     heap (max heap size - live-set size) is too small. To avoid this,
>     increase the max heap size until you don't get allocation stalls and
>     you have a healthy/acceptable GC frequency.
> 
> 
> Although the application is pretty aggressive on the allocation, the 
> live-set is relatively small (<1gb), so I thought 6gb of heap should be 
> enough.
> I've doubled it to 12gb of heap and performance improved a bit but there 
> are still lots of Allocation Stalls.
> 
>     Another (secondary) option to play with is -XX:ConcGCThreads to give
>     more CPU-time to the GC (essentially making it faster so each cycle
>     will be shorter). You can use -Xlog:gc+init to see how many threads ZGC
>     automatically selects now and go from there.
> 
> 
> Changing the number of threads improved things a lot, but didn't really
> solve the problem, only moved it further out.
> Using -XX:ConcGCThreads=4, in 60 seconds it is able to calculate about
> 480k positions, the same as G1 (but with better latency) and almost
> double what it managed before.
> But after increasing the search depth to 120 seconds per move, the
> Allocation Stalls popped up again and the app stopped again.
> Increasing to more than 4 didn't change anything since I have only 4 cores
> in my CPU.

Ok, so it sounds like you've over-provisioned your system. With 4 
concurrent GC threads on a 4 core machine, the GC and your application 
will compete for CPU-time. This is typically bad for latency. On the 
other hand, it seems like you're trying to optimize for throughput 
rather than latency here?

> 
> I'm not familiar enough with ZGC to try an explanation, but in my
> application only one thread keeps most of the long-lived memory
> allocations (the tree of positions) whilst the other threads are used to
> evaluate each new position, which means: 1) allocate many objects 2)
> calculate the result 3) return results to main thread and deallocate all
> objects. Is it possible that this behavior confuses ZGC?

No, that behavior is fine. From what I can tell, your application just 
has a very high allocation rate. You need a heap headroom that is at 
least "allocation_rate * gc_cycle_time" to avoid allocation stalls. For 
example, let's say it takes 2 seconds to complete a GC cycle and your
application is allocating 10GB/s, then you need at least 20GB of headroom.
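
As a rough sketch of that arithmetic (the class and method names here are
made up for illustration, and the model assumes a constant allocation rate
and cycle time, which a real application won't have):

```java
// Hypothetical helper, not part of the JDK or ZGC. It only restates
// "headroom >= allocation_rate * gc_cycle_time" as code.
public class ZgcHeadroom {

    /** Minimum headroom in GB: allocation rate (GB/s) times GC cycle time (s). */
    static double requiredHeadroomGb(double allocRateGbPerSec, double gcCycleSeconds) {
        return allocRateGbPerSec * gcCycleSeconds;
    }

    /** Highest allocation rate (GB/s) that a given headroom can absorb during one cycle. */
    static double maxAllocRateGbPerSec(double headroomGb, double gcCycleSeconds) {
        return headroomGb / gcCycleSeconds;
    }

    public static void main(String[] args) {
        // 10GB/s for a 2 second cycle needs 20GB of headroom.
        System.out.println(requiredHeadroomGb(10.0, 2.0));
        // 11GB of headroom and a 1 second cycle tolerates up to 11GB/s.
        System.out.println(maxAllocRateGbPerSec(11.0, 1.0));
    }
}
```

Plug in the allocation rate and cycle time you actually observe in the GC
log to get a first estimate of the headroom you need.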

In your case, you say you have a live-set that is <1GB, so a complete GC 
cycle should be pretty short, like well below 1 second. If your ~11GB of 
headroom isn't enough, then you must be allocating more than ~11GB/s?

You can use -Xlog:gc+alloc=debug to continuously print the allocation 
rate. You can also see it in the statistics counters with -Xlog:gc+stats 
(or just -Xlog:gc*). You can see the GC cycle time and live-set size there too.

cheers,
Per