Very long allocation stall
per.liden at oracle.com
Thu Sep 27 08:38:58 UTC 2018
On 09/26/2018 10:43 PM, Uberto Barbini wrote:
> Hi Per,
> thanks for your reply.
> My comments inline
> > Do you have any idea of what can be the possible cause?
> If you see allocation stalls it's an indication that ZGC can't keep up
> with the allocation rate of your application. I.e. the headroom in the
> heap (max heap size - live-set size) is too small. To avoid this,
> increase the max heap size until you don't get allocation stalls and
> have a healthy/acceptable GC frequency.
> Although the application is pretty aggressive on the allocation, the
> live-set is relatively small (<1gb), so I thought 6gb of heap should be enough.
> I've doubled it to 12gb of heap and performance improved a bit but there
> are still lots of Allocation Stalls.
> Another (secondary) option to play with is -XX:ConcGCThreads to give
> more CPU-time to the GC (essentially making it faster so each cycle
> will be shorter). You can use -Xlog:gc+init to see how many threads ZGC
> automatically selects now and go from there.
> Changing the number of threads improved things a lot, but didn't really
> solve the problem, only moved it further out.
> Using -XX:ConcGCThreads=4, in 60 seconds it is able to calculate about
> 480k positions, the same as G1 (but with better latency) and almost
> double what it managed before.
> But increasing the search depth to 120 seconds per move, the Allocation
> Stalls popped up again and the app stopped again.
> Increasing to more than 4 didn't change anything since I have only 4 cores
> in my CPU.
Ok, so it sounds like you've over-provisioned your system. With 4
concurrent GC threads on a 4 core machine, the GC and your application
will compete for CPU-time. This is typically bad for latency. On the
other hand, it seems like you're trying to optimize for throughput
rather than latency here?
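
To make the over-provisioning point concrete, here is a minimal Java sketch of the check. The class and method names are hypothetical, and the rule it encodes (concurrent GC threads should leave at least one core free for the application) is a simplifying assumption for illustration, not something ZGC enforces.

```java
public class GcThreadBudget {
    /** Cores left for application threads while the concurrent GC threads are running. */
    static int coresLeftForApplication(int cores, int concGcThreads) {
        return Math.max(0, cores - concGcThreads);
    }

    /** True when the GC threads alone can occupy every core (assumed definition). */
    static boolean gcCompetesForAllCores(int cores, int concGcThreads) {
        return coresLeftForApplication(cores, concGcThreads) == 0;
    }

    public static void main(String[] args) {
        // Number of processors the JVM sees on this machine.
        int cores = Runtime.getRuntime().availableProcessors();
        // Hypothetical input: the value passed to -XX:ConcGCThreads.
        int concGcThreads = 4;
        System.out.println("cores=" + cores
                + " concGcThreads=" + concGcThreads
                + " leftForApp=" + coresLeftForApplication(cores, concGcThreads));
    }
}
```

With 4 cores and -XX:ConcGCThreads=4, nothing is left over, which is the situation described above.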
> I'm not familiar enough with ZGC to try an explanation, but in my
> application only one thread keeps most of the long-lived memory
> allocations (the tree of positions) whilst the other threads are used to
> evaluate each new position, which means: 1) allocate many objects 2)
> calculate the result 3) return results to main thread and deallocate all
> objects. Is it possible that this behavior confuses ZGC?
No, that behavior is fine. From what I can tell, your application just
has a very high allocation rate. You need a heap headroom that is at
least "allocation_rate * gc_cycle_time" to avoid allocation stalls. For
example, let's say it takes 2 seconds to complete a GC cycle and your
application is allocating 10GB/s, then you need at least 20GB of headroom.
In your case, you say you have a live-set that is <1GB, so a complete GC
cycle should be pretty short, like well below 1 second. If your ~11GB of
headroom isn't enough, then you must be allocating more than ~11GB/s?
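
The arithmetic above can be sketched in a few lines of Java. This is only an illustration of the rule of thumb "headroom >= allocation_rate * gc_cycle_time"; the class and method names are hypothetical, and the 1-second cycle time in the last call is an assumed upper bound taken from "well below 1 second".

```java
public class HeadroomEstimate {
    /** Minimum headroom (GB) = allocation rate (GB/s) * GC cycle time (s). */
    static double requiredHeadroomGb(double allocationRateGbPerSec, double gcCycleSeconds) {
        return allocationRateGbPerSec * gcCycleSeconds;
    }

    /** Available headroom (GB) = max heap size - live-set size. */
    static double availableHeadroomGb(double maxHeapGb, double liveSetGb) {
        return maxHeapGb - liveSetGb;
    }

    /** Highest allocation rate (GB/s) a given headroom can absorb during one GC cycle. */
    static double maxSustainableRateGbPerSec(double headroomGb, double gcCycleSeconds) {
        return headroomGb / gcCycleSeconds;
    }

    public static void main(String[] args) {
        // Example from the text: 10GB/s for a 2-second cycle needs 20GB.
        System.out.println(requiredHeadroomGb(10.0, 2.0));           // 20.0
        // 12GB heap with a live-set of roughly 1GB leaves about 11GB.
        System.out.println(availableHeadroomGb(12.0, 1.0));          // 11.0
        // With an assumed 1-second cycle, 11GB absorbs at most 11GB/s.
        System.out.println(maxSustainableRateGbPerSec(11.0, 1.0));   // 11.0
    }
}
```

A shorter GC cycle raises the sustainable rate proportionally, so the measured cycle time matters as much as the heap size.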
You can use -Xlog:gc+alloc=debug to continuously print the allocation
rate. You can also see it in the statistics counters with -Xlog:gc+stats
(or just -Xlog:gc*). You can see the GC cycle time, live-set size there too.