Very long allocation stall
Per Liden
per.liden at oracle.com
Fri Sep 28 08:59:17 UTC 2018
On 09/27/2018 05:13 PM, Uberto Barbini wrote:
> Thank you very much Per for the detailed explanation.
> From the gc.log I see:
> Live 4849M
> Allocation rate (avg/max): 2128 / 5490
>
> Since my program is able to run without OOM with 1g heap, I'm a bit
> confused about what exactly Live means.
One thing to keep in mind is that the other collectors will have
UseCompressedOops enabled by default on heaps up to 32G. ZGC doesn't
support compressed oops (pointers will always be 64-bit instead of
32-bit), so if you have a pointer-heavy live-set it will be larger with
ZGC. But that still doesn't explain the big difference you're seeing,
only a (relatively small) part of it. You can run the other collectors
with -XX:-UseCompressedOops to get a sense of how much this affects the
live-set size.
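A minimal sketch of that comparison, assuming a hypothetical `app.jar`
standing in for the real program (flag names are the real HotSpot ones;
the jar and heap size are placeholders):

```shell
# Baseline: G1 with compressed oops (the default on heaps up to 32G)
java -Xmx12g -Xlog:gc+heap -jar app.jar

# Same collector, but with full 64-bit oops, comparable to what ZGC uses
java -Xmx12g -XX:-UseCompressedOops -Xlog:gc+heap -jar app.jar

# Compare the reported live/used heap sizes between the two runs to see
# how much of the difference is just pointer width
```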
"Live" here means the sum of the sizes of all objects that were visited
during marking.
"Allocated" means the sum of the sizes of all objects that have been
allocated since the GC cycle started.
cheers,
Per
>
> In the meanwhile I'll also look at the peak allocation rate, which seems
> a bit higher than needed (my guesstimate was 10k runs/s * 100 moves per
> run * 1k per move = 1GB per sec.)
>
> Thank you again
>
> Uberto
>
>
> ----
> [2018-09-26T19:31:01.512+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-5) 2218.161ms
> [2018-09-26T19:31:01.514+0100][info][gc,phases   ] GC(139) Concurrent Select Relocation Set 2.322ms
> [2018-09-26T19:31:01.518+0100][info][gc,phases   ] GC(139) Concurrent Prepare Relocation Set 4.155ms
> [2018-09-26T19:31:01.519+0100][info][gc,phases   ] GC(139) Pause Relocate Start 0.083ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-9) 2224.524ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-15) 2224.466ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-3) 2224.340ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-11) 2224.326ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-5) 6.433ms
> [2018-09-26T19:31:01.530+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-7) 2231.541ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-13) 2224.212ms
> [2018-09-26T19:31:01.581+0100][info][gc,phases   ] GC(139) Concurrent Relocate 62.703ms
> [2018-09-26T19:31:01.581+0100][info][gc,load     ] GC(139) Load: 7.51/8.25/7.38
> [2018-09-26T19:31:01.581+0100][info][gc,mmu      ] GC(139) MMU: 2ms/0.0%, 5ms/0.0%, 10ms/0.0%, 20ms/0.0%, 50ms/0.0%, 100ms/0.0%
> [2018-09-26T19:31:01.581+0100][info][gc,marking  ] GC(139) Mark: 2 stripe(s), 3 proactive flush(es), 1 terminate flush(es), 0 completion(s), 0 continuation(s)
> [2018-09-26T19:31:01.581+0100][info][gc,reloc    ] GC(139) Relocation: Incomplete
> [2018-09-26T19:31:01.581+0100][info][gc,nmethod  ] GC(139) NMethods: 385 registered, 180 unregistered
> [2018-09-26T19:31:01.581+0100][info][gc,metaspace] GC(139) Metaspace: 8M used, 8M capacity, 9M committed, 10M reserved
> [2018-09-26T19:31:01.581+0100][info][gc,ref      ] GC(139) Soft: 103 encountered, 0 discovered, 0 enqueued
> [2018-09-26T19:31:01.582+0100][info][gc,ref      ] GC(139) Weak: 148 encountered, 75 discovered, 0 enqueued
> [2018-09-26T19:31:01.582+0100][info][gc,ref      ] GC(139) Final: 0 encountered, 0 discovered, 0 enqueued
> [2018-09-26T19:31:01.582+0100][info][gc,ref      ] GC(139) Phantom: 42 encountered, 36 discovered, 0 enqueued
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139)                 Mark Start          Mark End        Relocate Start      Relocate End           High               Low
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Capacity:    12288M (100%)      12288M (100%)      12288M (100%)      12288M (100%)      12288M (100%)      12288M (100%)
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Reserve:        42M (0%)           42M (0%)           42M (0%)           42M (0%)           42M (0%)           36M (0%)
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Free:         3442M (28%)           0M (0%)            0M (0%)         3550M (29%)        3798M (31%)           0M (0%)
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Used:         8804M (72%)       12246M (100%)      12246M (100%)       8696M (71%)       12252M (100%)       8448M (69%)
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Live:            -               4849M (39%)        4849M (39%)        4849M (39%)           -                  -
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Allocated:       -               3442M (28%)        3444M (28%)        3830M (31%)           -                  -
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Garbage:         -               3954M (32%)        3952M (32%)          54M (0%)            -                  -
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Reclaimed:       -                  -                  2M (0%)         3900M (32%)           -                  -
> [2018-09-26T19:31:01.582+0100][info][gc          ] GC(139) Garbage Collection (Allocation Rate) 8804M(72%)->8696M(71%)
> [2018-09-26T19:31:01.681+0100][info][gc,start    ] GC(140) Garbage Collection (Allocation Rate)
> [2018-09-26T19:31:01.682+0100][info][gc,phases   ] GC(140) Pause Mark Start 0.094ms
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ] === Garbage Collection Statistics =======================================================================================================================
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]                                                          Last 10s               Last 10m               Last 10h                 Total
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]                                                         Avg / Max              Avg / Max              Avg / Max              Avg / Max
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Collector: Garbage Collection Cycle          3063.847 / 3099.858    2011.178 / 3671.006    2011.178 / 3671.006    2011.178 / 3671.006   ms
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Contention: Mark Segment Reset Contention           1 / 3                  1 / 8                  1 / 8                  1 / 8          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Contention: Mark SeqNum Reset Contention            0 / 1                  0 / 2                  0 / 2                  0 / 2          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Contention: Relocation Contention                   0 / 0                  2 / 51                 2 / 51                 2 / 51         ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Critical: Allocation Stall                   1922.303 / 2231.541    1448.690 / 2297.381    1448.690 / 2297.381    1448.690 / 2297.381   ms
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Critical: Allocation Stall                          2 / 8                  1 / 12                 1 / 12                 1 / 12         ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Critical: GC Locker Stall                       0.000 / 0.000          0.000 / 0.000          0.000 / 0.000          0.000 / 0.000      ms
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Critical: GC Locker Stall                           0 / 0                  0 / 0                  0 / 0                  0 / 0          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Allocation Rate                          1428 / 2914            2128 / 5490            2128 / 5490            2128 / 5490       MB/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Heap Used After Mark                    12246 / 12246          10840 / 12246          10840 / 12246          10840 / 12246      MB
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Heap Used After Relocation               8659 / 9040            6584 / 9040            6584 / 9040            6584 / 9040       MB
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Heap Used Before Mark                    9076 / 9298            7126 / 9478            7126 / 9478            7126 / 9478       MB
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Heap Used Before Relocation             12246 / 12246          10850 / 12246          10850 / 12246          10850 / 12246      MB
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Out Of Memory                               0 / 0                  0 / 0                  0 / 0                  0 / 0          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Page Cache Flush                            0 / 0                  0 / 32                 0 / 32                 0 / 32         MB/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Page Cache Hit L1                         719 / 1457            1053 / 2604            1053 / 2604            1053 / 2604       ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Page Cache Hit L2                           0 / 0                  0 / 0                  0 / 0                  0 / 0          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Page Cache Miss                             0 / 0                 19 / 1940              19 / 1940              19 / 1940       ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Undo Object Allocation Failed               0 / 0                  0 / 4                  0 / 4                  0 / 4          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Undo Object Allocation Succeeded            0 / 0                  2 / 51                 2 / 51                 2 / 51         ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Undo Page Allocation                        0 / 3                  0 / 10                 0 / 10                 0 / 10         ops/s
>
>
> On Thu, 27 Sep 2018 at 09:39, Per Liden <per.liden at oracle.com> wrote:
>
> Hi,
>
> On 09/26/2018 10:43 PM, Uberto Barbini wrote:
> > Hi Per,
> >
> > thanks for your reply.
> > My comments inline
> >
> > > Do you have any idea of what can be the possible cause?
> >
>     > If you see allocation stalls it's an indication that ZGC can't keep up
>     > with the allocation rate of your application. I.e. the headroom in the
>     > heap (max heap size - live-set size) is too small. To avoid this,
>     > increase the max heap size until you don't get allocation stalls and
>     > you have a healthy/acceptable GC frequency.
> >
> >
>     > Although the application is pretty aggressive on the allocation, the
>     > live-set is relatively small (<1GB), so I thought 6GB of heap should
>     > be enough.
>     > I've doubled it to 12GB of heap and performance improved a bit but
>     > there are still lots of Allocation Stalls.
> >
>     > Another (secondary) option to play with is -XX:ConcGCThreads to give
>     > more CPU-time to the GC (essentially making it faster so each cycle
>     > will be shorter). You can use -Xlog:gc+init to see how many threads
>     > ZGC automatically selects now and go from there.
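The suggestion above can be sketched as follows. This assumes the JDK 11
era command-line shape, where ZGC was still experimental and had to be
unlocked; `-version` is just a stand-in for the real application:

```shell
# Print ZGC initialization info, including how many concurrent GC
# threads the heuristic picked
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -Xlog:gc+init -version

# Override the heuristic, e.g. dedicate 2 of 4 cores to concurrent GC
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -XX:ConcGCThreads=2 -Xlog:gc+init -version
```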
> >
> >
>     > Changing the number of threads improved things a lot, but it didn't
>     > really solve the problem, only pushed it further out.
>     > Using -XX:ConcGCThreads=4, in 60 seconds it is able to calculate
>     > about 480k positions, the same as G1 (but with better latency) and
>     > almost double what it did before.
>     > But when increasing the search depth to 120 seconds per move, the
>     > Allocation Stalls popped up again and the app stopped again.
>     > Increasing to more than 4 didn't change anything since I only have
>     > 4 cores in my CPU.
>
> Ok, so it sounds like you've over-provisioned your system. With 4
> concurrent GC threads on a 4 core machine, the GC and your application
> will compete for CPU-time. This is typically bad for latency. On the
> other hand, it seems like you're trying to optimize for throughput
> rather than latency here?
>
> >
>     > I'm not familiar enough with ZGC to attempt an explanation, but in my
>     > application only one thread keeps most of the long-lived memory
>     > allocations (the tree of positions) whilst the other threads are used
>     > to evaluate each new position, which means: 1) allocate many objects,
>     > 2) calculate the result, 3) return the result to the main thread and
>     > deallocate all objects. Is it possible that this behavior confuses
>     > ZGC?
>
>     No, that behavior is fine. From what I can tell, your application just
>     has a very high allocation rate. You need a heap headroom of at least
>     "allocation_rate * gc_cycle_time" to avoid allocation stalls. For
>     example, let's say it takes 2 seconds to complete a GC cycle and your
>     application is allocating 10GB/s; then you need at least 20GB of
>     headroom.
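Plugging in the numbers from the log: the avg allocation rate of 2128
MB/s comes from the gc,stats output, while the ~2 s cycle time is an
assumption loosely based on the ~2.2 s stall durations reported above:

```shell
# headroom >= allocation_rate * gc_cycle_time
alloc_rate_mb_s=2128   # avg "Memory: Allocation Rate" from the gc,stats log
gc_cycle_s=2           # assumed GC cycle time (not directly in the log)
headroom_mb=$((alloc_rate_mb_s * gc_cycle_s))
echo "required headroom: ${headroom_mb} MB"   # prints: required headroom: 4256 MB
```

At the peak rate of 5490 MB/s the required headroom would be
proportionally larger, which is consistent with stalls appearing even
on a 12GB heap with a ~4.8GB live-set.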
>
>     In your case, you say you have a live-set that is <1GB, so a complete
>     GC cycle should be pretty short, well below 1 second. If your ~11GB of
>     headroom isn't enough, then you must be allocating more than ~11GB/s?
>
>     You can use -Xlog:gc+alloc=debug to continuously print the allocation
>     rate. You can also see it in the statistics counters with
>     -Xlog:gc+stats (or just -Xlog:gc*). You can see the GC cycle time and
>     live-set size there too.
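A sketch of those logging options together, again with a hypothetical
`app.jar` and JDK 11 era experimental flags as placeholders:

```shell
# Continuously print the allocation rate at debug level
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -Xmx12g -Xlog:gc+alloc=debug -jar app.jar

# Or enable all GC logging, which includes the periodic gc,stats table
# with allocation rate, cycle times, and live-set size
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -Xmx12g -Xlog:gc* -jar app.jar
```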
>
> cheers,
> Per
>