Very long allocation stall

Uberto Barbini uberto.gama at gmail.com
Thu Sep 27 15:13:43 UTC 2018


Thank you very much Per for the detailed explanation.
From the gc.log I see:
Live: 4849M
Allocation rate (avg/max): 2128 / 5490 MB/s

Since my program is able to run without OOM errors with a 1 GB heap, I'm a
bit confused about what exactly Live means here.

In the meantime I'll also look at the peak allocation rate, which seems a
bit higher than I expected (my guesstimate was 10k runs/s * 100 moves per
run * 1k per move = 1 GB per second; the worked numbers are just below).
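
Spelling that guesstimate out against the measured numbers from the stats
dump below:

    10,000 runs/s x 100 moves/run x 1 KB/move = 1,000,000 KB/s ~ 1 GB/s

So the measured average rate (2128 MB/s) is about twice my estimate, and
the peak (5490 MB/s) is more than five times it.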

Thank you again

Uberto


----
[2018-09-26T19:31:01.512+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-5) 2218.161ms
[2018-09-26T19:31:01.514+0100][info][gc,phases   ] GC(139) Concurrent Select Relocation Set 2.322ms
[2018-09-26T19:31:01.518+0100][info][gc,phases   ] GC(139) Concurrent Prepare Relocation Set 4.155ms
[2018-09-26T19:31:01.519+0100][info][gc,phases   ] GC(139) Pause Relocate Start 0.083ms
[2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-9) 2224.524ms
[2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-15) 2224.466ms
[2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-3) 2224.340ms
[2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-11) 2224.326ms
[2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-5) 6.433ms
[2018-09-26T19:31:01.530+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-7) 2231.541ms
[2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-13) 2224.212ms
[2018-09-26T19:31:01.581+0100][info][gc,phases   ] GC(139) Concurrent Relocate 62.703ms
[2018-09-26T19:31:01.581+0100][info][gc,load     ] GC(139) Load: 7.51/8.25/7.38
[2018-09-26T19:31:01.581+0100][info][gc,mmu      ] GC(139) MMU: 2ms/0.0%, 5ms/0.0%, 10ms/0.0%, 20ms/0.0%, 50ms/0.0%, 100ms/0.0%
[2018-09-26T19:31:01.581+0100][info][gc,marking  ] GC(139) Mark: 2 stripe(s), 3 proactive flush(es), 1 terminate flush(es), 0 completion(s), 0 continuation(s)
[2018-09-26T19:31:01.581+0100][info][gc,reloc    ] GC(139) Relocation: Incomplete
[2018-09-26T19:31:01.581+0100][info][gc,nmethod  ] GC(139) NMethods: 385 registered, 180 unregistered
[2018-09-26T19:31:01.581+0100][info][gc,metaspace] GC(139) Metaspace: 8M used, 8M capacity, 9M committed, 10M reserved
[2018-09-26T19:31:01.581+0100][info][gc,ref      ] GC(139) Soft: 103 encountered, 0 discovered, 0 enqueued
[2018-09-26T19:31:01.582+0100][info][gc,ref      ] GC(139) Weak: 148 encountered, 75 discovered, 0 enqueued
[2018-09-26T19:31:01.582+0100][info][gc,ref      ] GC(139) Final: 0 encountered, 0 discovered, 0 enqueued
[2018-09-26T19:31:01.582+0100][info][gc,ref      ] GC(139) Phantom: 42 encountered, 36 discovered, 0 enqueued
[2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139)                Mark Start          Mark End        Relocate Start      Relocate End           High               Low
[2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139)  Capacity:  12288M (100%)      12288M (100%)      12288M (100%)      12288M (100%)      12288M (100%)      12288M (100%)
[2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139)   Reserve:     42M (0%)           42M (0%)           42M (0%)           42M (0%)           42M (0%)           36M (0%)
[2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139)      Free:   3442M (28%)           0M (0%)            0M (0%)         3550M (29%)        3798M (31%)           0M (0%)
[2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139)      Used:   8804M (72%)       12246M (100%)      12246M (100%)       8696M (71%)       12252M (100%)       8448M (69%)
[2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139)      Live:      -              4849M (39%)        4849M (39%)        4849M (39%)           -                  -
[2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Allocated:     -              3442M (28%)        3444M (28%)        3830M (31%)           -                  -
[2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139)   Garbage:     -              3954M (32%)        3952M (32%)          54M (0%)            -                  -
[2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Reclaimed:     -                  -                 2M (0%)         3900M (32%)           -                  -
[2018-09-26T19:31:01.582+0100][info][gc          ] GC(139) Garbage Collection (Allocation Rate) 8804M(72%)->8696M(71%)
[2018-09-26T19:31:01.681+0100][info][gc,start    ] GC(140) Garbage Collection (Allocation Rate)
[2018-09-26T19:31:01.682+0100][info][gc,phases   ] GC(140) Pause Mark Start 0.094ms
[2018-09-26T19:31:02.682+0100][info][gc,stats    ] === Garbage Collection Statistics =======================================================================================================================
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]                                                           Last 10s              Last 10m              Last 10h                Total
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]                                                           Avg / Max             Avg / Max             Avg / Max             Avg / Max
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]   Collector: Garbage Collection Cycle              3063.847 / 3099.858   2011.178 / 3671.006   2011.178 / 3671.006   2011.178 / 3671.006    ms
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Contention: Mark Segment Reset Contention                1 / 3                 1 / 8                 1 / 8                 1 / 8           ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Contention: Mark SeqNum Reset Contention                 0 / 1                 0 / 2                 0 / 2                 0 / 2           ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Contention: Relocation Contention                        0 / 0                 2 / 51                2 / 51                2 / 51          ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]    Critical: Allocation Stall                      1922.303 / 2231.541   1448.690 / 2297.381   1448.690 / 2297.381   1448.690 / 2297.381    ms
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]    Critical: Allocation Stall                             2 / 8                 1 / 12                1 / 12                1 / 12          ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]    Critical: GC Locker Stall                          0.000 / 0.000         0.000 / 0.000         0.000 / 0.000         0.000 / 0.000       ms
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]    Critical: GC Locker Stall                              0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Allocation Rate                           1428 / 2914           2128 / 5490           2128 / 5490           2128 / 5490        MB/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Heap Used After Mark                     12246 / 12246         10840 / 12246         10840 / 12246         10840 / 12246       MB
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Heap Used After Relocation                8659 / 9040           6584 / 9040           6584 / 9040           6584 / 9040        MB
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Heap Used Before Mark                     9076 / 9298           7126 / 9478           7126 / 9478           7126 / 9478        MB
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Heap Used Before Relocation              12246 / 12246         10850 / 12246         10850 / 12246         10850 / 12246       MB
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Out Of Memory                                0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Page Cache Flush                             0 / 0                 0 / 32                0 / 32                0 / 32          MB/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Page Cache Hit L1                          719 / 1457           1053 / 2604           1053 / 2604           1053 / 2604        ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Page Cache Hit L2                            0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Page Cache Miss                              0 / 0                19 / 1940             19 / 1940             19 / 1940        ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Undo Object Allocation Failed                0 / 0                 0 / 4                 0 / 4                 0 / 4           ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Undo Object Allocation Succeeded             0 / 0                 2 / 51                2 / 51                2 / 51          ops/s
[2018-09-26T19:31:02.682+0100][info][gc,stats    ]      Memory: Undo Page Allocation                         0 / 3                 0 / 10                0 / 10                0 / 10          ops/s


On Thu, 27 Sep 2018 at 09:39, Per Liden <per.liden at oracle.com> wrote:

> Hi,
>
> On 09/26/2018 10:43 PM, Uberto Barbini wrote:
> > Hi Per,
> >
> > thanks for your reply.
> > My comments inline
> >
> >      > Do you have any idea of what the possible cause could be?
> >
> >     If you see allocation stalls it's an indication that ZGC can't keep up
> >     with the allocation rate of your application. I.e. the headroom in the
> >     heap (max heap size - live-set size) is too small. To avoid this,
> >     increase the max heap size until you don't get allocation stalls and
> >     you have a healthy/acceptable GC frequency.
> >
> >
> > Although the application is pretty aggressive on allocation, the
> > live-set is relatively small (<1 GB), so I thought 6 GB of heap would be
> > enough.
> > I've doubled it to 12 GB of heap and performance improved a bit, but
> > there are still lots of Allocation Stalls.
> >
> >     Another (secondary) option to play with is -XX:ConcGCThreads to give
> >     more CPU-time to the GC (essentially making it faster so each cycle
> >     will be shorter). You can use -Xlog:gc+init to see how many threads
> >     ZGC automatically selects now and go from there.
> >
> >
> > Changing the number of threads improved things a lot, but it didn't
> > really solve the problem, only moved it further out.
> > Using -XX:ConcGCThreads=4, in 60 seconds it is able to calculate about
> > 480k positions, the same as G1 (but with better latency) and almost
> > double what it did before.
> > But when I increased the search time to 120 seconds per move, the
> > Allocation Stalls popped up again and the app stopped again.
> > Increasing beyond 4 didn't change anything since I have only 4 cores
> > in my CPU.
>
> Ok, so it sounds like you've over-provisioned your system. With 4
> concurrent GC threads on a 4 core machine, the GC and your application
> will compete for CPU-time. This is typically bad for latency. On the
> other hand, it seems like you're trying to optimize for throughput
> rather than latency here?
>
> >
> > I'm not familiar enough with ZGC to attempt an explanation, but in my
> > application only one thread keeps most of the long-lived memory
> > allocations (the tree of positions), whilst the other threads are used
> > to evaluate each new position, which means: 1) allocate many objects,
> > 2) calculate the result, 3) return the results to the main thread and
> > release all the objects. Is it possible that this behavior confounds ZGC?
>
> No, that behavior is fine. From what I can tell, your application just
> has a very high allocation rate. You need a heap headroom of at least
> "allocation_rate * gc_cycle_time" to avoid allocation stalls. For
> example, let's say it takes 2 seconds to complete a GC cycle and your
> application is allocating 10GB/s; then you need at least 20GB of headroom.
>
> In your case, you say you have a live-set that is <1GB, so a complete GC
> cycle should be pretty short, well below 1 second. If your ~11GB of
> headroom isn't enough, then you must be allocating more than ~11GB/s?
>
> You can use -Xlog:gc+alloc=debug to continuously print the allocation
> rate. You can also see it in the statistics counters with -Xlog:gc+stats
> (or just -Xlog:gc*). You can see the GC cycle time and live-set size
> there too.
>
> cheers,
> Per
>
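
Putting Per's rule of thumb together with the numbers from the gc.log
above gives a minimal back-of-the-envelope sketch in plain Java. The
constants are copied from the GC(139) cycle and the stats dump (Last 10m
column); the class name is mine, and this is an estimate, not a precise
model of ZGC's behavior:

    // Rule of thumb from the thread: to avoid allocation stalls,
    // heap headroom >= allocation_rate * gc_cycle_time.
    public class ZgcHeadroomEstimate {
        public static void main(String[] args) {
            double heapMb    = 12288;  // -Xmx12g
            double liveSetMb = 4849;   // "Live" at Mark End, GC(139)
            double headroomMb = heapMb - liveSetMb;           // ~7439 MB

            // From "Memory: Allocation Rate" and
            // "Collector: Garbage Collection Cycle" in the stats dump.
            double avgRateMbPerSec = 2128, maxRateMbPerSec = 5490;
            double avgCycleSec = 2.011,  maxCycleSec = 3.671;

            System.out.printf("Available headroom:  %.0f MB%n", headroomMb);
            System.out.printf("Needed at avg rate:  %.0f MB%n",
                    avgRateMbPerSec * avgCycleSec);           // ~4280 MB
            System.out.printf("Needed at peak rate: %.0f MB%n",
                    maxRateMbPerSec * maxCycleSec);           // ~20154 MB
        }
    }

On these numbers the average case fits comfortably, but an allocation
burst at the peak rate during a slow cycle needs roughly 20 GB of
headroom, far more than the ~7.4 GB available, which is consistent with
the stalls in the log.

For reference, one possible JDK 11 invocation combining the flags
discussed in the thread (engine.jar is a placeholder for the actual
application, and ConcGCThreads=2 is just an illustrative choice for a
4-core box, following Per's over-provisioning comment):

    java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
         -Xmx12g -XX:ConcGCThreads=2 \
         -Xlog:gc,gc+init,gc+alloc=debug,gc+stats \
         -jar engine.jar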

