Very long allocation stall
Per Liden
per.liden at oracle.com
Fri Sep 28 08:59:17 UTC 2018
On 09/27/2018 05:13 PM, Uberto Barbini wrote:
> Thank you very much Per for the detailed explanation.
> From the gc.log I see:
> Live 4849M
> Allocation rate (avg/max): 2128 / 5490
>
> Since my program is able to run without OOM with 1g heap, I'm a bit
> confused about what exactly Live means.
One thing to keep in mind is that the other collectors will have
UseCompressedOops enabled by default on heaps up to 32G. ZGC doesn't
support compressed oops (pointers will always be 64-bit instead of
32-bit), so if you have a pointer-heavy live-set it will be larger with
ZGC. But that still doesn't explain the big difference you're seeing,
only a (relatively small) part of it. You can run the other collectors
with -XX:-UseCompressedOops to get a sense of how much this affects the
live-set size.
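A minimal sketch of that comparison, assuming a hypothetical `app.jar`
standing in for the real program (flag names are the real HotSpot ones;
the jar and heap size are placeholders):

```shell
# Baseline: G1 with compressed oops (the default on heaps up to 32G)
java -Xmx12g -Xlog:gc+heap -jar app.jar

# Same collector, but with full 64-bit oops, comparable to what ZGC uses
java -Xmx12g -XX:-UseCompressedOops -Xlog:gc+heap -jar app.jar

# Compare the reported live/used heap sizes between the two runs to see
# how much of the difference is just pointer width
```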
"Live" here means the sum of the sizes of all objects that were visited
during marking.
"Allocated" means the sum of the sizes of all objects that have been
allocated since the GC cycle started.
cheers,
Per
>
> In the meanwhile I'll also look at the peak allocation rate, which seems
> a bit higher than needed (my guesstimate was 10k runs/s * 100 moves per
> run * 1k per move = 1GB per sec.)
>
> Thank you again
>
> Uberto
>
>
> ----
> [2018-09-26T19:31:01.512+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-5) 2218.161ms
> [2018-09-26T19:31:01.514+0100][info][gc,phases   ] GC(139) Concurrent Select Relocation Set 2.322ms
> [2018-09-26T19:31:01.518+0100][info][gc,phases   ] GC(139) Concurrent Prepare Relocation Set 4.155ms
> [2018-09-26T19:31:01.519+0100][info][gc,phases   ] GC(139) Pause Relocate Start 0.083ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-9) 2224.524ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-15) 2224.466ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-3) 2224.340ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-11) 2224.326ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-5) 6.433ms
> [2018-09-26T19:31:01.530+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-7) 2231.541ms
> [2018-09-26T19:31:01.519+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-13) 2224.212ms
> [2018-09-26T19:31:01.581+0100][info][gc,phases   ] GC(139) Concurrent Relocate 62.703ms
> [2018-09-26T19:31:01.581+0100][info][gc,load     ] GC(139) Load: 7.51/8.25/7.38
> [2018-09-26T19:31:01.581+0100][info][gc,mmu      ] GC(139) MMU: 2ms/0.0%, 5ms/0.0%, 10ms/0.0%, 20ms/0.0%, 50ms/0.0%, 100ms/0.0%
> [2018-09-26T19:31:01.581+0100][info][gc,marking  ] GC(139) Mark: 2 stripe(s), 3 proactive flush(es), 1 terminate flush(es), 0 completion(s), 0 continuation(s)
> [2018-09-26T19:31:01.581+0100][info][gc,reloc    ] GC(139) Relocation: Incomplete
> [2018-09-26T19:31:01.581+0100][info][gc,nmethod  ] GC(139) NMethods: 385 registered, 180 unregistered
> [2018-09-26T19:31:01.581+0100][info][gc,metaspace] GC(139) Metaspace: 8M used, 8M capacity, 9M committed, 10M reserved
> [2018-09-26T19:31:01.581+0100][info][gc,ref      ] GC(139) Soft: 103 encountered, 0 discovered, 0 enqueued
> [2018-09-26T19:31:01.582+0100][info][gc,ref      ] GC(139) Weak: 148 encountered, 75 discovered, 0 enqueued
> [2018-09-26T19:31:01.582+0100][info][gc,ref      ] GC(139) Final: 0 encountered, 0 discovered, 0 enqueued
> [2018-09-26T19:31:01.582+0100][info][gc,ref      ] GC(139) Phantom: 42 encountered, 36 discovered, 0 enqueued
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139)                 Mark Start          Mark End        Relocate Start      Relocate End           High               Low
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Capacity:    12288M (100%)      12288M (100%)      12288M (100%)      12288M (100%)      12288M (100%)      12288M (100%)
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Reserve:        42M (0%)           42M (0%)           42M (0%)           42M (0%)           42M (0%)           36M (0%)
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Free:         3442M (28%)           0M (0%)            0M (0%)         3550M (29%)        3798M (31%)           0M (0%)
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Used:         8804M (72%)       12246M (100%)      12246M (100%)       8696M (71%)       12252M (100%)       8448M (69%)
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Live:            -               4849M (39%)        4849M (39%)        4849M (39%)           -                  -
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Allocated:       -               3442M (28%)        3444M (28%)        3830M (31%)           -                  -
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Garbage:         -               3954M (32%)        3952M (32%)          54M (0%)            -                  -
> [2018-09-26T19:31:01.582+0100][info][gc,heap     ] GC(139) Reclaimed:       -                  -                  2M (0%)         3900M (32%)           -                  -
> [2018-09-26T19:31:01.582+0100][info][gc          ] GC(139) Garbage Collection (Allocation Rate) 8804M(72%)->8696M(71%)
> [2018-09-26T19:31:01.681+0100][info][gc,start    ] GC(140) Garbage Collection (Allocation Rate)
> [2018-09-26T19:31:01.682+0100][info][gc,phases   ] GC(140) Pause Mark Start 0.094ms
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ] === Garbage Collection Statistics =======================================================================================================================
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]                                                          Last 10s               Last 10m               Last 10h                 Total
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]                                                         Avg / Max              Avg / Max              Avg / Max              Avg / Max
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Collector: Garbage Collection Cycle          3063.847 / 3099.858    2011.178 / 3671.006    2011.178 / 3671.006    2011.178 / 3671.006   ms
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Contention: Mark Segment Reset Contention           1 / 3                  1 / 8                  1 / 8                  1 / 8          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Contention: Mark SeqNum Reset Contention            0 / 1                  0 / 2                  0 / 2                  0 / 2          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Contention: Relocation Contention                   0 / 0                  2 / 51                 2 / 51                 2 / 51         ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Critical: Allocation Stall                   1922.303 / 2231.541    1448.690 / 2297.381    1448.690 / 2297.381    1448.690 / 2297.381   ms
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Critical: Allocation Stall                          2 / 8                  1 / 12                 1 / 12                 1 / 12         ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Critical: GC Locker Stall                       0.000 / 0.000          0.000 / 0.000          0.000 / 0.000          0.000 / 0.000      ms
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Critical: GC Locker Stall                           0 / 0                  0 / 0                  0 / 0                  0 / 0          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Allocation Rate                          1428 / 2914            2128 / 5490            2128 / 5490            2128 / 5490       MB/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Heap Used After Mark                    12246 / 12246          10840 / 12246          10840 / 12246          10840 / 12246      MB
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Heap Used After Relocation               8659 / 9040            6584 / 9040            6584 / 9040            6584 / 9040       MB
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Heap Used Before Mark                    9076 / 9298            7126 / 9478            7126 / 9478            7126 / 9478       MB
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Heap Used Before Relocation             12246 / 12246          10850 / 12246          10850 / 12246          10850 / 12246      MB
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Out Of Memory                               0 / 0                  0 / 0                  0 / 0                  0 / 0          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Page Cache Flush                            0 / 0                  0 / 32                 0 / 32                 0 / 32         MB/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Page Cache Hit L1                         719 / 1457            1053 / 2604            1053 / 2604            1053 / 2604       ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Page Cache Hit L2                           0 / 0                  0 / 0                  0 / 0                  0 / 0          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Page Cache Miss                             0 / 0                 19 / 1940              19 / 1940              19 / 1940       ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Undo Object Allocation Failed               0 / 0                  0 / 4                  0 / 4                  0 / 4          ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Undo Object Allocation Succeeded            0 / 0                  2 / 51                 2 / 51                 2 / 51         ops/s
> [2018-09-26T19:31:02.682+0100][info][gc,stats    ]  Memory: Undo Page Allocation                        0 / 3                  0 / 10                 0 / 10                 0 / 10         ops/s
>
>
> On Thu, 27 Sep 2018 at 09:39, Per Liden <per.liden at oracle.com> wrote:
>
> Hi,
>
> On 09/26/2018 10:43 PM, Uberto Barbini wrote:
> > Hi Per,
> >
> > thanks for your reply.
> > My comments inline
> >
> > > Do you have any idea of what can be the possible cause?
> >
>     > If you see allocation stalls it's an indication that ZGC can't keep up
>     > with the allocation rate of your application. I.e. the headroom in the
>     > heap (max heap size - live-set size) is too small. To avoid this,
>     > increase the max heap size until you don't get allocation stalls and
>     > you have a healthy/acceptable GC frequency.
> >
> >
>     > Although the application is pretty aggressive on the allocation, the
>     > live-set is relatively small (<1GB), so I thought 6GB of heap should
>     > be enough.
>     > I've doubled it to 12GB of heap and performance improved a bit but
>     > there are still lots of Allocation Stalls.
> >
>     > Another (secondary) option to play with is -XX:ConcGCThreads to give
>     > more CPU-time to the GC (essentially making it faster so each cycle
>     > will be shorter). You can use -Xlog:gc+init to see how many threads
>     > ZGC automatically selects now and go from there.
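The suggestion above can be sketched as follows. This assumes the JDK 11
era command-line shape, where ZGC was still experimental and had to be
unlocked; `-version` is just a stand-in for the real application:

```shell
# Print ZGC initialization info, including how many concurrent GC
# threads the heuristic picked
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -Xlog:gc+init -version

# Override the heuristic, e.g. dedicate 2 of 4 cores to concurrent GC
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -XX:ConcGCThreads=2 -Xlog:gc+init -version
```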
> >
> >
>     > Changing the number of threads improved things a lot, but it didn't
>     > really solve the problem, only pushed it further out.
>     > Using -XX:ConcGCThreads=4, in 60 seconds it is able to calculate
>     > about 480k positions, the same as G1 (but with better latency) and
>     > almost double what it did before.
>     > But when increasing the search depth to 120 seconds per move, the
>     > Allocation Stalls popped up again and the app stopped again.
>     > Increasing to more than 4 didn't change anything since I only have
>     > 4 cores in my CPU.
>
> Ok, so it sounds like you've over-provisioned your system. With 4
> concurrent GC threads on a 4 core machine, the GC and your application
> will compete for CPU-time. This is typically bad for latency. On the
> other hand, it seems like you're trying to optimize for throughput
> rather than latency here?
>
> >
>     > I'm not familiar enough with ZGC to attempt an explanation, but in my
>     > application only one thread keeps most of the long-lived memory
>     > allocations (the tree of positions) whilst the other threads are used
>     > to evaluate each new position, which means: 1) allocate many objects,
>     > 2) calculate the result, 3) return the result to the main thread and
>     > deallocate all objects. Is it possible that this behavior confuses
>     > ZGC?
>
>     No, that behavior is fine. From what I can tell, your application just
>     has a very high allocation rate. You need a heap headroom of at least
>     "allocation_rate * gc_cycle_time" to avoid allocation stalls. For
>     example, let's say it takes 2 seconds to complete a GC cycle and your
>     application is allocating 10GB/s; then you need at least 20GB of
>     headroom.
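Plugging in the numbers from the log: the avg allocation rate of 2128
MB/s comes from the gc,stats output, while the ~2 s cycle time is an
assumption loosely based on the ~2.2 s stall durations reported above:

```shell
# headroom >= allocation_rate * gc_cycle_time
alloc_rate_mb_s=2128   # avg "Memory: Allocation Rate" from the gc,stats log
gc_cycle_s=2           # assumed GC cycle time (not directly in the log)
headroom_mb=$((alloc_rate_mb_s * gc_cycle_s))
echo "required headroom: ${headroom_mb} MB"   # prints: required headroom: 4256 MB
```

At the peak rate of 5490 MB/s the required headroom would be
proportionally larger, which is consistent with stalls appearing even
on a 12GB heap with a ~4.8GB live-set.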
>
>     In your case, you say you have a live-set that is <1GB, so a complete
>     GC cycle should be pretty short, well below 1 second. If your ~11GB of
>     headroom isn't enough, then you must be allocating more than ~11GB/s?
>
>     You can use -Xlog:gc+alloc=debug to continuously print the allocation
>     rate. You can also see it in the statistics counters with
>     -Xlog:gc+stats (or just -Xlog:gc*). You can see the GC cycle time and
>     live-set size there too.
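A sketch of those logging options together, again with a hypothetical
`app.jar` and JDK 11 era experimental flags as placeholders:

```shell
# Continuously print the allocation rate at debug level
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -Xmx12g -Xlog:gc+alloc=debug -jar app.jar

# Or enable all GC logging, which includes the periodic gc,stats table
# with allocation rate, cycle times, and live-set size
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -Xmx12g -Xlog:gc* -jar app.jar
```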
>
> cheers,
> Per
>