Very long allocation stall
Per Liden
per.liden at oracle.com
Wed Sep 26 13:12:40 UTC 2018
Hi,
On 09/25/2018 10:32 PM, Uberto Barbini wrote:
> Hi, I've done some GC comparison tests with my bot that plays the game of
> Go. Internally it uses a Monte Carlo Tree Search (MCTS) algorithm and it's written
> in Kotlin.
> https://github.com/uberto/kakomu
>
> For all tests: OpenJDK 11 (Oracle build) and the Shenandoah build from 20180917, on my
> notebook with an i7 and 6 GB of heap.
>
> These are the results of single-threaded MCTS runs:
>
> Board 9x9, median of 1000 runs after a 100k warmup; smaller is better:
>
> 0.277 -XX:+UseConcMarkSweepGC
> 0.372 -XX:+UseG1GC
> 0.487 -XX:+UseShenandoahGC
> 0.297 -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
>
> So far no surprises, and ZGC shows great results.
>
> When I launch the real app, which runs on several threads using Kotlin
> coroutines and channels, I get these numbers (runouts in 60 seconds; greater is
> better):
>
> Common VM options:
> -Xms6g -Xmx6g -Xlog:gc*:gc.log:time,level,tags
>
> 573797 -XX:+UseConcMarkSweepGC
> 481320 -XX:+UseG1GC
> 399746 -XX:+UseShenandoahGC
> 440784 -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=passive
> 256227 -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
>
> Of course the new low-latency GCs are not as good as CMS here, but that was expected.
> ZGC's performance, however, was really poor. Looking at the application results
> in real time, it starts fast (close to 10k positions per second) but then it
> slows down, and after a while it cannot calculate any positions for
> seconds at a time. This behavior is consistent and repeats on every move (that is,
> each move starts fast and then becomes slow).
> This is not related to the game logic, since with the other GCs the
> speed stays more or less the same.
>
> --- application output:
> Let me think for 60 seconds...
> 60... runouts 173
> 59... runouts 5820
> 58... runouts 15654
> 57... runouts 25556
> 56... runouts 35521
> 55... runouts 44406
> ...
> 31... runouts 168994
> 30... runouts 173728
> 29... runouts 174684
> ...
> 4... runouts 249748
> 3... runouts 251997
> 2... runouts 254409
> 1... runouts 256227
> -----
>
> Looking at the GC logs I can see a lot of Allocation Stalls, which get
> increasingly longer: at the beginning of a move they are about 70ms, then they
> grow to 1500ms.
>
> Do you have any idea of what can be the possible cause?
If you see allocation stalls, it's an indication that ZGC can't keep up
with the allocation rate of your application, i.e. the headroom in the
heap (max heap size minus live-set size) is too small. To avoid this,
increase the max heap size until you no longer get allocation stalls and
you have a healthy/acceptable GC frequency.
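For example, something like this (8g is only an illustration, the right
number depends on your live-set size and allocation rate, and <your-app>
is just a placeholder):

    java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
         -Xms8g -Xmx8g \
         -Xlog:gc*:gc.log:time,level,tags \
         -jar <your-app>.jar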
Another (secondary) option to play with is -XX:ConcGCThreads to give
more CPU-time to the GC (essentially making it faster so each cycle will
be shorter). You can use -Xlog:gc+init to see how many threads ZGC
automatically selects now and go from there.
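For example (the thread count of 4 is only an illustration, start from
whatever -Xlog:gc+init reports and adjust):

    # See how many concurrent GC threads ZGC selects by default:
    java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc+init -version

    # Then try giving the GC more CPU time, e.g.:
    java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
         -XX:ConcGCThreads=4 -Xms6g -Xmx6g \
         -Xlog:gc*:gc.log:time,level,tags \
         -jar <your-app>.jar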
> I can try to replicate it in a smaller Java test if that would be useful.
> Is there any tweak I can use?
> Apart from this application, I've found ZGC very good in general.
Nice to hear!
> Thanks for your work!
Thanks!
cheers,
Per
>
> Uberto Barbini
>
> =============
> extract from gc.log
>
> [2018-09-25T21:17:22.397+0100][info][gc,phases ] GC(104) Pause Relocate Start 0.075ms
> [2018-09-25T21:17:22.397+0100][info][gc        ] Allocation Stall (ForkJoinPool.commonPool-worker-7) 1516.918ms
> [2018-09-25T21:17:22.397+0100][info][gc        ] Allocation Stall (ForkJoinPool.commonPool-worker-11) 1516.316ms
> [2018-09-25T21:17:22.397+0100][info][gc        ] Allocation Stall (ForkJoinPool.commonPool-worker-3) 1515.755ms
> [2018-09-25T21:17:22.397+0100][info][gc        ] Allocation Stall (ForkJoinPool.commonPool-worker-9) 1515.082ms
> [2018-09-25T21:17:22.397+0100][info][gc        ] Allocation Stall (ForkJoinPool.commonPool-worker-15) 1515.484ms
> [2018-09-25T21:17:22.397+0100][info][gc        ] Allocation Stall (ForkJoinPool.commonPool-worker-13) 1516.103ms
> [2018-09-25T21:17:22.479+0100][info][gc,phases ] GC(104) Concurrent Relocate 81.977ms
>
> ...
>
> [2018-09-25T21:17:12.289+0100][info][gc,stats ] === Garbage Collection Statistics =======================================================================================================================
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]                                                                Last 10s              Last 10m              Last 10h                Total
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]                                                               Avg / Max             Avg / Max             Avg / Max             Avg / Max
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]   Collector: Garbage Collection Cycle         1689.037 / 1877.517   1026.464 / 2230.686   1026.464 / 2230.686   1026.464 / 2230.686   ms
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]  Contention: Mark Segment Reset Contention           0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]  Contention: Mark SeqNum Reset Contention            0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]  Contention: Relocation Contention                   3 / 31               16 / 84               16 / 84               16 / 84          ops/s
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]    Critical: Allocation Stall                 1022.509 / 1412.459    642.591 / 1711.317    642.591 / 1711.317    642.591 / 1711.317   ms
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]    Critical: Allocation Stall                        3 / 8                 4 / 15                4 / 15                4 / 15          ops/s
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]    Critical: GC Locker Stall                     0.000 / 0.000         0.000 / 0.000         0.000 / 0.000         0.000 / 0.000      ms
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]    Critical: GC Locker Stall                         0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]      Memory: Allocation Rate                      1746 / 3016           2619 / 5956           2619 / 5956           2619 / 5956        MB/s
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]      Memory: Heap Used After Mark                 6102 / 6102           5416 / 6102           5416 / 6102           5416 / 6102        MB
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]      Memory: Heap Used After Relocation           3639 / 4022           3046 / 4022           3046 / 4022           3046 / 4022        MB
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]      Memory: Heap Used Before Mark                3879 / 4242           3365 / 4326           3365 / 4326           3365 / 4326        MB
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]      Memory: Heap Used Before Relocation          6102 / 6102           5390 / 6102           5390 / 6102           5390 / 6102        MB
> [2018-09-25T21:17:12.289+0100][info][gc,stats ]      Memory: Out Of Memory                           0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
>