Very long allocation stall

Per Liden per.liden at oracle.com
Wed Sep 26 13:12:40 UTC 2018


Hi,

On 09/25/2018 10:32 PM, Uberto Barbini wrote:
> Hi, I've done some GC comparison tests on my bot that plays the game of
> Go. Internally it uses a Monte Carlo Tree Search (MCTS) algorithm and it's
> written in Kotlin.
> https://github.com/uberto/kakomu
> 
> For all tests: OpenJDK 11 (Oracle build) and a Shenandoah build from
> 2018-09-17, on my notebook with an i7 and a 6 GB heap.
> 
> These are the results of single-threaded MCTS runs:
> 
> Board 9x9, median of 1000 runs after a 100k warmup (smaller is better):
> 
> 0.277 -XX:+UseConcMarkSweepGC
> 0.372 -XX:+UseG1GC
> 0.487 -XX:+UseShenandoahGC
> 0.297 -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
> 
> So far no surprises, and great results from ZGC.
> 
> When I launch the real app, which runs on several threads using Kotlin
> coroutines and channels, I get these numbers (runouts in 60 seconds,
> greater is better):
> 
> Common VM options:
> -Xms6g -Xmx6g -Xlog:gc*:gc.log:time,level,tags
> 
> 573797 -XX:+UseConcMarkSweepGC
> 481320 -XX:+UseG1GC
> 399746 -XX:+UseShenandoahGC
> 440784 -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=passive
> 256227 -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
> 
> Of course the new low-latency GCs are not as good as CMS here, but that
> was expected. ZGC's performance, however, was really terrible. Watching
> the application output in real time, it starts fast (close to 10k
> positions per second) but then slows down, and after a while it cannot
> calculate any positions for seconds at a time. This behavior is
> consistent and repeats on every move (that is, each move starts fast and
> then becomes slow). It is not related to the game logic, since with the
> other GCs the speed stays more or less the same.
> 
> --- application output:
> Let me think for 60 seconds...
> 60...   runouts 173
> 59...   runouts 5820
> 58...   runouts 15654
> 57...   runouts 25556
> 56...   runouts 35521
> 55...   runouts 44406
> ...
> 31...   runouts 168994
> 30...   runouts 173728
> 29...   runouts 174684
> ...
> 4...   runouts 249748
> 3...   runouts 251997
> 2...   runouts 254409
> 1...   runouts 256227
> -----
> 
> Looking at the GC logs I can see a lot of Allocation Stalls, which get
> increasingly longer: at the beginning of a move they are about 70ms, and
> later they reach around 1500ms.
> 
> Do you have any idea what the possible cause could be?

If you see allocation stalls, it's an indication that ZGC can't keep up 
with the allocation rate of your application, i.e. the headroom in the 
heap (max heap size minus live-set size) is too small. To avoid this, 
increase the max heap size until you no longer get allocation stalls and 
you have a healthy/acceptable GC frequency.
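
For example (purely an illustration, not a recommendation: the 8g figure 
below is a placeholder, and how far you can go depends on how much RAM 
your notebook actually has and on your live-set size), something like:

    java -Xms8g -Xmx8g \
         -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
         -Xlog:gc*:gc.log:time,level,tags \
         <your main class or jar>

Your stats below suggest why this matters: the allocation rate peaks at 
almost 6 GB/s while the heap used after relocation is around 3-4 GB out 
of a 6 GB heap, so there isn't much room left for a concurrent cycle to 
finish before the heap fills up again and threads have to stall.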

Another (secondary) option to play with is -XX:ConcGCThreads, to give 
more CPU time to the GC (essentially making it faster, so each cycle will 
be shorter). You can use -Xlog:gc+init to see how many threads ZGC 
selects automatically and go from there.
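
For example (again just a sketch; 4 is an arbitrary starting value, and 
the useful range depends on how many cores your i7 has and how much CPU 
the MCTS worker threads need for themselves):

    -XX:ConcGCThreads=4 -Xlog:gc+init

The gc+init output at startup shows the worker thread counts ZGC picked, 
so you can compare your explicit setting against the default and watch 
whether the allocation stalls shrink.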

> I can try to replicate it in a smaller Java test if that would be useful.
> Is there any tweak I can use?
> Apart from this application, I have found ZGC very good in general.

Nice to hear!

> Thanks for your work!

Thanks!

cheers,
Per

> 
> Uberto Barbini
> 
> =============
> extract from gc.log
> 
> [2018-09-25T21:17:22.397+0100][info][gc,phases   ] GC(104) Pause Relocate Start 0.075ms
> [2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-7) 1516.918ms
> [2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-11) 1516.316ms
> [2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-3) 1515.755ms
> [2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-9) 1515.082ms
> [2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-15) 1515.484ms
> [2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall (ForkJoinPool.commonPool-worker-13) 1516.103ms
> [2018-09-25T21:17:22.479+0100][info][gc,phases   ] GC(104) Concurrent Relocate 81.977ms
> 
> ...
> 
> [2018-09-25T21:17:12.289+0100][info][gc,stats] === Garbage Collection Statistics ===================================
> 
>                                                 Last 10s              Last 10m              Last 10h              Total
>                                                 Avg / Max             Avg / Max             Avg / Max             Avg / Max
>   Collector: Garbage Collection Cycle     1689.037 / 1877.517   1026.464 / 2230.686   1026.464 / 2230.686   1026.464 / 2230.686   ms
>  Contention: Mark Segment Reset Contention      0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
>  Contention: Mark SeqNum Reset Contention       0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
>  Contention: Relocation Contention              3 / 31               16 / 84               16 / 84               16 / 84          ops/s
>    Critical: Allocation Stall              1022.509 / 1412.459    642.591 / 1711.317    642.591 / 1711.317    642.591 / 1711.317   ms
>    Critical: Allocation Stall                    3 / 8                 4 / 15                4 / 15                4 / 15          ops/s
>    Critical: GC Locker Stall                 0.000 / 0.000         0.000 / 0.000         0.000 / 0.000         0.000 / 0.000       ms
>    Critical: GC Locker Stall                     0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
>      Memory: Allocation Rate                  1746 / 3016           2619 / 5956           2619 / 5956           2619 / 5956        MB/s
>      Memory: Heap Used After Mark             6102 / 6102           5416 / 6102           5416 / 6102           5416 / 6102        MB
>      Memory: Heap Used After Relocation       3639 / 4022           3046 / 4022           3046 / 4022           3046 / 4022        MB
>      Memory: Heap Used Before Mark            3879 / 4242           3365 / 4326           3365 / 4326           3365 / 4326        MB
>      Memory: Heap Used Before Relocation      6102 / 6102           5390 / 6102           5390 / 6102           5390 / 6102        MB
>      Memory: Out Of Memory                       0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
> 

