Very long allocation stall

Uberto Barbini uberto.gama at gmail.com
Tue Sep 25 20:32:02 UTC 2018


Hi, I've done some gc comparison tests on my bot to play the game of
Go. Internally it uses MontecarloTreeSearch algoritm and it's written
in Kotlin.
https://github.com/uberto/kakomu

For all tests:  openjdk11  build oracle and shenandoah 20180917. My
notebook with i7 and 6gb of heap.

This is the result of a single thread MCTS runs:

board 9x9, median of 1000 runs after a 100k warmup.  smaller is better

0.277 -XX:+UseConcMarkSweepGC
0.372 -XX:+UseG1GC
0.487 -XX:+UseShenandoahGC
0.297 -XX:+UnlockExperimentalVMOptions -XX:+UseZGC

So far no surprises and ZGC great results.

When I launch the real app which is running on several threads using kotlin
coroutines and channels I have this numbers (runs in 60 secs greater is
better):

common vm options
-Xms6g -Xmx6g -Xlog:gc*:gc.log:time,level,tags

573797 -XX:+UseConcMarkSweepGC
481320 -XX:+UseG1GC
399746 -XX:+UseShenandoahGC
440784 -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=passive
256227 -XX:+UnlockExperimentalVMOptions -XX:+UseZGC

Of course new low latency gcs are not as good as CMS but that was expected.
But ZGC performance was really terrible. Looking at the application results
in real time, it starts fast (close to 10k position for second) but then it
slows down and after a while it cannot calculate any more position for
seconds. This behavior is consistent and repeated for each move (that is in
every move it starts fast and become slow).
This is not something related to the game logic since with the other gc the
speed stay more or less the same.

--- application output:
Let me think for 60 seconds...
60...   runouts 173
59...   runouts 5820
58...   runouts 15654
57...   runouts 25556
56...   runouts 35521
55...   runouts 44406
...
31...   runouts 168994
30...   runouts 173728
29...   runouts 174684
...
4...   runouts 249748
3...   runouts 251997
2...   runouts 254409
1...   runouts 256227
-----

Looking at the gc logs I can see a lot of Allocation Stalls, which are
increasily longer, at the beginning of the move are about 70ms. then they
become 1500ms.

Do you have any idea of what can be the possible cause?
I can try to replicate it on a smaller test in java if this is useful.
Is there any tweak I can use?
Apart from this application I found ZGC very good in general.
Thanks for your work!

Uberto Barbini

=============
extract from gc.log

[2018-09-25T21:17:22.397+0100][info][gc,phases   ] GC(104) Pause Relocate
Start 0.075ms
[2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall
(ForkJoinPool.commonPool-worker-7) 1516.918ms
[2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall
(ForkJoinPool.commonPool-worker-11) 1516.316ms
[2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall
(ForkJoinPool.commonPool-worker-3) 1515.755ms
[2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall
(ForkJoinPool.commonPool-worker-9) 1515.082ms
[2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall
(ForkJoinPool.commonPool-worker-15) 1515.484ms
[2018-09-25T21:17:22.397+0100][info][gc          ] Allocation Stall
(ForkJoinPool.commonPool-worker-13) 1516.103ms
[2018-09-25T21:17:22.479+0100][info][gc,phases   ] GC(104) Concurrent
Relocate 81.977ms

...

[2018-09-25T21:17:12.289+0100][info][gc,stats    ] === Garbage Collection
Statistics
=======================================================================================================================
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]
                                    Last 10s              Last 10m
    Last 10h                Total
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]
                                    Avg / Max             Avg / Max
     Avg / Max             Avg / Max
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]   Collector: Garbage
Collection Cycle                   1689.037 / 1877.517   1026.464 /
2230.686   1026.464 / 2230.686   1026.464 / 2230.686    ms
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]  Contention: Mark
Segment Reset Contention                     0 / 0                 0 / 0
             0 / 0                 0 / 0           ops/s
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]  Contention: Mark SeqNum
Reset Contention                      0 / 0                 0 / 0
       0 / 0                 0 / 0           ops/s
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]  Contention: Relocation
Contention                             3 / 31               16 / 84
       16 / 84               16 / 84          ops/s
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]    Critical: Allocation
Stall                           1022.509 / 1412.459    642.591 / 1711.317
  642.591 / 1711.317    642.591 / 1711.317    ms
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]    Critical: Allocation
Stall                                  3 / 8                 4 / 15
        4 / 15                4 / 15          ops/s
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]    Critical: GC Locker
Stall                               0.000 / 0.000         0.000 / 0.000
     0.000 / 0.000         0.000 / 0.000       ms
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]    Critical: GC Locker
Stall                                   0 / 0                 0 / 0
         0 / 0                 0 / 0           ops/s
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]      Memory: Allocation
Rate                                1746 / 3016           2619 / 5956
     2619 / 5956           2619 / 5956        MB/s
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]      Memory: Heap Used
After Mark                           6102 / 6102           5416 / 6102
     5416 / 6102           5416 / 6102        MB
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]      Memory: Heap Used
After Relocation                     3639 / 4022           3046 / 4022
     3046 / 4022           3046 / 4022        MB
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]      Memory: Heap Used
Before Mark                          3879 / 4242           3365 / 4326
     3365 / 4326           3365 / 4326        MB
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]      Memory: Heap Used
Before Relocation                    6102 / 6102           5390 / 6102
     5390 / 6102           5390 / 6102        MB
[2018-09-25T21:17:12.289+0100][info][gc,stats    ]      Memory: Out Of
Memory                                     0 / 0                 0 / 0
           0 / 0                 0 / 0           ops/s


More information about the zgc-dev mailing list