Information on how to parse/interpret ZGC Logs

Peter Booth peter_booth at me.com
Mon Feb 10 17:27:59 UTC 2020


Prabhash,

What OS version?
Is it a vanilla OS install?
Can you print the output of the following?
(Assuming Linux)
egrep ‘thp|trans’ /proc/vmstat
tail -28 /proc/cpuinfo

Peter

Sent from my iPhone

> On Feb 9, 2020, at 1:56 AM, Prabhash Rathore <prabhashrathore at gmail.com> wrote:
> 
> Hi Per,
> 
> Thanks for your reply!
> 
> About ZGC logs, in general I am trying to understand following:
> 
>   - What are the full pause times?
>   - How many such pause per unit time?
>   - Anything else which helps me eliminate GC as cause for high
>   application latency.
> 
> This is how I have configured ZGC logging at JVM level, wondering if I
> should add other tags like Safepoint to get more details about GC stats:
> -Xlog:gc*=debug:file=gc.log
> 
> All JVM flas used in my application:
> -Xms130G -Xmx130G
> -Xlog:gc*=debug:file=/somedir/gc.log:time:filecount=10,filesize=67108864
> -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError
> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ConcGCThreads=48
> -XX:ParallelGCThreads=96 -XX:-OmitStackTraceInFastThrow
> 
> It's a large machine with 96 threads and 196 GB RAM.
> 
> I have -XX:+AlwaysPreTouch configured as one another option. With
> AlwaysPreTouch option, I see in Linux top command shows a very high shared
> and resident memory. My max heap size is configured as 130 GB but I see
> shared memory is shown as 388 GB and Resident memory as 436 GB. On the
> other hand, total virtual memory for this process in top is shown as 17.1
> tera byte. How is this possible? My whole machine size is 196 GB (is this
> accounting for things swapped out to disk). I did see without
> AlwaysPretouch, numbers look close to the heap size. Trying to understand
> why with PreTouch, process memory is shown was higher than configured size?
> I understand shared memory has all shared libs mapped out but how can it be
> such. a large size?
> 
> Regarding high GC pause time, I did notice that my machine was low on
> memory and it was swapping, hence slowing down everything. For now I have
> disabled Swappines completely with Kernel VM tunable but I am still trying
> to find the actual cause of why swapping kicked in. This machine only runs
> this particular Java applicaion which has 130 GB heap size. Other than
> heap, I still have 66 GB memory available on host. Trying to figure out if
> there is a native memory leak. If you have any inputs on this then please
> share.
> 
> Thanks!
> Prabhash Rathore
> 
>> On Mon, Feb 3, 2020 at 2:35 AM Per Liden <per.liden at oracle.com> wrote:
>> 
>> Hi,
>> 
>>> On 2020-02-03 06:52, Prabhash Rathore wrote:
>>> Hello,
>>> 
>>> We have decided to use ZGC Garbage Collector for our Java application
>>> running on Java 11. I was wondering if there are any tools or any
>>> documenation on how to interpret ZGC logs.
>> 
>> Is there something in particular in the logs you're wondering about?
>> 
>>> 
>>> I found following statistics in ZGC log which as per my understanding
>> shows
>>> a very large allocation stall of 3902042.342 milliseconds. It will be
>>> really great if I can get some help to understand this further.
>> 
>> I can see that you've had making times that is more than an hour long,
>> which suggests that something in your system is seriously wrong (like
>> extremely overloaded or an extremely slow disk that you log to... just
>> guessing here). I think you need to have a broader look at the health of
>> the system before we can draw any conclusion from the GC logs.
>> 
>> cheers,
>> Per
>> 
>>> 
>>> [2020-02-02T22:37:36.883+0000] === Garbage Collection Statistics
>>> 
>> =======================================================================================================================
>>> [2020-02-02T22:37:36.883+0000]
>>>                  Last 10s              Last 10m              Last 10h
>>>          Total
>>> [2020-02-02T22:37:36.883+0000]
>>>                  Avg / Max             Avg / Max             Avg / Max
>>>         Avg / Max
>>> [2020-02-02T22:37:36.883+0000]   Collector: Garbage Collection Cycle
>>>                0.000 / 0.000      7789.187 / 7789.187  12727.424 /
>>> 3903938.012  1265.033 / 3903938.012   ms
>>> [2020-02-02T22:37:36.883+0000]  Contention: Mark Segment Reset Contention
>>>                   0 / 0                10 / 1084            176 / 15122
>>>        42 / 15122       ops/s
>>> [2020-02-02T22:37:36.883+0000]  Contention: Mark SeqNum Reset Contention
>>>                    0 / 0                 0 / 5                 0 / 31
>>>          0 / 31          ops/s
>>> [2020-02-02T22:37:36.883+0000]  Contention: Relocation Contention
>>>                   0 / 0                 0 / 3                 1 / 708
>>>         7 / 890         ops/s
>>> [2020-02-02T22:37:36.883+0000]    Critical: Allocation Stall
>>>                0.000 / 0.000         0.000 / 0.000      6714.722 /
>>> 3902042.342  6714.722 / 3902042.342   ms
>>> [2020-02-02T22:37:36.883+0000]    Critical: Allocation Stall
>>>                    0 / 0                 0 / 0                12 / 4115
>>>          2 / 4115        ops/s
>>> [2020-02-02T22:37:36.883+0000]    Critical: GC Locker Stall
>>>               0.000 / 0.000         0.000 / 0.000         3.979 / 6.561
>>>     1.251 / 6.561       ms
>>> [2020-02-02T22:37:36.883+0000]    Critical: GC Locker Stall
>>>                   0 / 0                 0 / 0                 0 / 1
>>>         0 / 1           ops/s
>>> [2020-02-02T22:37:36.883+0000]      Memory: Allocation Rate
>>>                   0 / 0                 6 / 822             762 / 25306
>>>      1548 / 25306       MB/s
>>> [2020-02-02T22:37:36.883+0000]      Memory: Heap Used After Mark
>>>                    0 / 0             92170 / 92170         89632 /
>> 132896
>>>      30301 / 132896      MB
>>> [2020-02-02T22:37:36.883+0000]      Memory: Heap Used After Relocation
>>>                    0 / 0             76376 / 76376         67490 /
>> 132928
>>>       8047 / 132928      MB
>>> [2020-02-02T22:37:36.883+0000]      Memory: Heap Used Before Mark
>>>                   0 / 0             92128 / 92128         84429 / 132896
>>>      29452 / 132896      MB
>>> [2020-02-02T22:37:36.883+0000]      Memory: Heap Used Before Relocation
>>>                   0 / 0             86340 / 86340         76995 / 132896
>>>      15862 / 132896      MB
>>> [2020-02-02T22:37:36.883+0000]      Memory: Out Of Memory
>>>                   0 / 0                 0 / 0                 0 / 0
>>>         0 / 0           ops/s
>>> [2020-02-02T22:37:36.883+0000]      Memory: Page Cache Flush
>>>                    0 / 0                 0 / 0                62 / 2868
>>>         16 / 2868        MB/s
>>> [2020-02-02T22:37:36.883+0000]      Memory: Page Cache Hit L1
>>>                   0 / 0                 7 / 2233            277 / 11553
>>>       583 / 11553       ops/s
>>> [2020-02-02T22:37:36.883+0000]      Memory: Page Cache Hit L2
>>>                   0 / 0                 0 / 0                20 / 4619
>>>         59 / 4619        ops/s
>>> [2020-02-02T22:37:36.883+0000]      Memory: Page Cache Miss
>>>                   0 / 0                 0 / 0                15 / 1039
>>>          3 / 1297        ops/s
>>> [2020-02-02T22:37:36.883+0000]      Memory: Undo Object Allocation Failed
>>>                   0 / 0                 0 / 0                 0 / 24
>>>          0 / 24          ops/s
>>> [2020-02-02T22:37:36.883+0000]      Memory: Undo Object Allocation
>>> Succeeded                  0 / 0                 0 / 3                 1
>> /
>>> 708               7 / 890         ops/s
>>> [2020-02-02T22:37:36.883+0000]      Memory: Undo Page Allocation
>>>                    0 / 0                 0 / 12               30 / 3464
>>>          7 / 3464        ops/s
>>> [2020-02-02T22:37:36.883+0000]       Phase: Concurrent Destroy Detached
>>> Pages             0.000 / 0.000         0.004 / 0.004        11.675 /
>>> 1484.886      1.155 / 1484.886    ms
>>> [2020-02-02T22:37:36.883+0000]       Phase: Concurrent Mark
>>>               0.000 / 0.000      7016.569 / 7016.569  11758.365 /
>>> 3901893.544  1103.558 / 3901893.544   ms
>>> [2020-02-02T22:37:36.883+0000]       Phase: Concurrent Mark Continue
>>>                0.000 / 0.000         0.000 / 0.000      1968.844 /
>> 3674.454
>>>   1968.844 / 3674.454    ms
>>> [2020-02-02T22:37:36.883+0000]       Phase: Concurrent Prepare Relocation
>>> Set             0.000 / 0.000       453.732 / 453.732     364.535 /
>>> 7103.720     39.453 / 7103.720    ms
>>> [2020-02-02T22:37:36.883+0000]       Phase: Concurrent Process Non-Strong
>>> References      0.000 / 0.000         2.003 / 2.003         2.738 /
>> 34.406
>>>        2.253 / 34.406      ms
>>> [2020-02-02T22:37:36.883+0000]       Phase: Concurrent Relocate
>>>               0.000 / 0.000       261.822 / 261.822     335.954 /
>> 2207.669
>>>     45.868 / 2207.669    ms
>>> [2020-02-02T22:37:36.883+0000]       Phase: Concurrent Reset Relocation
>> Set
>>>               0.000 / 0.000         6.083 / 6.083        13.489 /
>> 1128.678
>>>      3.574 / 1128.678    ms
>>> [2020-02-02T22:37:36.883+0000]       Phase: Concurrent Select Relocation
>>> Set              0.000 / 0.000         6.379 / 6.379        97.530 /
>>> 1460.679     18.439 / 1460.679    ms
>>> [2020-02-02T22:37:36.883+0000]       Phase: Pause Mark End
>>>                0.000 / 0.000         4.420 / 4.420         6.219 /
>> 26.498
>>>      6.474 / 40.883      ms
>>> [2020-02-02T22:37:36.883+0000]       Phase: Pause Mark Start
>>>                0.000 / 0.000        14.836 / 14.836       11.893 /
>> 28.350
>>>     11.664 / 41.767      ms
>>> [2020-02-02T22:37:36.884+0000]       Phase: Pause Relocate Start
>>>                0.000 / 0.000        13.411 / 13.411       30.849 /
>> 697.344
>>>      11.995 / 697.344     ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Mark
>>>               0.000 / 0.000      7015.793 / 7016.276  18497.265 /
>>> 3901893.075  1690.497 / 3901893.075   ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Mark Idle
>>>                0.000 / 0.000         1.127 / 13.510        1.292 /
>> 219.999
>>>       1.280 / 219.999     ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Mark Try Flush
>>>               0.000 / 0.000         1.295 / 2.029        47.094 /
>> 34869.359
>>>     4.797 / 34869.359   ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Mark Try Terminate
>>>               0.000 / 0.000         1.212 / 14.847        1.760 /
>> 3799.238
>>>      1.724 / 3799.238    ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Concurrent References Enqueue
>>>               0.000 / 0.000         0.009 / 0.009         0.022 / 1.930
>>>     0.017 / 2.350       ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Concurrent References Process
>>>               0.000 / 0.000         0.599 / 0.599         0.768 / 23.966
>>>      0.495 / 23.966      ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Weak Roots
>>>               0.000 / 0.000         0.882 / 1.253         1.155 / 21.699
>>>      1.077 / 23.602      ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Weak Roots
>>> JNIWeakHandles          0.000 / 0.000         0.301 / 0.943
>> 0.308 /
>>> 10.868        0.310 / 23.219      ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Weak Roots
>>> StringTable             0.000 / 0.000         0.289 / 0.496
>> 0.390 /
>>> 12.794        0.363 / 22.907      ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Weak Roots
>>> VMWeakHandles           0.000 / 0.000         0.230 / 0.469
>> 0.329 /
>>> 21.267        0.331 / 23.135      ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Pause Mark Try Complete
>>>               0.000 / 0.000         0.000 / 0.000         0.501 / 4.801
>>>     0.480 / 17.208      ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Pause Remap TLABS
>>>               0.000 / 0.000         0.252 / 0.252         0.195 / 0.528
>>>     0.226 / 3.451       ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Pause Retire TLABS
>>>                0.000 / 0.000         1.195 / 1.195         1.324 / 5.082
>>>       1.408 / 11.219      ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Pause Roots
>>>               0.000 / 0.000         6.968 / 10.865       12.329 /
>> 693.701
>>>     6.431 / 1300.994    ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Pause Roots
>>> ClassLoaderDataGraph              0.000 / 0.000         4.819 / 8.232
>>>   9.635 / 693.405       3.476 / 693.405     ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Pause Roots CodeCache
>>>               0.000 / 0.000         0.842 / 2.731         0.996 / 83.553
>>>      0.780 / 83.553      ms
>>> [2020-02-02T22:37:36.884+0000]    Subphase: Pause Roots JNIHandles
>>>                0.000 / 0.000         1.171 / 6.314         0.866 /
>> 17.875
>>>      0.837 / 25.708      ms
>>> 
>>> 
>>> Thank you!
>>> Prabhash Rathore
>>> 
>> 



More information about the zgc-dev mailing list