Information on how to parse/interpret ZGC Logs
Peter Booth
peter_booth at me.com
Mon Feb 10 17:27:59 UTC 2020
Prabhash,
What OS version?
Is it a vanilla OS install?
Can you print the output of the following?
(Assuming Linux)
egrep ‘thp|trans’ /proc/vmstat
tail -28 /proc/cpuinfo
Peter
Sent from my iPhone
> On Feb 9, 2020, at 1:56 AM, Prabhash Rathore <prabhashrathore at gmail.com> wrote:
>
> Hi Per,
>
> Thanks for your reply!
>
> About ZGC logs, in general I am trying to understand following:
>
> - What are the full pause times?
> - How many such pause per unit time?
> - Anything else which helps me eliminate GC as cause for high
> application latency.
>
> This is how I have configured ZGC logging at JVM level, wondering if I
> should add other tags like Safepoint to get more details about GC stats:
> -Xlog:gc*=debug:file=gc.log
>
> All JVM flas used in my application:
> -Xms130G -Xmx130G
> -Xlog:gc*=debug:file=/somedir/gc.log:time:filecount=10,filesize=67108864
> -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError
> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ConcGCThreads=48
> -XX:ParallelGCThreads=96 -XX:-OmitStackTraceInFastThrow
>
> It's a large machine with 96 threads and 196 GB RAM.
>
> I have -XX:+AlwaysPreTouch configured as one another option. With
> AlwaysPreTouch option, I see in Linux top command shows a very high shared
> and resident memory. My max heap size is configured as 130 GB but I see
> shared memory is shown as 388 GB and Resident memory as 436 GB. On the
> other hand, total virtual memory for this process in top is shown as 17.1
> tera byte. How is this possible? My whole machine size is 196 GB (is this
> accounting for things swapped out to disk). I did see without
> AlwaysPretouch, numbers look close to the heap size. Trying to understand
> why with PreTouch, process memory is shown was higher than configured size?
> I understand shared memory has all shared libs mapped out but how can it be
> such. a large size?
>
> Regarding high GC pause time, I did notice that my machine was low on
> memory and it was swapping, hence slowing down everything. For now I have
> disabled Swappines completely with Kernel VM tunable but I am still trying
> to find the actual cause of why swapping kicked in. This machine only runs
> this particular Java applicaion which has 130 GB heap size. Other than
> heap, I still have 66 GB memory available on host. Trying to figure out if
> there is a native memory leak. If you have any inputs on this then please
> share.
>
> Thanks!
> Prabhash Rathore
>
>> On Mon, Feb 3, 2020 at 2:35 AM Per Liden <per.liden at oracle.com> wrote:
>>
>> Hi,
>>
>>> On 2020-02-03 06:52, Prabhash Rathore wrote:
>>> Hello,
>>>
>>> We have decided to use ZGC Garbage Collector for our Java application
>>> running on Java 11. I was wondering if there are any tools or any
>>> documenation on how to interpret ZGC logs.
>>
>> Is there something in particular in the logs you're wondering about?
>>
>>>
>>> I found following statistics in ZGC log which as per my understanding
>> shows
>>> a very large allocation stall of 3902042.342 milliseconds. It will be
>>> really great if I can get some help to understand this further.
>>
>> I can see that you've had making times that is more than an hour long,
>> which suggests that something in your system is seriously wrong (like
>> extremely overloaded or an extremely slow disk that you log to... just
>> guessing here). I think you need to have a broader look at the health of
>> the system before we can draw any conclusion from the GC logs.
>>
>> cheers,
>> Per
>>
>>>
>>> [2020-02-02T22:37:36.883+0000] === Garbage Collection Statistics
>>>
>> =======================================================================================================================
>>> [2020-02-02T22:37:36.883+0000]
>>> Last 10s Last 10m Last 10h
>>> Total
>>> [2020-02-02T22:37:36.883+0000]
>>> Avg / Max Avg / Max Avg / Max
>>> Avg / Max
>>> [2020-02-02T22:37:36.883+0000] Collector: Garbage Collection Cycle
>>> 0.000 / 0.000 7789.187 / 7789.187 12727.424 /
>>> 3903938.012 1265.033 / 3903938.012 ms
>>> [2020-02-02T22:37:36.883+0000] Contention: Mark Segment Reset Contention
>>> 0 / 0 10 / 1084 176 / 15122
>>> 42 / 15122 ops/s
>>> [2020-02-02T22:37:36.883+0000] Contention: Mark SeqNum Reset Contention
>>> 0 / 0 0 / 5 0 / 31
>>> 0 / 31 ops/s
>>> [2020-02-02T22:37:36.883+0000] Contention: Relocation Contention
>>> 0 / 0 0 / 3 1 / 708
>>> 7 / 890 ops/s
>>> [2020-02-02T22:37:36.883+0000] Critical: Allocation Stall
>>> 0.000 / 0.000 0.000 / 0.000 6714.722 /
>>> 3902042.342 6714.722 / 3902042.342 ms
>>> [2020-02-02T22:37:36.883+0000] Critical: Allocation Stall
>>> 0 / 0 0 / 0 12 / 4115
>>> 2 / 4115 ops/s
>>> [2020-02-02T22:37:36.883+0000] Critical: GC Locker Stall
>>> 0.000 / 0.000 0.000 / 0.000 3.979 / 6.561
>>> 1.251 / 6.561 ms
>>> [2020-02-02T22:37:36.883+0000] Critical: GC Locker Stall
>>> 0 / 0 0 / 0 0 / 1
>>> 0 / 1 ops/s
>>> [2020-02-02T22:37:36.883+0000] Memory: Allocation Rate
>>> 0 / 0 6 / 822 762 / 25306
>>> 1548 / 25306 MB/s
>>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used After Mark
>>> 0 / 0 92170 / 92170 89632 /
>> 132896
>>> 30301 / 132896 MB
>>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used After Relocation
>>> 0 / 0 76376 / 76376 67490 /
>> 132928
>>> 8047 / 132928 MB
>>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used Before Mark
>>> 0 / 0 92128 / 92128 84429 / 132896
>>> 29452 / 132896 MB
>>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used Before Relocation
>>> 0 / 0 86340 / 86340 76995 / 132896
>>> 15862 / 132896 MB
>>> [2020-02-02T22:37:36.883+0000] Memory: Out Of Memory
>>> 0 / 0 0 / 0 0 / 0
>>> 0 / 0 ops/s
>>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Flush
>>> 0 / 0 0 / 0 62 / 2868
>>> 16 / 2868 MB/s
>>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Hit L1
>>> 0 / 0 7 / 2233 277 / 11553
>>> 583 / 11553 ops/s
>>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Hit L2
>>> 0 / 0 0 / 0 20 / 4619
>>> 59 / 4619 ops/s
>>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Miss
>>> 0 / 0 0 / 0 15 / 1039
>>> 3 / 1297 ops/s
>>> [2020-02-02T22:37:36.883+0000] Memory: Undo Object Allocation Failed
>>> 0 / 0 0 / 0 0 / 24
>>> 0 / 24 ops/s
>>> [2020-02-02T22:37:36.883+0000] Memory: Undo Object Allocation
>>> Succeeded 0 / 0 0 / 3 1
>> /
>>> 708 7 / 890 ops/s
>>> [2020-02-02T22:37:36.883+0000] Memory: Undo Page Allocation
>>> 0 / 0 0 / 12 30 / 3464
>>> 7 / 3464 ops/s
>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Destroy Detached
>>> Pages 0.000 / 0.000 0.004 / 0.004 11.675 /
>>> 1484.886 1.155 / 1484.886 ms
>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Mark
>>> 0.000 / 0.000 7016.569 / 7016.569 11758.365 /
>>> 3901893.544 1103.558 / 3901893.544 ms
>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Mark Continue
>>> 0.000 / 0.000 0.000 / 0.000 1968.844 /
>> 3674.454
>>> 1968.844 / 3674.454 ms
>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Prepare Relocation
>>> Set 0.000 / 0.000 453.732 / 453.732 364.535 /
>>> 7103.720 39.453 / 7103.720 ms
>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Process Non-Strong
>>> References 0.000 / 0.000 2.003 / 2.003 2.738 /
>> 34.406
>>> 2.253 / 34.406 ms
>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Relocate
>>> 0.000 / 0.000 261.822 / 261.822 335.954 /
>> 2207.669
>>> 45.868 / 2207.669 ms
>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Reset Relocation
>> Set
>>> 0.000 / 0.000 6.083 / 6.083 13.489 /
>> 1128.678
>>> 3.574 / 1128.678 ms
>>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Select Relocation
>>> Set 0.000 / 0.000 6.379 / 6.379 97.530 /
>>> 1460.679 18.439 / 1460.679 ms
>>> [2020-02-02T22:37:36.883+0000] Phase: Pause Mark End
>>> 0.000 / 0.000 4.420 / 4.420 6.219 /
>> 26.498
>>> 6.474 / 40.883 ms
>>> [2020-02-02T22:37:36.883+0000] Phase: Pause Mark Start
>>> 0.000 / 0.000 14.836 / 14.836 11.893 /
>> 28.350
>>> 11.664 / 41.767 ms
>>> [2020-02-02T22:37:36.884+0000] Phase: Pause Relocate Start
>>> 0.000 / 0.000 13.411 / 13.411 30.849 /
>> 697.344
>>> 11.995 / 697.344 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark
>>> 0.000 / 0.000 7015.793 / 7016.276 18497.265 /
>>> 3901893.075 1690.497 / 3901893.075 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark Idle
>>> 0.000 / 0.000 1.127 / 13.510 1.292 /
>> 219.999
>>> 1.280 / 219.999 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark Try Flush
>>> 0.000 / 0.000 1.295 / 2.029 47.094 /
>> 34869.359
>>> 4.797 / 34869.359 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark Try Terminate
>>> 0.000 / 0.000 1.212 / 14.847 1.760 /
>> 3799.238
>>> 1.724 / 3799.238 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent References Enqueue
>>> 0.000 / 0.000 0.009 / 0.009 0.022 / 1.930
>>> 0.017 / 2.350 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent References Process
>>> 0.000 / 0.000 0.599 / 0.599 0.768 / 23.966
>>> 0.495 / 23.966 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots
>>> 0.000 / 0.000 0.882 / 1.253 1.155 / 21.699
>>> 1.077 / 23.602 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots
>>> JNIWeakHandles 0.000 / 0.000 0.301 / 0.943
>> 0.308 /
>>> 10.868 0.310 / 23.219 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots
>>> StringTable 0.000 / 0.000 0.289 / 0.496
>> 0.390 /
>>> 12.794 0.363 / 22.907 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots
>>> VMWeakHandles 0.000 / 0.000 0.230 / 0.469
>> 0.329 /
>>> 21.267 0.331 / 23.135 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Mark Try Complete
>>> 0.000 / 0.000 0.000 / 0.000 0.501 / 4.801
>>> 0.480 / 17.208 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Remap TLABS
>>> 0.000 / 0.000 0.252 / 0.252 0.195 / 0.528
>>> 0.226 / 3.451 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Retire TLABS
>>> 0.000 / 0.000 1.195 / 1.195 1.324 / 5.082
>>> 1.408 / 11.219 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots
>>> 0.000 / 0.000 6.968 / 10.865 12.329 /
>> 693.701
>>> 6.431 / 1300.994 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots
>>> ClassLoaderDataGraph 0.000 / 0.000 4.819 / 8.232
>>> 9.635 / 693.405 3.476 / 693.405 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots CodeCache
>>> 0.000 / 0.000 0.842 / 2.731 0.996 / 83.553
>>> 0.780 / 83.553 ms
>>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots JNIHandles
>>> 0.000 / 0.000 1.171 / 6.314 0.866 /
>> 17.875
>>> 0.837 / 25.708 ms
>>>
>>>
>>> Thank you!
>>> Prabhash Rathore
>>>
>>
More information about the zgc-dev
mailing list