Abnormal high sys time in G1 GC
Roy Zhang
roy.sunny.zhang007 at gmail.com
Mon Jan 20 10:22:43 UTC 2020
Dear JVM experts,
Recently we found GC spike (long STW minor GC), and sys time is high when
we GC time is high. Normally sys time is near 0 seconds and minor GC is
less than 500ms.
From
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2017-October/020630.html
and https://blog.gceasy.io/2016/12/11/sys-time-greater-than-user-time/,
high sys time could be caused by operation system problem/VM related
problem/memory constraint/disk IO pressure/Transparent Huge Pages.
I checked them one by one, don't find any clue, could u please kindly
provide suggestion? Thanks in advance!
1.operation system problem
--We have enough CPU/memory/disk (48 cpu cores + 373 RAM with 160G heap,
disk is enough), and there is no error in /var/log/dmesg
2. memory constraint
-- We have enough available memory. available memory (free -m) is 263G
3. disk IO pressure
-- Not find issue from disk info from prometheus node exporter.
Granularity is 15s, and I can't find counterpart of avgqu-sz & util metrics
(disk IO util and saturation metrics) which is part of iostat. It could be
caused by big Granularity???
4. VM related problem
-- We are using physical machine
5. Transparent Huge Pages.
It is madvise. It could be a problem, but we don't have this issue
previously. It has been running for nearly 20 weeks.
*cat /sys/kernel/mm/transparent_hugepage/enabledalways [madvise] never*
*JDK version:*
OpenJDK Runtime Environment, 1.8.0_222-b10
*Java Opts:*
-javaagent:/server/jmx_prometheus_javaagent-0.12.0.jar=xxxx:/server/config.yaml
-server
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=xxxx
-Dcom.sun.management.jmxremote.rmi.port=xxxx
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Xloggc:/server/xxxx.log
-XX:+PrintGCDateStamps
-XX:AutoBoxCacheMax=1000000
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=50
-XX:InitiatingHeapOccupancyPercent=70
-XX:+ParallelRefProcEnabled
-XX:+ExplicitGCInvokesConcurrent
-XX:+UseStringDeduplication
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xms160g
-Xmx160g
-XX:+HeapDumpOnOutOfMemoryError
*Snippet of GC log:*
2020-01-20T07:27:03.166+0000: 2756.665: [GC pause (G1 Evacuation Pause)
(young), *6.2899024 secs*]
[Parallel Time: 6255.0 ms, GC Workers: 33]
[GC Worker Start (ms): Min: 2756664.9, Avg: 2756665.5, Max:
2756666.1, Diff: 1.2]
[Ext Root Scanning (ms): Min: 0.0, Avg: 0.5, Max: 5.3, Diff: 5.3,
Sum: 16.8]
[Update RS (ms): Min: 0.0, Avg: 0.8, Max: 1.1, Diff: 1.1, Sum: 25.6]
[Processed Buffers: Min: 0, Avg: 1.6, Max: 4, Diff: 4, Sum: 53]
[Scan RS (ms): Min: 142.0, Avg: 145.3, Max: 146.4, Diff: 4.4, Sum:
4794.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.3, Max: 3.5, Diff: 3.5,
Sum: 8.8]
* [Object Copy (ms): Min: 6100.1, Avg: 6101.8, Max: 6106.5, Diff: 6.4,
Sum: 201358.4]*
[Termination (ms): Min: 0.1, Avg: 5.2, Max: 6.7, Diff: 6.6, Sum:
172.9]
[Termination Attempts: Min: 1, Avg: 1353.0, Max: 1476, Diff: 1475,
Sum: 44650]
[GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum:
7.0]
[GC Worker Total (ms): Min: 6253.4, Avg: 6254.1, Max: 6254.7, Diff:
1.2, Sum: 206383.7]
[GC Worker End (ms): Min: 2762919.4, Avg: 2762919.6, Max: 2762919.8,
Diff: 0.4]
[Code Root Fixup: 0.6 ms]
[Code Root Purge: 0.0 ms]
[String Dedup Fixup: 0.7 ms, GC Workers: 33]
[Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
[Table Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.6, Diff: 0.6, Sum: 2.0]
[Clear CT: 4.0 ms]
[Other: 29.6 ms]
[Choose CSet: 0.1 ms]
[Ref Proc: 10.3 ms]
[Ref Enq: 0.6 ms]
[Redirty Cards: 11.3 ms]
[Humongous Register: 0.2 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 6.5 ms]
[Eden: 72576.0M(72576.0M)->0.0B(80896.0M) Survivors: 9344.0M->1024.0M
Heap: 83520.0M(160.0G)->11046.9M(160.0G)]
* [Times: user=27.19 sys=162.28, real=6.30 secs] *
2020-01-20T06:59:23.382+0000: 1096.881: [GC pause (G1 Evacuation Pause)
(young) (initial-mark), *4.1248088 secs*]
[Parallel Time: 4098.0 ms, GC Workers: 33]
[GC Worker Start (ms): Min: 1096882.1, Avg: 1096882.8, Max:
1096883.2, Diff: 1.2]
[Ext Root Scanning (ms): Min: 4.0, Avg: 4.8, Max: 6.1, Diff: 2.0,
Sum: 159.7]
[Update RS (ms): Min: 0.0, Avg: 0.3, Max: 1.1, Diff: 1.1, Sum: 9.5]
[Processed Buffers: Min: 0, Avg: 1.3, Max: 6, Diff: 6, Sum: 43]
* [Scan RS (ms): Min: 2001.2, Avg: 2012.2, Max: 2013.4, Diff: 12.2,
Sum: 66401.0]*
[Code Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 10.7, Diff: 10.7,
Sum: 18.5]
* [Object Copy (ms): Min: 2039.3, Avg: 2049.2, Max: 2079.5, Diff:
40.2, Sum: 67623.1]*
[Termination (ms): Min: 0.0, Avg: 29.6, Max: 39.7, Diff: 39.7, Sum:
978.0]
[Termination Attempts: Min: 1, Avg: 6587.0, Max: 8068, Diff: 8067,
Sum: 217372]
[GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.4, Sum:
7.9]
[GC Worker Total (ms): Min: 4096.3, Avg: 4096.9, Max: 4097.7, Diff:
1.4, Sum: 135197.8]
[GC Worker End (ms): Min: 1100979.5, Avg: 1100979.7, Max: 1100979.9,
Diff: 0.4]
[Code Root Fixup: 0.6 ms]
[Code Root Purge: 0.2 ms]
[String Dedup Fixup: 1.0 ms, GC Workers: 33]
[Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Table Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.7, Diff: 0.7, Sum: 1.4]
[Clear CT: 3.4 ms]
[Other: 21.7 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 9.1 ms]
[Ref Enq: 0.9 ms]
[Redirty Cards: 4.3 ms]
[Humongous Register: 0.2 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 5.3 ms]
[Eden: 81184.0M(81184.0M)->0.0B(72576.0M) Survivors: 736.0M->9344.0M
Heap: 83508.0M(160.0G)->10944.0M(160.0G)]
* [Times: user=68.40 sys=9.11, real=4.13 secs] *
Thanks,
Roy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20200120/c7e14da8/attachment.htm>
More information about the hotspot-gc-use
mailing list