From joakimthun at gmail.com Sun Jan 19 11:02:20 2020
From: joakimthun at gmail.com (Joakim Thun)
Date: Sun, 19 Jan 2020 11:02:20 +0000
Subject: Increased ScanRS time when decreasing G1RSetUpdatingPauseTimePercent
Message-ID:

Hi all,

I would really appreciate some help understanding a G1 behaviour I am
seeing when decreasing the value of G1RSetUpdatingPauseTimePercent, where
the goal is to reduce the time spent in the UpdateRS phase by moving some
of the work to the concurrent refinement threads.

I expected, and do see, a decrease in UpdateRS time, but it comes at the
expense of more time being spent in the ScanRS phase, so the total pause
time ends up being very similar with and without the flag set. Decreasing
G1RSetUpdatingPauseTimePercent to either 5 or 1 results in similar
behaviour. I also noticed that the number of scanned cards is much higher
in the ScanRS phase when decreasing G1RSetUpdatingPauseTimePercent.

Is this expected behaviour? Are there any other flags worth considering to
improve the ScanRS time while moving more work to the refinement threads?

JVM flags and gc logs with and without the flag set can be found below.

Thanks,
Joakim

*JVM flags:*

-XX:-G1UseAdaptiveIHOP -Xms16g -Xmx16g -XX:ParallelGCThreads=15
-XX:+DisableExplicitGC -XX:+AlwaysPreTouch -XX:+UseG1GC
-XX:G1HeapRegionSize=16M
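The tuned runs below use the same baseline with the flag under test
appended, e.g.:

    -XX:G1RSetUpdatingPauseTimePercent=1

(The "RS summary" sections in the second log also require remembered set
summary logging; presumably something like -Xlog:gc+remset*=trace was
enabled for those runs, although the exact selector is not visible in the
output.)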
*Baseline gc logs (without setting G1RSetUpdatingPauseTimePercent):*

[2020-01-17T14:35:39.003+0000][1419.301s][30887] GC(136) Pause Young (Normal) (G1 Evacuation Pause)
[2020-01-17T14:35:39.003+0000][1419.301s][30887] GC(136) Using 15 workers of 15 for evacuation
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)   Pre Evacuate Collection Set: 0.0ms
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     Prepare TLABs: 0.1ms
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     Choose Collection Set: 0.0ms
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     Humongous Register: 0.0ms
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)   Evacuate Collection Set: 25.6ms
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     Ext Root Scanning (ms): Min: 0.8, Avg: 1.1, Max: 1.2, Diff: 0.3, Sum: 16.5, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     Update RS (ms): Min: 10.2, Avg: 12.5, Max: 13.4, Diff: 3.2, Sum: 187.8, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)       Processed Buffers: Min: 30, Avg: 57.1, Max: 87, Diff: 57, Sum: 856, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)       Scanned Cards: Min: 6490, Avg: 10050.3, Max: 12584, Diff: 6094, Sum: 150754, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)       Skipped Cards: Min: 0, Avg: 3.7, Max: 10, Diff: 10, Sum: 56, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 4.3, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)       Scanned Cards: Min: 0, Avg: 139.5, Max: 287, Diff: 287, Sum: 2092, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)       Claimed Cards: Min: 0, Avg: 140.3, Max: 287, Diff: 287, Sum: 2104, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)       Skipped Cards: Min: 0, Avg: 1642.5, Max: 1881, Diff: 1881, Sum: 24638, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.3, Diff: 0.3, Sum: 0.3, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     AOT Root Scanning (ms): skipped
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     Object Copy (ms): Min: 10.5, Avg: 11.5, Max: 14.1, Diff: 3.6, Sum: 172.8, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)       Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 15, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.2, Workers: 15
[2020-01-17T14:35:39.031+0000][1419.329s][30887] GC(136)     GC Worker Total (ms): Min: 25.5, Avg: 25.5, Max: 25.6, Diff: 0.0, Sum: 383.1, Workers: 15
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)   Post Evacuate Collection Set: 1.6ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Code Roots Fixup: 0.0ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Clear Card Table: 0.6ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Reference Processing: 0.2ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Weak Processing: 0.1ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Merge Per-Thread State: 0.0ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Code Roots Purge: 0.0ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Redirty Cards: 0.3ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     DerivedPointerTable Update: 0.2ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Free Collection Set: 0.3ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Humongous Reclaim: 0.0ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Start New Collection Set: 0.0ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Resize TLABs: 0.1ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)     Expand Heap After Collection: 0.0ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136)   Other: 0.9ms
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136) Eden regions: 609->0(609)
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136) Survivor regions: 5->5(77)
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136) Old regions: 125->125
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136) Humongous regions: 2->2
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136) Metaspace: 147406K->147406K(1183744K)
[2020-01-17T14:35:39.032+0000][1419.329s][30887] GC(136) Pause Young (Normal) (G1 Evacuation Pause) 11839M->2097M(16384M) 28.293ms

*With G1RSetUpdatingPauseTimePercent set to 1:*

[2020-01-17T17:46:58.485+0000][554.067s][18852] Entering safepoint region: G1CollectForAllocation
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65) Pause Young (Normal) (G1 Evacuation Pause)
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65) Using 15 workers of 15 for evacuation
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65) Before GC RS summary
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65)  Recent concurrent refinement statistics
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65)   Processed 224586 cards concurrently
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65)   Of 1084 completed buffers:
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65)     1084 (100.0%) by concurrent RS threads.
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65)        0 (  0.0%) by mutator threads.
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65)   Did 0 coarsenings.
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65)   Concurrent RS threads times (s)
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65)     0.07 0.06 0.05 0.04 0.04 0.04 0.04 0.02 0.02 0.02 0.01 0.00 0.02 0.01 0.01
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65)   Concurrent sampling threads times (s)
[2020-01-17T17:46:58.486+0000][554.067s][18852] GC(65)     0.01
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)  Current rem set statistics
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)   Total per region rem sets sizes = 13401K. Max = 498K.
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)     8722K ( 65.1%) by 614 Young regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)    18384B (  0.1%) by 2 Humongous regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)     2719K ( 20.3%) by 303 Free regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)     1941K ( 14.5%) by 105 Old regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)   Static structures = 520K, free_lists = 4826K.
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)   120031 occupied cards represented.
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)     120030 (100.0%) entries by 614 Young regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)          1 (  0.0%) entries by 2 Humongous regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)          0 (  0.0%) entries by 303 Free regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)          0 (  0.0%) entries by 105 Old regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)   Region with largest rem set = 7:(O)[0x0000000407000000,0x0000000408000000,0x0000000408000000], size = 498K, occupied = 0B.
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)   Total heap region code root sets sizes = 1021K. Max = 489K.
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)    16280B (  1.6%) by 614 Young regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)       32B (  0.0%) by 2 Humongous regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)     4848B (  0.5%) by 303 Free regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)     1000K ( 98.0%) by 105 Old regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)   37574 code roots represented.
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)       83 (  0.2%) elements by 614 Young regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)        0 (  0.0%) elements by 2 Humongous regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)        0 (  0.0%) elements by 303 Free regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)    37491 ( 99.8%) elements by 105 Old regions
[2020-01-17T17:46:58.487+0000][554.068s][18852] GC(65)   Region with largest amount of code roots = 7:(O)[0x0000000407000000,0x0000000408000000,0x0000000408000000], size = 489K, num_elems = 20702.
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65) After GC RS summary
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)  Recent concurrent refinement statistics
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)   Processed 0 cards concurrently
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)   Of 407 completed buffers:
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)      407 (100.0%) by concurrent RS threads.
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)        0 (  0.0%) by mutator threads.
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)   Did 0 coarsenings.
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)   Concurrent RS threads times (s)
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)   Concurrent sampling threads times (s)
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)     0.00
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)  Current rem set statistics
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)   Total per region rem sets sizes = 10206K. Max = 498K.
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)    57200B (  0.5%) by 5 Young regions
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)    18384B (  0.2%) by 2 Humongous regions
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)     8186K ( 80.2%) by 912 Free regions
[2020-01-17T17:46:58.516+0000][554.097s][18852] GC(65)     1945K ( 19.1%) by 105 Old regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)   Static structures = 520K, free_lists = 7548K.
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)   206 occupied cards represented.
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)      205 ( 99.5%) entries by 5 Young regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)        1 (  0.5%) entries by 2 Humongous regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)        0 (  0.0%) entries by 912 Free regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)        0 (  0.0%) entries by 105 Old regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)   Region with largest rem set = 7:(O)[0x0000000407000000,0x0000000408000000,0x0000000408000000], size = 498K, occupied = 0B.
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)   Total heap region code root sets sizes = 1024K. Max = 489K.
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     5816B (  0.6%) by 5 Young regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)       32B (  0.0%) by 2 Humongous regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)    14592B (  1.4%) by 912 Free regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     1004K ( 98.1%) by 105 Old regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)   37573 code roots represented.
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)       66 (  0.2%) elements by 5 Young regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)        0 (  0.0%) elements by 2 Humongous regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)        0 (  0.0%) elements by 912 Free regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)    37507 ( 99.8%) elements by 105 Old regions
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)   Region with largest amount of code roots = 7:(O)[0x0000000407000000,0x0000000408000000,0x0000000408000000], size = 489K, num_elems = 20702.
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)   Pre Evacuate Collection Set: 0.0ms
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     Prepare TLABs: 0.1ms
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     Choose Collection Set: 0.0ms
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     Humongous Register: 0.0ms
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)   Evacuate Collection Set: 26.0ms
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     Ext Root Scanning (ms): Min: 0.7, Avg: 1.2, Max: 1.3, Diff: 0.6, Sum: 17.3, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     Update RS (ms): Min: 0.8, Avg: 2.4, Max: 3.8, Diff: 3.0, Sum: 35.4, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)       Processed Buffers: Min: 4, Avg: 27.1, Max: 68, Diff: 64, Sum: 407, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)       Scanned Cards: Min: 503, Avg: 1842.7, Max: 3156, Diff: 2653, Sum: 27641, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)       Skipped Cards: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     Scan RS (ms): Min: 2.9, Avg: 9.1, Max: 12.6, Diff: 9.6, Sum: 136.8, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)       Scanned Cards: Min: 557, Avg: 4460.7, Max: 7315, Diff: 6758, Sum: 66910, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)       Claimed Cards: Min: 1152, Avg: 8002.0, Max: 12274, Diff: 11122, Sum: 120030, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)       Skipped Cards: Min: 35295, Avg: 61708.5, Max: 70294, Diff: 34999, Sum: 925628, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.6, Diff: 0.6, Sum: 0.7, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     AOT Root Scanning (ms): skipped
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     Object Copy (ms): Min: 9.0, Avg: 13.1, Max: 21.2, Diff: 12.2, Sum: 197.0, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)       Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 15, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.9, Workers: 15
[2020-01-17T17:46:58.516+0000][554.098s][18852] GC(65)     GC Worker Total (ms): Min: 26.0, Avg: 26.0, Max: 26.0, Diff: 0.0, Sum: 389.6, Workers: 15
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)   Post Evacuate Collection Set: 1.9ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Code Roots Fixup: 0.0ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Clear Card Table: 0.9ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Reference Processing: 0.3ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Weak Processing: 0.1ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Merge Per-Thread State: 0.0ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Code Roots Purge: 0.0ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Redirty Cards: 0.2ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     DerivedPointerTable Update: 0.2ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Free Collection Set: 0.4ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Humongous Reclaim: 0.0ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Start New Collection Set: 0.0ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Resize TLABs: 0.1ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)     Expand Heap After Collection: 0.0ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65)   Other: 2.6ms
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65) Eden regions: 609->0(609)
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65) Survivor regions: 5->5(77)
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65) Old regions: 105->105
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65) Humongous regions: 2->2
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65) Metaspace: 144342K->144342K(1179648K)
[2020-01-17T17:46:58.517+0000][554.098s][18852] GC(65) Pause Young (Normal) (G1 Evacuation Pause) 11526M->1789M(16384M) 30.664ms
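To make the comparison concrete, these are the per-phase numbers from the
two pauses above side by side:

                       GC(136) baseline          GC(65) with percent=1
    Update RS          12.5 ms avg               2.4 ms avg
      Scanned Cards    150754 total              27641 total
    Scan RS            0.3 ms avg                9.1 ms avg
      Scanned Cards    2092 total                66910 total

while in the tuned run the refinement threads processed 224586 cards
concurrently between the pauses (see the "Before GC RS summary"), i.e.
much of the card scanning work moved from Update RS to Scan RS rather
than out of the pause entirely.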
From roy.sunny.zhang007 at gmail.com Mon Jan 20 10:22:43 2020
From: roy.sunny.zhang007 at gmail.com (Roy Zhang)
Date: Mon, 20 Jan 2020 18:22:43 +0800
Subject: Abnormal high sys time in G1 GC
Message-ID:

Dear JVM experts,

Recently we found GC spikes (long STW minor GCs), and sys time is high
when GC time is high. Normally sys time is near 0 seconds and a minor GC
takes less than 500ms.

From
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2017-October/020630.html
and https://blog.gceasy.io/2016/12/11/sys-time-greater-than-user-time/,
high sys time can be caused by an operating system problem, a VM-related
problem, a memory constraint, disk IO pressure, or Transparent Huge Pages.
I checked them one by one and didn't find any clue; could you please
kindly provide suggestions? Thanks in advance!

1. Operating system problem
-- We have enough CPU/memory/disk (48 CPU cores + 373G RAM with a 160G
heap, disk is enough), and there is no error in /var/log/dmesg.
2. Memory constraint
-- We have enough available memory; available memory (free -m) is 263G.
3. Disk IO pressure
-- No issue found in the disk info from the prometheus node exporter.
Granularity is 15s, and I can't find a counterpart of the avgqu-sz & util
metrics (disk IO utilisation and saturation metrics) which are part of
iostat. Could the issue be hidden by the coarse granularity?
4. VM-related problem
-- We are using a physical machine.
5. Transparent Huge Pages
-- It is madvise. It could be a problem, but we didn't have this issue
previously. The system has been running for nearly 20 weeks.

*cat /sys/kernel/mm/transparent_hugepage/enabled*
always [madvise] never

*JDK version:*
OpenJDK Runtime Environment, 1.8.0_222-b10

*Java Opts:*
-javaagent:/server/jmx_prometheus_javaagent-0.12.0.jar=xxxx:/server/config.yaml
-server
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=xxxx
-Dcom.sun.management.jmxremote.rmi.port=xxxx
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Xloggc:/server/xxxx.log
-XX:+PrintGCDateStamps
-XX:AutoBoxCacheMax=1000000
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=50
-XX:InitiatingHeapOccupancyPercent=70
-XX:+ParallelRefProcEnabled
-XX:+ExplicitGCInvokesConcurrent
-XX:+UseStringDeduplication
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xms160g
-Xmx160g
-XX:+HeapDumpOnOutOfMemoryError

*Snippet of GC log:*

2020-01-20T07:27:03.166+0000: 2756.665: [GC pause (G1 Evacuation Pause) (young), *6.2899024 secs*]
   [Parallel Time: 6255.0 ms, GC Workers: 33]
      [GC Worker Start (ms): Min: 2756664.9, Avg: 2756665.5, Max: 2756666.1, Diff: 1.2]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.5, Max: 5.3, Diff: 5.3, Sum: 16.8]
      [Update RS (ms): Min: 0.0, Avg: 0.8, Max: 1.1, Diff: 1.1, Sum: 25.6]
         [Processed Buffers: Min: 0, Avg: 1.6, Max: 4, Diff: 4, Sum: 53]
      [Scan RS (ms): Min: 142.0, Avg: 145.3, Max: 146.4, Diff: 4.4, Sum: 4794.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.3, Max: 3.5, Diff: 3.5, Sum: 8.8]
      *[Object Copy (ms): Min: 6100.1, Avg: 6101.8, Max: 6106.5, Diff: 6.4, Sum: 201358.4]*
      [Termination (ms): Min: 0.1, Avg: 5.2, Max: 6.7, Diff: 6.6, Sum: 172.9]
         [Termination Attempts: Min: 1, Avg: 1353.0, Max: 1476, Diff: 1475, Sum: 44650]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 7.0]
      [GC Worker Total (ms): Min: 6253.4, Avg: 6254.1, Max: 6254.7, Diff: 1.2, Sum: 206383.7]
      [GC Worker End (ms): Min: 2762919.4, Avg: 2762919.6, Max: 2762919.8, Diff: 0.4]
   [Code Root Fixup: 0.6 ms]
   [Code Root Purge: 0.0 ms]
   [String Dedup Fixup: 0.7 ms, GC Workers: 33]
      [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
      [Table Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.6, Diff: 0.6, Sum: 2.0]
   [Clear CT: 4.0 ms]
   [Other: 29.6 ms]
      [Choose CSet: 0.1 ms]
      [Ref Proc: 10.3 ms]
      [Ref Enq: 0.6 ms]
      [Redirty Cards: 11.3 ms]
      [Humongous Register: 0.2 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 6.5 ms]
   [Eden: 72576.0M(72576.0M)->0.0B(80896.0M) Survivors: 9344.0M->1024.0M Heap: 83520.0M(160.0G)->11046.9M(160.0G)]
   *[Times: user=27.19 sys=162.28, real=6.30 secs]*
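Doing the arithmetic on the pause above:

    user + sys = 27.19 + 162.28 = 189.47 CPU seconds
    33 workers * 6.30 s wall time = 207.9 CPU seconds available
    sys share = 162.28 / 189.47 = ~86% of the consumed CPU time

i.e. during this pause the GC worker threads spent most of their cycles
in the kernel, whereas in the second pause below sys is only 9.11 of
77.51 consumed CPU seconds.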
2020-01-20T06:59:23.382+0000: 1096.881: [GC pause (G1 Evacuation Pause) (young) (initial-mark), *4.1248088 secs*]
   [Parallel Time: 4098.0 ms, GC Workers: 33]
      [GC Worker Start (ms): Min: 1096882.1, Avg: 1096882.8, Max: 1096883.2, Diff: 1.2]
      [Ext Root Scanning (ms): Min: 4.0, Avg: 4.8, Max: 6.1, Diff: 2.0, Sum: 159.7]
      [Update RS (ms): Min: 0.0, Avg: 0.3, Max: 1.1, Diff: 1.1, Sum: 9.5]
         [Processed Buffers: Min: 0, Avg: 1.3, Max: 6, Diff: 6, Sum: 43]
      *[Scan RS (ms): Min: 2001.2, Avg: 2012.2, Max: 2013.4, Diff: 12.2, Sum: 66401.0]*
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 10.7, Diff: 10.7, Sum: 18.5]
      *[Object Copy (ms): Min: 2039.3, Avg: 2049.2, Max: 2079.5, Diff: 40.2, Sum: 67623.1]*
      [Termination (ms): Min: 0.0, Avg: 29.6, Max: 39.7, Diff: 39.7, Sum: 978.0]
         [Termination Attempts: Min: 1, Avg: 6587.0, Max: 8068, Diff: 8067, Sum: 217372]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.4, Sum: 7.9]
      [GC Worker Total (ms): Min: 4096.3, Avg: 4096.9, Max: 4097.7, Diff: 1.4, Sum: 135197.8]
      [GC Worker End (ms): Min: 1100979.5, Avg: 1100979.7, Max: 1100979.9, Diff: 0.4]
   [Code Root Fixup: 0.6 ms]
   [Code Root Purge: 0.2 ms]
   [String Dedup Fixup: 1.0 ms, GC Workers: 33]
      [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Table Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.7, Diff: 0.7, Sum: 1.4]
   [Clear CT: 3.4 ms]
   [Other: 21.7 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 9.1 ms]
      [Ref Enq: 0.9 ms]
      [Redirty Cards: 4.3 ms]
      [Humongous Register: 0.2 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 5.3 ms]
   [Eden: 81184.0M(81184.0M)->0.0B(72576.0M) Survivors: 736.0M->9344.0M Heap: 83508.0M(160.0G)->10944.0M(160.0G)]
   *[Times: user=68.40 sys=9.11, real=4.13 secs]*

Thanks,
Roy

From ecki at zusammenkunft.net Mon Jan 20 10:51:55 2020
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Mon, 20 Jan 2020 10:51:55 +0000
Subject: Abnormal high sys time in G1 GC
In-Reply-To:
References:
Message-ID:

Hello,

Can you say a bit more about what the actual problem is? The sys times in
those GCs look large, but it's only a 4s pause for a 160gb heap. I am sure
you saw that pause time before? I think it's pretty hard to tell after the
fact, but with such a large system I would lean towards problems outside
of the JVM.

Regards,
Bernd
--
http://bernd.eckenfels.net

________________________________
From: hotspot-gc-use on behalf of Roy Zhang
Sent: Monday, January 20, 2020 11:22:43 AM
To: hotspot-gc-use at openjdk.java.net
Subject: Abnormal high sys time in G1 GC

Dear JVM experts,
[...]
From roy.sunny.zhang007 at gmail.com Mon Jan 20 10:54:37 2020
From: roy.sunny.zhang007 at gmail.com (Roy Zhang)
Date: Mon, 20 Jan 2020 18:54:37 +0800
Subject: Abnormal high sys time in G1 GC
In-Reply-To:
References:
Message-ID:

Sent to the hotspot-gc-dev mail list as well :) Thank you for your help in
advance!!!

Thanks,
Roy

On Mon, Jan 20, 2020 at 6:22 PM Roy Zhang wrote:
> Dear JVM experts,
> [...]
From ecki at zusammenkunft.net Mon Jan 20 11:30:55 2020
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Mon, 20 Jan 2020 11:30:55 +0000
Subject: Abnormal high sys time in G1 GC
In-Reply-To:
References:
Message-ID:

Hello,

Can you say a bit more about what the actual problem is? The sys times in
those GCs look large, but it's only a 4s pause for a 160gb heap. I am sure
you saw that pause time before? I think it's pretty hard to tell after the
fact, but with such a large system I would lean towards problems outside
of the JVM. Do you have a long-term graph from the logs?

Regards,
Bernd
--
http://bernd.eckenfels.net

________________________________
From: hotspot-gc-use on behalf of Roy Zhang
Sent: Monday, January 20, 2020 11:22:43 AM
To: hotspot-gc-use at openjdk.java.net
Subject: Abnormal high sys time in G1 GC

Dear JVM experts,
[...]
From thomas.schatzl at oracle.com Mon Jan 20 14:17:16 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 20 Jan 2020 15:17:16 +0100
Subject: Increased ScanRS time when decreasing G1RSetUpdatingPauseTimePercent
In-Reply-To:
References:
Message-ID: <20dd9cda-b3ff-d6a2-ca5d-7732cd9648e8@oracle.com>

Hi Joakim,

On 19.01.20 12:02, Joakim Thun wrote:
> Hi all,
>
> I would really appreciate some help understanding a G1 behaviour I am
> seeing when decreasing the value of G1RSetUpdatingPauseTimePercent [...]
>
> Is this expected behaviour?

TLDR: yes.

Longer version:

The purpose of the refinement threads and the refinement queues (which are
processed during Update RS) is to update the remembered sets (whose use is
attributed to the Scan RS time) after some filtering (is that card already
in a remembered set? Can we drop it for other reasons?). If an entry/card
in the refinement queues has not been processed before the GC, it must be
processed during the GC (although not all of the filtering needs to be
applied there).

What is cheaper to do during GC, scanning remembered sets or refinement
queues? That depends on the contents of the card. If it contains
references to a lot of regions in the collection set, then it is probably
cheaper to let it stay in the refinement queue. If it does not contain a
reference to any region in the collection set, then putting it into the
remembered sets is a win, because we moved otherwise unnecessary work out
of the pause.

There are a lot of different arguments about what the optimal location for
a card should be; some of these decisions have impact outside of the gc
pause too. E.g. a card in the refinement queue that has not been processed
yet is never re-enqueued - this saves enqueuing and processing work at
mutator time; however, given that such cards may not contain references
into the collection set (which you only know once you process them),
keeping them makes pause times slightly longer. As long as a card in the
refinement buffer contains a reference into the collection set, G1 would
scan it anyway (it would be in some remembered set), and retrieving values
from the refinement queue during gc is (very slightly) faster than from
the remembered sets.

Overall there is no rule that "Update RS" work is bad while "Scan RS" work
isn't. In your case, since you are trading Update RS time for Scan RS
time, I would argue that it's better to have the cards in the refinement
queue.

> Are there any other flags worth considering to improve the ScanRS time
> while moving more work to the refinement threads?

One could try to manually control refinement work by setting the various
thresholds by hand. No guarantees that this improves your situation.
Logging "gc+ergo+refine=debug" may help with debugging the adaptive
refinement thresholds; gc+remset=trace gives some general information
about concurrent refinement.

Some rundown on the options:

G1UseAdaptiveConcRefinement: enable adaptive refinement, i.e. try to
observe G1RSetUpdatingPauseTimePercent.

G1UpdateBufferSize (default 256): size of a buffer in the refinement
queue, i.e. individual threads will cache that amount of cards to process
later until they are made available to the refinement threads.

G1ConcRefinementGreenZone, G1ConcRefinementYellowZone,
G1ConcRefinementRedZone: thresholds that control the refinement threads.
If the number of buffers (see above) is lower than the green threshold,
there is no concurrent refinement activity. From the green to the yellow
threshold, increasingly more concurrent refinement threads are used. If
the number of buffers reaches the red threshold, mutator threads will do
the work themselves. If G1UseAdaptiveConcRefinement is enabled, the
thresholds are adjusted adaptively and the ones you give on the command
line are initial values; otherwise the thresholds are fixed.

G1ConcRefinementThreads: max number of refinement threads.

So you could completely disable concurrent refinement by disabling
G1UseAdaptiveConcRefinement and setting G1ConcRefinementThreads=0; this
will make the mutators do all the work immediately if you set the red
threshold to 0 too. If you set G1UpdateBufferSize to 1 as well, the
mutators will immediately do all the work I think (this will likely have a
significant impact on mutator performance). Otherwise, using the
thresholds, you can select the amount of concurrent refinement work in a
very granular way.
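E.g. (untested, just to illustrate how these knobs combine) the most
extreme setting, with all refinement done immediately by the mutator
threads, would be something like:

    -XX:-G1UseAdaptiveConcRefinement -XX:G1ConcRefinementThreads=0
    -XX:G1ConcRefinementRedZone=0 -XX:G1UpdateBufferSize=1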
Thanks,
  Thomas

From thomas.schatzl at oracle.com Mon Jan 20 14:30:18 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 20 Jan 2020 15:30:18 +0100
Subject: Abnormal high sys time in G1 GC
In-Reply-To:
References:
Message-ID: <45623d67-ef0e-161e-f446-0fd89b3e6f1b@oracle.com>

Hi,

On 20.01.20 11:51, Bernd Eckenfels wrote:
> Hello,
>
> Can you say a bit more about what the actual problem is? The sys times
> in those GCs look large, but it's only a 4s pause for a 160gb heap. I am
> sure you

In my experience, a 4s pause for a 160gb heap is rather unusual and
unexpected, particularly on a young-only collection. Even on jdk8 (later
jvms are much better) - but it certainly depends on the application.

> saw that pause time before? I think it's pretty hard to tell after the
> fact, but with such a large system I would lean towards problems outside
> of the JVM.

According to https://access.redhat.com/solutions/46111 you can get
statistics about THP activity by looking at /proc/<pid>/vmstat. This looks
just like one of these one-off occurrences THP could cause.
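For example (file names from memory, the article has the exact recipe),
the global counters already give a quick impression of THP activity:

    grep AnonHugePages /proc/meminfo
    egrep 'thp|trans' /proc/vmstat

If AnonHugePages is non-zero and the thp_* counters move around the time
the long pauses happen, THP is at least a suspect.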
Thanks,
  Thomas

From roy.sunny.zhang007 at gmail.com Thu Jan 30 15:51:55 2020
From: roy.sunny.zhang007 at gmail.com (Roy Zhang)
Date: Thu, 30 Jan 2020 23:51:55 +0800
Subject: When will G1 GC trigger initial-mark besides IHOP
Message-ID:

Dear JVM experts,

I set -XX:InitiatingHeapOccupancyPercent=70 in JDK8 (which has no adaptive
IHOP feature), but I found two initial-mark pauses right at the beginning
of JVM start, when the heap occupancy percent is far less than 70%. Is
there any other factor which will trigger a G1 GC initial-mark phase?
Thanks in advance!

*Excerpt of GC log:*

2020-01-22T03:58:14.227+0000: 3.158: [GC pause (Metadata GC Threshold) (young) (initial-mark), 0.1583711 secs]
   [Eden: 1056.0M(81920.0M)->0.0B(81184.0M) Survivors: 0.0B->736.0M Heap: 1472.0M(160.0G)->1179.5M(160.0G)]
2020-01-22T04:13:07.073+0000: 896.004: [GC pause (G1 Evacuation Pause) (young) (initial-mark), 3.8512514 secs]
   [Eden: 81184.0M(81184.0M)->0.0B(71904.0M) Survivors: 736.0M->10016.0M Heap: 83643.5M(160.0G)->11744.0M(160.0G)]

*JDK version:*
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

Thanks,
Roy

From roy.sunny.zhang007 at gmail.com Thu Jan 30 16:05:12 2020
From: roy.sunny.zhang007 at gmail.com (Roy Zhang)
Date: Fri, 31 Jan 2020 00:05:12 +0800
Subject: When will G1 GC allocate objects in old generation directly
Message-ID:

Dear JVM experts,

In my gc log, many objects are allocated in the old generation directly
(ALLOC(Old) logs). The current tenuring threshold is 15 and my objects'
age is only 1, i.e. these objects should not have been promoted to the old
generation, so I guess there is some condition under which objects are
allocated in the old generation directly in G1 GC? Thanks in advance!

*Excerpt of GC log:*
> > *Excerpt of GC log:* > 2020-01-22T03:58:14.227+0000: 3.158: [GC pause (Metadata GC Threshold) > (young) (initial-mark), 0.1583711 secs] > [Eden: 1056.0M(81920.0M)->0.0B(81184.0M) Survivors: 0.0B->736.0M Heap: > 1472.0M(160.0G)->1179.5M(160.0G)] > 2020-01-22T04:13:07.073+0000: 896.004: [GC pause (G1 Evacuation Pause) > (young) (initial-mark), 3.8512514 secs] > [Eden: 81184.0M(81184.0M)->0.0B(71904.0M) Survivors: 736.0M->10016.0M > Heap: 83643.5M(160.0G)->11744.0M(160.0G)] > the log message before the "initial mark" gc tells you - when the Metaspace is about to get full to try to unload data in it (e.g. Class data). I.e. the condition is either: - Java heap gets too full - Metaspace gets too full Thanks, Thomas From thomas.schatzl at oracle.com Thu Jan 30 16:39:06 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 30 Jan 2020 17:39:06 +0100 Subject: When will G1 GC allocate objects in old generation directly In-Reply-To: References: Message-ID: <7550d450-df34-19d5-eda5-e6c398418fb0@oracle.com> Hi, On 30.01.20 17:05, Roy Zhang wrote: > Dear JVM experts, > > In my gc log, there are many objects are allocated to old generations > (ALLOC(Old) logs), current threshold is 15, my objects age is only 1, > i.e, these objects will not be promoted to old generations, i guess is > there any condition when objects will be allocated to old generations > directly in G1 GC? Thanks in advance! > > *Excerpt of GC log:* > > grep "ALLOC(Old)" gc.log | wc -l > 387 > grep "thres" gc.log > Desired survivor size 1207959552 bytes, new threshold 15 (max 15) > Desired survivor size 1207959552 bytes, new threshold 15 (max 15) > - age ? 1: ? 37707272 bytes, ? 37707272 total > > *JDK version: * > openjdk version "1.8.0_222" > OpenJDK Runtime Environment (build 1.8.0_222-b10) > OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode) > G1 only allocates humongous (large) objects directly in old gen, or during young gc if the object has an age > the current threshold, or if the survivor space is full. (Or during evacuation failure, i.e. not enough space to copy young gen objects somewhere else, entire regions including their live content are relabelled as old gen, which is similar to "allocate" objects in old gen. From the log snippet you can't tell what happens here. Thanks, Thomas From ecki at zusammenkunft.net Thu Jan 30 19:47:07 2020 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Thu, 30 Jan 2020 19:47:07 +0000 Subject: When will G1 GC trigger initial-mark besides IHOP In-Reply-To: <11bf4944-48bf-c9ac-fee1-7b5da5464846@oracle.com> References: , <11bf4944-48bf-c9ac-fee1-7b5da5464846@oracle.com> Message-ID: Hello Roy, here you find something about the G1 Evacuation Pause (full young mode) https://plumbr.io/handbook/garbage-collection-algorithms-implementations/g1/evacuation-pause-fully-young This in addition to the meta space resizing Thomas described. Gruss Bernd -- http://bernd.eckenfels.net ________________________________ Von: hotspot-gc-use im Auftrag von Thomas Schatzl Gesendet: Donnerstag, Januar 30, 2020 5:36 PM An: hotspot-gc-use at openjdk.java.net Betreff: Re: When will G1 GC trigger initial-mark besides IHOP Hi, On 30.01.20 16:51, Roy Zhang wrote: > Dear JVM experts, > > I set -XX:InitiatingHeapOccupancyPercent=70 in JDK8 (no adaptive IHOP > feature), but I found there are two initial-mark phase at the beginning > of JVM start when HeapOccupancyPercent is far less than 70%, is there > any other factor which will trigger G1 GC initial mark phase? Thanks in > advance! 
>
> *Excerpt of GC log:*
> 2020-01-22T03:58:14.227+0000: 3.158: [GC pause (Metadata GC Threshold)
> (young) (initial-mark), 0.1583711 secs]
>    [Eden: 1056.0M(81920.0M)->0.0B(81184.0M) Survivors: 0.0B->736.0M
> Heap: 1472.0M(160.0G)->1179.5M(160.0G)]
> 2020-01-22T04:13:07.073+0000: 896.004: [GC pause (G1 Evacuation Pause)
> (young) (initial-mark), 3.8512514 secs]
>    [Eden: 81184.0M(81184.0M)->0.0B(71904.0M) Survivors: 736.0M->10016.0M
> Heap: 83643.5M(160.0G)->11744.0M(160.0G)]

The log message before the "initial mark" gc tells you: when the Metaspace
is about to get full, G1 starts a marking cycle to try to unload data in
it (e.g. class data). You can see this in the first pause above, whose
cause is logged as "Metadata GC Threshold" rather than "G1 Evacuation
Pause". I.e. the condition is either:

- Java heap gets too full
- Metaspace gets too full

Thanks,
  Thomas

From thomas.schatzl at oracle.com Thu Jan 30 16:39:06 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 30 Jan 2020 17:39:06 +0100
Subject: When will G1 GC allocate objects in old generation directly
In-Reply-To:
References:
Message-ID: <7550d450-df34-19d5-eda5-e6c398418fb0@oracle.com>

Hi,

On 30.01.20 17:05, Roy Zhang wrote:
> Dear JVM experts,
>
> In my gc log, many objects are allocated in the old generation directly
> (ALLOC(Old) logs). The current tenuring threshold is 15 and my objects'
> age is only 1 [...]

G1 only allocates humongous (large) objects directly in the old gen, or,
during young gc, objects whose age is > the current threshold or for which
the survivor space is full. (Or during evacuation failure, i.e. when there
is not enough space to copy young gen objects somewhere else, entire
regions including their live content are relabelled as old gen, which is
similar to "allocating" objects in the old gen.) From the log snippet you
can't tell which of these happens here.

Thanks,
  Thomas

From ecki at zusammenkunft.net Thu Jan 30 19:47:07 2020
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Thu, 30 Jan 2020 19:47:07 +0000
Subject: When will G1 GC trigger initial-mark besides IHOP
In-Reply-To: <11bf4944-48bf-c9ac-fee1-7b5da5464846@oracle.com>
References: <11bf4944-48bf-c9ac-fee1-7b5da5464846@oracle.com>
Message-ID:

Hello Roy,

here you find something about the G1 Evacuation Pause (fully young mode):
https://plumbr.io/handbook/garbage-collection-algorithms-implementations/g1/evacuation-pause-fully-young

This is in addition to the metaspace resizing Thomas described.

Regards,
Bernd
--
http://bernd.eckenfels.net

________________________________
From: hotspot-gc-use on behalf of Thomas Schatzl
Sent: Thursday, January 30, 2020 5:36 PM
To: hotspot-gc-use at openjdk.java.net
Subject: Re: When will G1 GC trigger initial-mark besides IHOP

[...]