Feedback from experiments on production application
Roman Kennke
rkennke at redhat.com
Thu May 31 22:55:00 UTC 2018
Hello Petr,
thank you so much for trying this out!
Can you quantify throughput and/or latency? How does it relate to, e.g.,
G1, CMS or parallel GC? Is throughput any different between traversal
and adaptive (i.e., the default heuristics)?
The stats show that you have at least one ~430ms pause, which is caused
by an allocation failure followed by a degenerated GC. Any idea what is
going on there? It should be possible to find the corresponding cycle in
the log.
BTW, we disabled concurrent class unloading in the meantime. Classes
will still get unloaded at full GC, if that ever happens (either
triggered by out-of-memory or by System.gc()). This seems to be the
better compromise until we implement concurrent class loader data
scanning: pause times no longer suffer from class unloading. Users who
need or want concurrent class unloading can still turn it on by passing
-XX:+ClassUnloadingWithConcurrentMark on the command line.
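For reference, a minimal sketch of such a command line. The heap size and jar name are hypothetical placeholders; only the Shenandoah flags are the ones discussed in this thread, and depending on the build -XX:+UnlockExperimentalVMOptions may also be required:

```shell
# Hypothetical invocation; app.jar and the 20G heap are placeholders.
java -Xms20g -Xmx20g \
     -XX:+UseShenandoahGC \
     -XX:+ClassUnloadingWithConcurrentMark \
     -Xlog:gc,gc+ergo,gc+stats \
     -jar app.jar
```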
Thanks again, Roman
> Hi everyone,
>
> sorry for the late response. I wanted to provide JDK11 results in the reply, but migration to JDK11 was a much harder nut to crack than expected (many major libraries are not JDK11-ready yet, and I'm very sad about the changes in Unsafe ;)).
>
> I used build 11-internal+0-nightly-sobornost-builds.shipilev.net-shenandoah-jdkX-b109
>
> I found the behaviour with heuristics=traversal without class unloading quite nice. It gave me good throughput with occasional 6-8ms pauses. Given that the default heuristics would give a 6-8ms Final Mark pause anyway, there was no increase in max pause with “traversal”. It definitely solved the “pause spacing” problem I experienced with the default heuristics.
> With class unloading turned on, the final traversal pause increased by 10ms to 16-18ms.
>
> I did not register any difference in behaviour compared to JDK8 when using the default heuristics.
>
> Here are extracts of typical GC cycles from the logs, in case you find them useful:
>
> 1) “traversal” without class unloading:
>
> ===================
> [847.319s][info ][gc,ergo ] Free set: Used: 18313M of 20155M, Regions: 249 mutator, 0 collector
> [847.319s][info ][gc,ergo ] Free set: Mutator view: Max regular: 8192K, Max humongous: 1515520K, External frag: 20%, Internal frag: 7%
> [847.319s][info ][gc,ergo ] Free set: Collector view: Max regular: 0K
> [847.326s][info ][gc ] GC(40) Pause Init Traversal 5.109ms
> [847.326s][info ][gc,start ] GC(40) Concurrent traversal
> [847.405s][debug][gc,ref,start ] GC(40) Preclean SoftReferences
> [847.405s][debug][gc,ref ] GC(40) Preclean SoftReferences 0.097ms
> [847.405s][debug][gc,ref,start ] GC(40) Preclean WeakReferences
> [847.408s][debug][gc,ref ] GC(40) Preclean WeakReferences 3.010ms
> [847.408s][debug][gc,ref,start ] GC(40) Preclean FinalReferences
> [847.408s][debug][gc,ref ] GC(40) Preclean FinalReferences 0.141ms
> [847.408s][debug][gc,ref,start ] GC(40) Preclean PhantomReferences
> [847.408s][debug][gc,ref ] GC(40) Preclean PhantomReferences 0.478ms
> [847.408s][info ][gc ] GC(40) Concurrent traversal 18638M->18781M(20480M) 82.012ms
> [847.410s][info ][gc,start ] GC(40) Pause Final Traversal
> [847.416s][debug][gc,phases,ref ] GC(40) Reference Processing: 0.2ms
> [847.416s][debug][gc,phases,ref ] GC(40) SoftReference: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Balance queues: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Phase1: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Phase2: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Phase3: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Discovered: 0
> [847.416s][debug][gc,phases,ref ] GC(40) Cleared: 0
> [847.416s][debug][gc,phases,ref ] GC(40) WeakReference: 0.2ms
> [847.416s][debug][gc,phases,ref ] GC(40) Balance queues: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Phase2: 0.2ms
> [847.416s][debug][gc,phases,ref ] GC(40) Phase3: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Discovered: 20
> [847.416s][debug][gc,phases,ref ] GC(40) Cleared: 20
> [847.416s][debug][gc,phases,ref ] GC(40) FinalReference: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Balance queues: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Phase2: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Phase3: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Discovered: 0
> [847.416s][debug][gc,phases,ref ] GC(40) Cleared: 0
> [847.416s][debug][gc,phases,ref ] GC(40) PhantomReference: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Balance queues: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Phase2: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Phase3: 0.0ms
> [847.416s][debug][gc,phases,ref ] GC(40) Discovered: 0
> [847.416s][debug][gc,phases,ref ] GC(40) Cleared: 0
> [847.416s][debug][gc,phases,ref ] GC(40) Reference Enqueuing 0.1ms
> [847.416s][debug][gc,phases,ref ] GC(40) Reference Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0
> [847.416s][info ][gc,ergo ] GC(40) Free: 20161M, Regions: 2538 mutator, 0 collector
> [847.417s][info ][gc ] GC(40) Pause Final Traversal 6.247ms
> [847.417s][info ][gc,start ] GC(40) Concurrent cleanup
> [847.418s][info ][gc ] GC(40) Concurrent cleanup 18781M->351M(20480M) 1.547ms
> [847.419s][info ][gc,ergo ] Free set: Used: 47M of 20161M, Regions: 2533 mutator, 0 collector
> [847.419s][info ][gc,ergo ] Free set: Mutator view: Max regular: 8192K, Max humongous: 18702336K, External frag: 10%, Internal frag: 0%
> [847.419s][info ][gc,ergo ] Free set: Collector view: Max regular: 0K
> [847.419s][info ][gc,ergo ] Pacer for Idle. Initial: 409M, Alloc Tax Rate: 1.0x
> ===================
>
> 2) “traversal” with class unloading:
>
> ===================
> [197.504s][info ][gc,ergo ] Free set: Used: 18396M of 20237M, Regions: 241 mutator, 0 collector
> [197.505s][info ][gc,ergo ] Free set: Mutator view: Max regular: 8192K, Max humongous: 1687552K, External frag: 11%, Internal frag: 4%
> [197.505s][info ][gc,ergo ] Free set: Collector view: Max regular: 0K
> [197.507s][info ][gc,start ] GC(9) Pause Init Traversal
> [197.507s][info ][gc,ergo ] GC(9) Free: 1838M, Regions: 240 mutator, 0 collector
> [197.507s][info ][gc,ergo ] GC(9) Got 2315 collection set regions and 0 root set regions
> [197.508s][info ][gc,ergo ] GC(9) Pacer for Traversal. Used: 18637M, Free: 1838M, Non-Taxable: 183M, Alloc Tax Rate: 12.4x
> [197.508s][info ][gc ] GC(9) Pause Init Traversal 1.577ms
> [197.508s][info ][gc,start ] GC(9) Concurrent traversal
> [197.584s][debug][gc,ref,start ] GC(9) Preclean SoftReferences
> [197.584s][debug][gc,ref ] GC(9) Preclean SoftReferences 0.100ms
> [197.584s][debug][gc,ref,start ] GC(9) Preclean WeakReferences
> [197.587s][debug][gc,ref ] GC(9) Preclean WeakReferences 2.513ms
> [197.587s][debug][gc,ref,start ] GC(9) Preclean FinalReferences
> [197.587s][debug][gc,ref ] GC(9) Preclean FinalReferences 0.102ms
> [197.587s][debug][gc,ref,start ] GC(9) Preclean PhantomReferences
> [197.588s][debug][gc,ref ] GC(9) Preclean PhantomReferences 0.659ms
> [197.588s][info ][gc ] GC(9) Concurrent traversal 18638M->18779M(20480M) 79.340ms
> [197.590s][info ][gc,start ] GC(9) Pause Final Traversal
> [197.592s][debug][gc,phases,ref ] GC(9) Reference Processing: 0.2ms
> [197.592s][debug][gc,phases,ref ] GC(9) SoftReference: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Balance queues: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Phase1: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Phase2: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Phase3: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Discovered: 0
> [197.592s][debug][gc,phases,ref ] GC(9) Cleared: 0
> [197.592s][debug][gc,phases,ref ] GC(9) WeakReference: 0.2ms
> [197.592s][debug][gc,phases,ref ] GC(9) Balance queues: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Phase2: 0.2ms
> [197.592s][debug][gc,phases,ref ] GC(9) Phase3: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Discovered: 20
> [197.592s][debug][gc,phases,ref ] GC(9) Cleared: 20
> [197.592s][debug][gc,phases,ref ] GC(9) FinalReference: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Balance queues: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Phase2: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Phase3: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Discovered: 0
> [197.592s][debug][gc,phases,ref ] GC(9) Cleared: 0
> [197.592s][debug][gc,phases,ref ] GC(9) PhantomReference: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Balance queues: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Phase2: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Phase3: 0.0ms
> [197.592s][debug][gc,phases,ref ] GC(9) Discovered: 0
> [197.592s][debug][gc,phases,ref ] GC(9) Cleared: 0
> [197.592s][debug][gc,phases,ref ] GC(9) Reference Enqueuing 0.1ms
> [197.592s][debug][gc,phases,ref ] GC(9) Reference Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0
> [197.592s][debug][gc,phases,start ] GC(9) SystemDictionary WeakHandle cleaning
> [197.592s][debug][gc,phases ] GC(9) SystemDictionary WeakHandle cleaning 0.103ms
> [197.592s][debug][gc,phases,start ] GC(9) ClassLoaderData
> [197.593s][debug][gc,phases ] GC(9) ClassLoaderData 0.377ms
> [197.593s][debug][gc,phases,start ] GC(9) ProtectionDomainCacheTable
> [197.593s][debug][gc,phases ] GC(9) ProtectionDomainCacheTable 0.017ms
> [197.602s][info ][gc,stringtable ] GC(9) Cleaned string and symbol table, strings: 49649 processed, 0 removed, symbols: 261650 processed, 0 removed
> [197.606s][info ][gc,ergo ] GC(9) Free: 20221M, Regions: 2539 mutator, 0 collector
> [197.607s][info ][gc ] GC(9) Pause Final Traversal 16.951ms
> [197.607s][info ][gc,start ] GC(9) Concurrent cleanup
> [197.609s][info ][gc ] GC(9) Concurrent cleanup 18780M->287M(20480M) 1.964ms
> [197.609s][info ][gc,ergo ] Free set: Used: 28M of 20221M, Regions: 2536 mutator, 0 collector
> [197.609s][info ][gc,ergo ] Free set: Mutator view: Max regular: 8192K, Max humongous: 18800640K, External frag: 10%, Internal frag: 0%
> [197.609s][info ][gc,ergo ] Free set: Collector view: Max regular: 0K
> [197.609s][info ][gc,ergo ] Pacer for Idle. Initial: 409M, Alloc Tax Rate: 1.0x
> ===================
>
> 3) default heuristic (without class unloading)
>
> ==================
> [237.113s][info ][gc,ergo ] Concurrent marking triggered. Free: 3070M, Free Threshold: 3072M; Allocated: 15906M, Alloc Threshold: 0M
> [237.113s][info ][gc,ergo ] Free set: Used: 15745M of 18814M, Regions: 393 mutator, 0 collector
> [237.113s][info ][gc,ergo ] Free set: Mutator view: Max regular: 8192K, Max humongous: 2850816K, External frag: 10%, Internal frag: 2%
> [237.113s][info ][gc,ergo ] Free set: Collector view: Max regular: 0K
> [237.114s][info ][gc,start ] GC(13) Pause Init Mark
> [237.114s][debug][gc,tlab ] GC(13) TLAB totals: thrds: 113 refills: 16041 max: 518 slow allocs: 12 max 12 waste: 0.1% gc: 57826960B max: 1048536B slow: 12558848B max: 418592B fast: 0B max: 0B
> [237.118s][info ][gc,ergo ] GC(13) Pacer for Mark. Used: 17409M, Free: 3069M, Non-Taxable: 306M, Alloc Tax Rate: 20.8x
> [237.118s][info ][gc ] GC(13) Pause Init Mark 3.605ms
> [237.118s][info ][gc,start ] GC(13) Concurrent marking
> [237.212s][info ][gc ] GC(13) Concurrent marking 17409M->17549M(20480M) 94.486ms
> [237.214s][info ][gc,start ] GC(13) Pause Final Mark
> [237.218s][info ][gc,ergo ] GC(13) Adaptive CSet Selection. Target Free: 4096M, Actual Free: 18553M, Target CSet: [0M, 13915M]
> [237.218s][info ][gc,ergo ] GC(13) Collectable Garbage: 1661M (9% of total), 2M CSet, 208 CSet regions
> [237.218s][info ][gc,ergo ] GC(13) Immediate Garbage: 15649M (90% of total), 1957 regions
> [237.218s][info ][gc,ergo ] GC(13) Free: 18579M, Regions: 2332 mutator, 0 collector
> [237.221s][info ][gc,ergo ] GC(13) Pacer for Evacuation. CSet: 2M, Free: 18577M, Non-Taxable: 1857M, Alloc Tax Rate: 1.1x
> [237.221s][info ][gc ] GC(13) Pause Final Mark 7.270ms
> [237.221s][info ][gc,start ] GC(13) Concurrent cleanup
> [237.222s][info ][gc ] GC(13) Concurrent cleanup 17551M->1925M(20480M) 0.676ms
> [237.222s][info ][gc,start ] GC(13) Concurrent evacuation
> [237.230s][info ][gc ] GC(13) Concurrent evacuation 1925M->1992M(20480M) 8.243ms
> [237.232s][info ][gc,start ] GC(13) Pause Init Update Refs
> [237.232s][debug][gc,plab ] GC(13) Shenandoah mutator GCLAB stats PLAB allocation: allocated: 60088B, wasted: 552B, unused: 42664B, used: 16872B, undo waste: 0B,
> [237.232s][debug][gc,plab ] GC(13) Shenandoah mutator GCLAB stats sizing: calculated: 5624B, actual: 9704B
> [237.232s][debug][gc,plab ] GC(13) Shenandoah collector GCLAB stats PLAB allocation: allocated: 5003416B, wasted: 2584B, unused: 2358784B, used: 2642048B, undo waste: 0B,
> [237.232s][debug][gc,plab ] GC(13) Shenandoah collector GCLAB stats sizing: calculated: 1321024B, actual: 1616200B
> [237.232s][info ][gc,ergo ] GC(13) Pacer for Update-Refs. Used: 1992M, Free: 18487M, Non-Taxable: 1848M, Alloc Tax Rate: 1.1x
> [237.232s][info ][gc ] GC(13) Pause Init Update Refs 0.403ms
> [237.232s][info ][gc,start ] GC(13) Concurrent update references
> [237.410s][info ][gc ] GC(13) Concurrent update references 1992M->2243M(20480M) 177.690ms
> [237.412s][info ][gc,start ] GC(13) Pause Final Update Refs
> [237.413s][info ][gc,ergo ] GC(13) Free: 19900M, Regions: 2499 mutator, 0 collector
> [237.413s][info ][gc ] GC(13) Pause Final Update Refs 1.539ms
> [237.413s][info ][gc,start ] GC(13) Concurrent cleanup
> [237.419s][info ][gc ] GC(13) Concurrent cleanup 2243M->591M(20480M) 6.196ms
> [237.420s][info ][gc,ergo ] GC(13) Capacity: 20480M, Peak Occupancy: 17549M, Lowest Free: 2930M, Free Threshold: 2048M
> [237.420s][info ][gc,ergo ] Free set: Used: 12M of 19900M, Regions: 2498 mutator, 0 collector
> [237.420s][info ][gc,ergo ] Free set: Mutator view: Max regular: 8192K, Max humongous: 15130624K, External frag: 26%, Internal frag: 0%
> [237.420s][info ][gc,ergo ] Free set: Collector view: Max regular: 0K
> [237.420s][info ][gc,ergo ] Pacer for Idle. Initial: 409M, Alloc Tax Rate: 1.0x
> [252.318s][info ][gc,ergo ] Concurrent marking triggered. Free: 3071M, Free Threshold: 3072M; Allocated: 17320M, Alloc Threshold: 0M
> [252.318s][info ][gc,ergo ] Free set: Used: 16828M of 19900M, Regions: 394 mutator, 0 collector
> [252.318s][info ][gc,ergo ] Free set: Mutator view: Max regular: 8192K, Max humongous: 3022848K, External frag: 4%, Internal frag: 2%
> [252.318s][info ][gc,ergo ] Free set: Collector view: Max regular: 0K
> [252.320s][info ][gc,start ] GC(14) Pause Init Mark
> [252.320s][debug][gc,tlab ] GC(14) TLAB totals: thrds: 124 refills: 18378 max: 1037 slow allocs: 1 max 1 waste: 0.1% gc: 51629888B max: 1048536B slow: 14297192B max: 803248B fast: 0B max: 0B
> [252.322s][info ][gc,ergo ] GC(14) Pacer for Mark. Used: 17407M, Free: 3071M, Non-Taxable: 307M, Alloc Tax Rate: 20.8x
> [252.322s][info ][gc ] GC(14) Pause Init Mark 2.432ms
> [252.322s][info ][gc,start ] GC(14) Concurrent marking
> [252.422s][info ][gc ] GC(14) Concurrent marking 17407M->17557M(20480M) 99.549ms
> [252.423s][info ][gc,start ] GC(14) Pause Final Mark
> [252.425s][info ][gc,ergo ] GC(14) Immediate Garbage: 16511M (95% of total), 2064 regions
> [252.425s][info ][gc,ergo ] GC(14) Free: 19434M, Regions: 2439 mutator, 0 collector
> [252.425s][info ][gc,ergo ] GC(14) Pacer for Evacuation. CSet: 0M, Free: 19434M, Non-Taxable: 1943M, Alloc Tax Rate: 1.1x
> [252.425s][info ][gc ] GC(14) Pause Final Mark 1.522ms
> [252.425s][info ][gc,start ] GC(14) Concurrent cleanup
> [252.426s][info ][gc ] GC(14) Concurrent cleanup 17557M->1068M(20480M) 1.028ms
> [252.426s][info ][gc,ergo ] GC(14) Capacity: 20480M, Peak Occupancy: 17557M, Lowest Free: 2922M, Free Threshold: 2048M
> [252.426s][info ][gc,ergo ] Free set: Used: 23M of 19434M, Regions: 2437 mutator, 0 collector
> [252.426s][info ][gc,ergo ] Free set: Mutator view: Max regular: 8192K, Max humongous: 2867200K, External frag: 86%, Internal frag: 0%
> [252.426s][info ][gc,ergo ] Free set: Collector view: Max regular: 0K
> [252.426s][info ][gc,ergo ] Pacer for Idle. Initial: 409M, Alloc Tax Rate: 1.0x
> ==================
>
> 4) GC statistics for “traversal” without class unloading (different run than the log above, but the observed behaviour was exactly the same)
> ==================
> [503.616s][debug][gc,metaspace,freelist] ChunkManager::chunk_freelist_allocate: 0x00007f055419fdf0 chunk 0x00007f02784fb400 size 128 count 0 Free chunk total 3072 count 6
> [503.645s][info ][gc ] Cancelling GC: Stopping VM
> [503.646s][info ][gc,heap,exit ] Heap
> [503.646s][info ][gc,heap,exit ] Shenandoah Heap
> [503.646s][info ][gc,heap,exit ] 20971520K total, 19513344K committed, 13436610K used
> [503.646s][info ][gc,heap,exit ] 2560 x 8192K regions
> [503.646s][info ][gc,heap,exit ] Status: cancelled
> [503.646s][info ][gc,heap,exit ] Reserved region:
> [503.646s][info ][gc,heap,exit ] - [0x00000002c0000000, 0x00000007c0000000)
> [503.646s][info ][gc,stats ]
> [503.646s][info ][gc,stats ] GC STATISTICS:
> [503.646s][info ][gc,stats ] "(G)" (gross) pauses include VM time: time to notify and block threads, do the pre-
> [503.646s][info ][gc,stats ] and post-safepoint housekeeping. Use -XX:+PrintSafepointStatistics to dissect.
> [503.646s][info ][gc,stats ] "(N)" (net) pauses are the times spent in the actual GC code.
> [503.646s][info ][gc,stats ] "a" is average time for each phase, look at levels to see if average makes sense.
> [503.646s][info ][gc,stats ] "lvls" are quantiles: 0% (minimum), 25%, 50% (median), 75%, 100% (maximum).
> [503.646s][info ][gc,stats ]
> [503.646s][info ][gc,stats ] Total Pauses (G) = 0.87 s (a = 17368 us) (n = 50) (lvls, us = 7578, 8105, 8887, 9609, 430654)
> [503.646s][info ][gc,stats ] Total Pauses (N) = 0.75 s (a = 15007 us) (n = 50) (lvls, us = 5410, 5898, 6602, 7168, 421706)
> [503.646s][info ][gc,stats ] Weak References = 0.02 s (a = 914 us) (n = 25) (lvls, us = 459, 479, 502, 547, 8559)
> [503.646s][info ][gc,stats ] Process = 0.02 s (a = 608 us) (n = 25) (lvls, us = 361, 371, 396, 459, 3317)
> [503.646s][info ][gc,stats ] Enqueue = 0.01 s (a = 299 us) (n = 25) (lvls, us = 72, 81, 93, 99, 5231)
> [503.646s][info ][gc,stats ] Pause Degenerated GC (G) = 0.43 s (a = 430643 us) (n = 1) (lvls, us = 429688, 429688, 429688, 429688, 430643)
> [503.646s][info ][gc,stats ] Pause Degenerated GC (N) = 0.42 s (a = 421705 us) (n = 1) (lvls, us = 419922, 419922, 419922, 419922, 421705)
> [503.646s][info ][gc,stats ] Pause Init Traversal (G) = 0.21 s (a = 8358 us) (n = 25) (lvls, us = 7578, 7793, 8105, 8164, 16517)
> [503.646s][info ][gc,stats ] Pause Init Traversal (N) = 0.15 s (a = 6183 us) (n = 25) (lvls, us = 5410, 5645, 5898, 5957, 14575)
> [503.646s][info ][gc,stats ] Prepare = 0.01 s (a = 515 us) (n = 25) (lvls, us = 447, 465, 480, 498, 717)
> [503.646s][info ][gc,stats ] Work = 0.14 s (a = 5503 us) (n = 25) (lvls, us = 4824, 4941, 5273, 5332, 13030)
> [503.646s][info ][gc,stats ] TI: Thread Roots = 0.01 s (a = 509 us) (n = 25) (lvls, us = 414, 471, 484, 498, 1088)
> [503.646s][info ][gc,stats ] TI: Code Cache Roots = 0.06 s (a = 2298 us) (n = 25) (lvls, us = 1934, 1973, 2168, 2207, 6487)
> [503.646s][info ][gc,stats ] TI: String Table Roots = 0.02 s (a = 666 us) (n = 25) (lvls, us = 551, 615, 629, 639, 1605)
> [503.646s][info ][gc,stats ] TI: Universe Roots = 0.00 s (a = 2 us) (n = 25) (lvls, us = 1, 1, 2, 2, 9)
> [503.646s][info ][gc,stats ] TI: JNI Roots = 0.00 s (a = 3 us) (n = 25) (lvls, us = 2, 2, 2, 3, 9)
> [503.646s][info ][gc,stats ] TI: Synchronizer Roots = 0.00 s (a = 0 us) (n = 25) (lvls, us = 0, 0, 0, 0, 0)
> [503.646s][info ][gc,stats ] TI: Management Roots = 0.00 s (a = 2 us) (n = 25) (lvls, us = 2, 2, 2, 2, 4)
> [503.646s][info ][gc,stats ] TI: System Dict Roots = 0.01 s (a = 306 us) (n = 25) (lvls, us = 162, 170, 191, 314, 1661)
> [503.646s][info ][gc,stats ] TI: CLDG Roots = 0.02 s (a = 727 us) (n = 25) (lvls, us = 551, 662, 676, 701, 1772)
> [503.646s][info ][gc,stats ] TI: JVMTI Roots = 0.00 s (a = 1 us) (n = 25) (lvls, us = 1, 1, 1, 1, 1)
> [503.646s][info ][gc,stats ] Pause Final Traversal (G) = 0.23 s (a = 9530 us) (n = 24) (lvls, us = 8887, 9062, 9336, 9805, 10682)
> [503.646s][info ][gc,stats ] Pause Final Traversal (N) = 0.17 s (a = 7252 us) (n = 24) (lvls, us = 6582, 6836, 7148, 7461, 8415)
> [503.646s][info ][gc,stats ] Work = 0.34 s (a = 13730 us) (n = 25) (lvls, us = 5547, 5801, 6016, 6309, 196072)
> [503.646s][info ][gc,stats ] TF: Thread Roots = 0.02 s (a = 626 us) (n = 25) (lvls, us = 447, 473, 496, 521, 3577)
> [503.646s][info ][gc,stats ] TF: Code Cache Roots = 0.05 s (a = 2129 us) (n = 25) (lvls, us = 1914, 1992, 2168, 2207, 2280)
> [503.646s][info ][gc,stats ] TF: String Table Roots = 0.01 s (a = 379 us) (n = 25) (lvls, us = 361, 367, 371, 387, 397)
> [503.646s][info ][gc,stats ] TF: Universe Roots = 0.00 s (a = 1 us) (n = 25) (lvls, us = 1, 1, 1, 1, 2)
> [503.646s][info ][gc,stats ] TF: JNI Roots = 0.00 s (a = 2 us) (n = 25) (lvls, us = 2, 2, 2, 3, 3)
> [503.646s][info ][gc,stats ] TF: Synchronizer Roots = 0.00 s (a = 0 us) (n = 25) (lvls, us = 0, 0, 0, 0, 0)
> [503.646s][info ][gc,stats ] TF: Management Roots = 0.00 s (a = 2 us) (n = 25) (lvls, us = 1, 2, 2, 3, 3)
> [503.646s][info ][gc,stats ] TF: System Dict Roots = 0.01 s (a = 217 us) (n = 25) (lvls, us = 135, 148, 244, 244, 274)
> [503.646s][info ][gc,stats ] TF: CLDG Roots = 0.02 s (a = 604 us) (n = 25) (lvls, us = 486, 551, 596, 627, 838)
> [503.646s][info ][gc,stats ] TF: JVMTI Roots = 0.00 s (a = 1 us) (n = 25) (lvls, us = 1, 1, 1, 1, 1)
> [503.646s][info ][gc,stats ] TF: Finish Queues = 0.18 s (a = 7384 us) (n = 25) (lvls, us = 312, 563, 621, 686, 169584)
> [503.646s][info ][gc,stats ] Cleanup = 0.00 s (a = 103 us) (n = 25) (lvls, us = 95, 96, 99, 105, 121)
> [503.646s][info ][gc,stats ] Concurrent Cleanup = 0.07 s (a = 2967 us) (n = 24) (lvls, us = 1094, 1621, 1836, 2109, 26448)
> [503.646s][info ][gc,stats ] Recycle = 0.04 s (a = 1588 us) (n = 25) (lvls, us = 262, 494, 688, 727, 24130)
> [503.646s][info ][gc,stats ] Reset Bitmaps = 0.25 s (a = 9837 us) (n = 25) (lvls, us = 623, 895, 1035, 1484, 215718)
> [503.646s][info ][gc,stats ] Concurrent Traversal = 5.10 s (a = 204183 us) (n = 25) (lvls, us = 84375, 87109, 88477, 92383, 2872023)
> [503.646s][info ][gc,stats ]
> [503.646s][info ][gc,stats ]
> [503.646s][info ][gc,stats ] Under allocation pressure, concurrent cycles may cancel, and either continue cycle
> [503.646s][info ][gc,stats ] under stop-the-world pause or result in stop-the-world Full GC. Increase heap size,
> [503.646s][info ][gc,stats ] tune GC heuristics, set more aggressive pacing delay, or lower allocation rate
> [503.647s][info ][gc,stats ] to avoid Degenerated and Full GC cycles.
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ] 24 successful concurrent GCs
> [503.647s][info ][gc,stats ] 0 invoked explicitly
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ] 1 Degenerated GCs
> [503.647s][info ][gc,stats ] 1 caused by allocation failure
> [503.647s][info ][gc,stats ] 1 happened at Traversal
> [503.647s][info ][gc,stats ] 0 upgraded to Full GC
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ] 0 Full GCs
> [503.647s][info ][gc,stats ] 0 invoked explicitly
> [503.647s][info ][gc,stats ] 0 caused by allocation failure
> [503.647s][info ][gc,stats ] 0 upgraded from Degenerated GC
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ] ALLOCATION PACING:
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ] Max pacing delay is set for 10 ms.
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ] Higher delay would prevent application outpacing the GC, but it will hide the GC latencies
> [503.647s][info ][gc,stats ] from the STW pause times. Pacing affects the individual threads, and so it would also be
> [503.647s][info ][gc,stats ] invisible to the usual profiling tools, but would add up to end-to-end application latency.
> [503.647s][info ][gc,stats ] Raise max pacing delay with care.
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ] Actual pacing delays histogram:
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ] From - To Count
> [503.647s][info ][gc,stats ] 1 ms - 2 ms: 171
> [503.647s][info ][gc,stats ] 2 ms - 4 ms: 166
> [503.647s][info ][gc,stats ] 4 ms - 8 ms: 37
> [503.647s][info ][gc,stats ] 8 ms - 16 ms: 1820
> [503.647s][info ][gc,stats ] 16 ms - 32 ms: 24
> [503.647s][info ][gc,stats ] 32 ms - 64 ms: 0
> [503.647s][info ][gc,stats ] 64 ms - 128 ms: 0
> [503.647s][info ][gc,stats ] 128 ms - 256 ms: 0
> [503.647s][info ][gc,stats ] 256 ms - 512 ms: 1
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ]
> [503.647s][info ][gc,stats ] Allocation tracing is disabled, use -XX:+ShenandoahAllocationTrace to enable.
> ==================
>
> If you are interested in experiments with some concrete settings or more statistics, don’t hesitate to ask. Running experiments is extremely easy now that the application is JDK11-ready.
> P.
>
>> On 9 May 2018, at 22:27, Roman Kennke <rkennke at redhat.com> wrote:
>>
>> Hi Petr,
>>
>> thanks for running the experiments with newer builds again.
>>
>> I guess the class unloading problem can currently only be 'solved' by
>> disabling it with -XX:ShenandoahUnloadClassesFrequency=0, which should
>> be ok if your application doesn't use many classloaders. However, be
>> aware that app servers tend to make fairly heavy use of classloaders.
>> And then there are also anonymous classes (and related things like
>> lambdas) which are not obvious but also put pressure on class unloading.
>>
>> If it is possible at all for you to run with JDK10 or JDK11, you might
>> want to try the latest build of that:
>>
>> https://builds.shipilev.net/openjdk-shenandoah-jdk/
>>
>> and use the traversal heuristics: -XX:ShenandoahGCHeuristics=traversal
>>
>> that might solve the tightly spaced pauses problem. Explanation:
>> traversal mode has only one concurrent phase that does all of
>> marking+evacuation+updating-references, book-ended by a pause at each
>> end. Classic Shenandoah has 3 concurrent phases, with 4 pauses in total,
>> and in your workload it seems that two of the 3 phases are really short,
>> and that's why you see the 'STW storm' (nice term!). We shall look into
>> improving heuristics to avoid that, e.g. by folding the concurrent
>> phases into a single pause if the expected concurrent work is too little.
>>
>> Cheers, Roman
>>
>>
>>> I have tried again with build 25.71-b228
>>>
>>> This build seems like an improvement over the one I used before.
>>> Just by glancing into the logs it seems that GC pauses got smaller.
>>>
>>> About the 2 observations I had in previous email:
>>> 1) Class unloading
>>> Every fifth ShenandoahFinalMarkStartEvac pause is longer (around 14ms) than the ones where it does not try to unload classes (7-8ms).
>>> As promised, it now really says that it is trying to unload classes, so the longer pauses can be associated with it:
>>>
>>> Concurrent marking triggered. Free: 2249M, Free Threshold: 2252M; Allocated: 18009M, Alloc Threshold: 0M
>>> Free set: Used: 17810M of 20060M, Regions: 298 mutator, 0 collector
>>> Free set: Mutator view: Max regular: 8192K, Max humongous: 2048000K, External frag: 12%, Internal frag: 5%
>>> Free set: Collector view: Max regular: 0K
>>> Application time: 8.4427237 seconds
>>> [Pause Init Mark (unload classes)Pacer for Mark. Used: 18226M, Free: 2249M, Non-Taxable: 224M, Alloc Tax Rate: 29.7x
>>> , 3.064 ms]
>>> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
>>> 1507.865: ShenandoahInitMark [ 1074 0 0 ] [ 0 0 0 1 3 ] 0
>>> Total time for which application threads were stopped: 0.0054101 seconds, Stopping threads took: 0.0004040 seconds
>>> [Concurrent marking (unload classes) 17G->17G(20G), 33.855 ms]
>>> Application time: 0.0344424 seconds
>>> [Pause Final Mark (unload classes)Adaptive CSet Selection. Target Free: 3276M, Actual Free: 17876M, Target CSet: [0M, 13407M]
>>> Collectable Garbage: 2099M (11% of total), 3M CSet, 263 CSet regions
>>> Immediate Garbage: 15868M (87% of total), 1984 regions
>>> Free: 18046M, Regions: 2271 mutator, 0 collector
>>> Pacer for Evacuation. CSet: 3M, Free: 18046M, Non-Taxable: 1804M, Alloc Tax Rate: 1.1x
>>> , 11.950 ms]
>>> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
>>> 1507.905: ShenandoahFinalMarkStartEvac [ 1074 1 4 ] [ 0 0 0 1 11 ] 0
>>> Total time for which application threads were stopped: 0.0146809 seconds, Stopping threads took: 0.0006311 seconds
>>> [Concurrent cleanup 17G->2442M(20G), 0.566 ms]
>>> [Concurrent evacuation 2445M->2464M(20G), 3.280 ms]
>>>
>>> 2) Too closely spaced STW pauses
>>> It is the same story as before. This actually got worse. It seems that Shenandoah is doing too well on strongly generational workloads like my application's (application-scoped objects are created at the beginning and almost everything created after that is request-scoped). This is very extreme with the default settings. The JVM takes 13 concurrent threads on the test machine and then the log is full of pathological situations like this:
>>>
>>> 1431.134: ShenandoahInitMark [ 1074 0 1 ] [ 0 0 0 1 2 ] 0
>>> Total time for which application threads were stopped: 0.0052016 seconds, Stopping threads took: 0.0004466 seconds
>>> [Concurrent marking (process refs) 17G->17G(20G), 32.542 ms]
>>> [Concurrent precleaning 17G->17G(20G), 2.180 ms]
>>> Application time: 0.0367779 seconds
>>> [Pause Final Mark (process refs)Adaptive CSet Selection. Target Free: 3276M, Actual Free: 18102M, Target CSet: [0M, 13576M]
>>> Collectable Garbage: 1915M (10% of total), 9M CSet, 241 CSet regions
>>> Immediate Garbage: 16078M (88% of total), 2011 regions
>>> Free: 18253M, Regions: 2296 mutator, 0 collector
>>> Pacer for Evacuation. CSet: 9M, Free: 18253M, Non-Taxable: 1825M, Alloc Tax Rate: 1.1x
>>> , 4.711 ms]
>>> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
>>> 1431.176: ShenandoahFinalMarkStartEvac [ 1074 4 0 ] [ 0 0 0 1 4 ] 0
>>> Total time for which application threads were stopped: 0.0073044 seconds, Stopping threads took: 0.0006198 seconds
>>> [Concurrent cleanup 17G->2232M(20G), 0.627 ms]
>>> [Concurrent evacuation 2232M->2274M(20G), 3.539 ms]
>>> Application time: 0.0043597 seconds
>>> [Pause Init Update RefsPacer for Update-Refs. Used: 2274M, Free: 18202M, Non-Taxable: 1820M, Alloc Tax Rate: 1.1x
>>> , 0.272 ms]
>>> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
>>> 1431.188: ShenandoahInitUpdateRefs [ 1074 0 1 ] [ 0 0 0 2 0 ] 0
>>> Total time for which application threads were stopped: 0.0029428 seconds, Stopping threads took: 0.0003774 seconds
>>> [Concurrent update references 2274M->2343M(20G), 29.363 ms]
>>> Application time: 0.0294687 seconds
>>> [Pause Final Update RefsFree: 20060M, Regions: 2524 mutator, 0 collector
>>> , 1.745 ms]
>>> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
>>> 1431.221: ShenandoahFinalUpdateRefs [ 1074 0 1 ] [ 0 0 0 2 1 ] 0
>>> Total time for which application threads were stopped: 0.0048456 seconds, Stopping threads took: 0.0005142 seconds
>>> [Concurrent cleanup 2344M->420M(20G), 1.251 ms]
>>>
>>> The effect on the request tail latency observed by the test client during such a STW storm is worse than that of one larger pause done by G1.
>>> Again, setting concurrent threads to 1 helps a lot, but even then ShenandoahFinalMarkStartEvac is often so short that its STW pause almost joins with the next pause:
>>> 958.586: ShenandoahInitMark [ 1075 0 0 ] [ 0 0 0 1 2 ] 0
>>> Total time for which application threads were stopped: 0.0049878 seconds, Stopping threads took: 0.0004109 seconds
>>> [Concurrent marking 17G->17G(20G), 94.828 ms]
>>> Application time: 0.0949659 seconds
>>> [Pause Final MarkAdaptive CSet Selection. Target Free: 3276M, Actual Free: 18423M, Target CSet: [0M, 13817M]
>>> Collectable Garbage: 1480M (8% of total), 3M CSet, 186 CSet regions
>>> Immediate Garbage: 16511M (90% of total), 2064 regions
>>> Free: 18597M, Regions: 2343 mutator, 0 collector
>>> Pacer for Evacuation. CSet: 3M, Free: 18597M, Non-Taxable: 1859M, Alloc Tax Rate: 1.1x
>>> , 5.164 ms]
>>> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
>>> 958.686: ShenandoahFinalMarkStartEvac [ 1075 0 0 ] [ 0 0 0 1 5 ] 0
>>> Total time for which application threads were stopped: 0.0076722 seconds, Stopping threads took: 0.0004436 seconds
>>> [Concurrent cleanup 17G->1896M(20G), 0.684 ms]
>>> [Concurrent evacuation 1896M->1948M(20G), 7.096 ms]
>>> Application time: 0.0081824 seconds
>>> [Pause Init Update RefsPacer for Update-Refs. Used: 1948M, Free: 18526M, Non-Taxable: 1852M, Alloc Tax Rate: 1.1x
>>> , 0.290 ms]
>>> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
>>> 958.702: ShenandoahInitUpdateRefs [ 1075 0 0 ] [ 0 0 0 1 0 ] 0
>>> Total time for which application threads were stopped: 0.0027895 seconds, Stopping threads took: 0.0004562 seconds
>>> [Concurrent update references 1948M->2170M(20G), 131.651 ms]
>>> Application time: 0.1317581 seconds
>>> [Pause Final Update RefsFree: 19793M, Regions: 2499 mutator, 0 collector
>>> , 1.593 ms]
>>> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
>>> 958.836: ShenandoahFinalUpdateRefs [ 1075 0 0 ] [ 0 0 0 1 1 ] 0
>>> Total time for which application threads were stopped: 0.0039250 seconds, Stopping threads took: 0.0004161 seconds
>>> [Concurrent cleanup 2170M->704M(20G), 4.958 ms]
>>>
>>>
>>>
>>> If you need some more statistics, or want the workload rerun with different settings, don’t hesitate to ask.
>>>
>>>
>>> Petr
>>>
>>>
>>>
>>>> On 9 May 2018, at 09:36, Aleksey Shipilev <shade at redhat.com> wrote:
>>>>
>>>> On 05/09/2018 07:53 AM, Roman Kennke wrote:
>>>>> Would it be possible to run the same experiments with a newer build,
>>>>> e.g. one from here:
>>>>>
>>>>> https://builds.shipilev.net/
>>>>
>>>> More precisely, this one for 8u:
>>>> https://builds.shipilev.net/openjdk-shenandoah-jdk8/
>>>>
>>>>
>>>>>> I would suggest mentioning somewhere in the user guide the fact that the class unloading
>>>>>> feature can significantly prolong GC pauses. Also, the GC logging could mention that class
>>>>>> unloading is going on (yes, you can see it with class unloading logging on, but most people
>>>>>> don't have it on when trying to fix GC issues).
>>>> It does mention it in recent builds, like this:
>>>>
>>>> [Pause Init Mark (unload classes), 3.463 ms]
>>>> [Concurrent marking (unload classes) 557M->557M(8192M), 53.294 ms]
>>>> [Pause Final Mark (unload classes), 15.750 ms]
>>>> [Concurrent cleanup 557M->543M(8192M), 0.288 ms]
>>>>
>>>> Thanks,
>>>> -Aleksey
>>>>
>>>
>>
>>
>