Feedback from experiments on production application
Roman Kennke
rkennke at redhat.com
Wed May 9 20:27:22 UTC 2018
Hi Petr,
thanks for running the experiments with newer builds again.
I guess the class unloading problem can currently only be 'solved' by
disabling it with -XX:ShenandoahUnloadClassesFrequency=0, which should
be ok if your application doesn't use many classloaders. However, be
aware that app servers tend to make fairly heavy use of classloaders.
And then there are also anonymous classes (and related things like
lambdas) which are not obvious but also put pressure on class unloading.
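To make that concrete, a full invocation could look roughly like this
(only -XX:ShenandoahUnloadClassesFrequency=0 is the point here; the
heap sizes and the jar name are just placeholders for your setup):

  java -XX:+UseShenandoahGC \
       -XX:ShenandoahUnloadClassesFrequency=0 \
       -Xms20g -Xmx20g \
       -jar yourapp.jar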
If it is possible at all for you to run with JDK 10 or JDK 11, you might
want to try the latest build of that:
https://builds.shipilev.net/openjdk-shenandoah-jdk/
and use the traversal heuristics: -XX:ShenandoahGCHeuristics=traversal.
That might solve the tightly spaced pauses problem.
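Again just a sketch, with the same placeholders as above (depending on
the exact build you may also need -XX:+UnlockExperimentalVMOptions):

  java -XX:+UseShenandoahGC \
       -XX:ShenandoahGCHeuristics=traversal \
       -Xms20g -Xmx20g \
       -jar yourapp.jar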
Explanation: traversal mode has only one concurrent phase that does all
of marking+evacuation+updating-references, book-ended by a pause at each
end. Classic Shenandoah has 3 concurrent phases, with 4 pauses in total,
and in your workload it seems that two of the 3 concurrent phases are
really short, and that's why you see the 'STW storm' (nice term!). We
shall look into improving the heuristics to avoid that, e.g. by folding
the concurrent phases into a single pause if the expected concurrent
work is too little.
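To sketch the kind of heuristic I have in mind (hypothetical
pseudocode, not actual Shenandoah code; the threshold and all names
here are made up):

  // Hypothetical sketch: fold a concurrent phase into the current
  // pause when the predicted concurrent work is too small to justify
  // a separate phase plus another safepoint.
  final class FoldHeuristicSketch {
      static final double FOLD_THRESHOLD_MS = 5.0; // assumed cutoff

      static boolean shouldFoldIntoPause(double predictedWorkMs) {
          return predictedWorkMs < FOLD_THRESHOLD_MS;
      }

      public static void main(String[] args) {
          // The ~3 ms concurrent evacuations in your log would fold:
          System.out.println(shouldFoldIntoPause(3.3));  // true
          // ...while a ~30 ms concurrent mark would stay concurrent:
          System.out.println(shouldFoldIntoPause(33.8)); // false
      }
  }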
Cheers, Roman
> I have tried again with build 25.71-b228
>
> This build seems like an improvement over the one I used before.
> Just by glancing into the logs it seems that GC pauses got smaller.
>
> About the 2 observations I had in previous email:
> 1) Class unloading
> Every fifth ShenandoahFinalMarkStartEvac pause is longer (around 14 ms) than the ones where it does not try to unload classes (7-8 ms).
> As promised, it really says that it tries to unload the classes now, so the longer pauses can be associated with it:
>
> Concurrent marking triggered. Free: 2249M, Free Threshold: 2252M; Allocated: 18009M, Alloc Threshold: 0M
> Free set: Used: 17810M of 20060M, Regions: 298 mutator, 0 collector
> Free set: Mutator view: Max regular: 8192K, Max humongous: 2048000K, External frag: 12%, Internal frag: 5%
> Free set: Collector view: Max regular: 0K
> Application time: 8.4427237 seconds
> [Pause Init Mark (unload classes)Pacer for Mark. Used: 18226M, Free: 2249M, Non-Taxable: 224M, Alloc Tax Rate: 29.7x
> , 3.064 ms]
> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
> 1507.865: ShenandoahInitMark [ 1074 0 0 ] [ 0 0 0 1 3 ] 0
> Total time for which application threads were stopped: 0.0054101 seconds, Stopping threads took: 0.0004040 seconds
> [Concurrent marking (unload classes) 17G->17G(20G), 33.855 ms]
> Application time: 0.0344424 seconds
> [Pause Final Mark (unload classes)Adaptive CSet Selection. Target Free: 3276M, Actual Free: 17876M, Target CSet: [0M, 13407M]
> Collectable Garbage: 2099M (11% of total), 3M CSet, 263 CSet regions
> Immediate Garbage: 15868M (87% of total), 1984 regions
> Free: 18046M, Regions: 2271 mutator, 0 collector
> Pacer for Evacuation. CSet: 3M, Free: 18046M, Non-Taxable: 1804M, Alloc Tax Rate: 1.1x
> , 11.950 ms]
> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
> 1507.905: ShenandoahFinalMarkStartEvac [ 1074 1 4 ] [ 0 0 0 1 11 ] 0
> Total time for which application threads were stopped: 0.0146809 seconds, Stopping threads took: 0.0006311 seconds
> [Concurrent cleanup 17G->2442M(20G), 0.566 ms]
> [Concurrent evacuation 2445M->2464M(20G), 3.280 ms]
>
> 2) Too closely spaced STW pauses
> It is the same story as before. This actually got worse. It seems that Shenandoah is doing too well on strongly generational workloads like the one my application produces (application-scoped objects are created at the beginning and almost everything created after that is request-scoped). This is very extreme with default settings. The JVM picks 13 concurrent threads on the test machine and then the log is full of pathological situations like this:
>
> 1431.134: ShenandoahInitMark [ 1074 0 1 ] [ 0 0 0 1 2 ] 0
> Total time for which application threads were stopped: 0.0052016 seconds, Stopping threads took: 0.0004466 seconds
> [Concurrent marking (process refs) 17G->17G(20G), 32.542 ms]
> [Concurrent precleaning 17G->17G(20G), 2.180 ms]
> Application time: 0.0367779 seconds
> [Pause Final Mark (process refs)Adaptive CSet Selection. Target Free: 3276M, Actual Free: 18102M, Target CSet: [0M, 13576M]
> Collectable Garbage: 1915M (10% of total), 9M CSet, 241 CSet regions
> Immediate Garbage: 16078M (88% of total), 2011 regions
> Free: 18253M, Regions: 2296 mutator, 0 collector
> Pacer for Evacuation. CSet: 9M, Free: 18253M, Non-Taxable: 1825M, Alloc Tax Rate: 1.1x
> , 4.711 ms]
> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
> 1431.176: ShenandoahFinalMarkStartEvac [ 1074 4 0 ] [ 0 0 0 1 4 ] 0
> Total time for which application threads were stopped: 0.0073044 seconds, Stopping threads took: 0.0006198 seconds
> [Concurrent cleanup 17G->2232M(20G), 0.627 ms]
> [Concurrent evacuation 2232M->2274M(20G), 3.539 ms]
> Application time: 0.0043597 seconds
> [Pause Init Update RefsPacer for Update-Refs. Used: 2274M, Free: 18202M, Non-Taxable: 1820M, Alloc Tax Rate: 1.1x
> , 0.272 ms]
> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
> 1431.188: ShenandoahInitUpdateRefs [ 1074 0 1 ] [ 0 0 0 2 0 ] 0
> Total time for which application threads were stopped: 0.0029428 seconds, Stopping threads took: 0.0003774 seconds
> [Concurrent update references 2274M->2343M(20G), 29.363 ms]
> Application time: 0.0294687 seconds
> [Pause Final Update RefsFree: 20060M, Regions: 2524 mutator, 0 collector
> , 1.745 ms]
> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
> 1431.221: ShenandoahFinalUpdateRefs [ 1074 0 1 ] [ 0 0 0 2 1 ] 0
> Total time for which application threads were stopped: 0.0048456 seconds, Stopping threads took: 0.0005142 seconds
> [Concurrent cleanup 2344M->420M(20G), 1.251 ms]
>
> The effect on the request tail latency observed by the test client during such a STW storm is worse than that of one larger pause done by G1.
> Again, setting the concurrent threads to 1 helps a lot, but even with that, especially ShenandoahFinalMarkStartEvac is often so short that its STW pause almost joins with the next pause:
> 958.586: ShenandoahInitMark [ 1075 0 0 ] [ 0 0 0 1 2 ] 0
> Total time for which application threads were stopped: 0.0049878 seconds, Stopping threads took: 0.0004109 seconds
> [Concurrent marking 17G->17G(20G), 94.828 ms]
> Application time: 0.0949659 seconds
> [Pause Final MarkAdaptive CSet Selection. Target Free: 3276M, Actual Free: 18423M, Target CSet: [0M, 13817M]
> Collectable Garbage: 1480M (8% of total), 3M CSet, 186 CSet regions
> Immediate Garbage: 16511M (90% of total), 2064 regions
> Free: 18597M, Regions: 2343 mutator, 0 collector
> Pacer for Evacuation. CSet: 3M, Free: 18597M, Non-Taxable: 1859M, Alloc Tax Rate: 1.1x
> , 5.164 ms]
> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
> 958.686: ShenandoahFinalMarkStartEvac [ 1075 0 0 ] [ 0 0 0 1 5 ] 0
> Total time for which application threads were stopped: 0.0076722 seconds, Stopping threads took: 0.0004436 seconds
> [Concurrent cleanup 17G->1896M(20G), 0.684 ms]
> [Concurrent evacuation 1896M->1948M(20G), 7.096 ms]
> Application time: 0.0081824 seconds
> [Pause Init Update RefsPacer for Update-Refs. Used: 1948M, Free: 18526M, Non-Taxable: 1852M, Alloc Tax Rate: 1.1x
> , 0.290 ms]
> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
> 958.702: ShenandoahInitUpdateRefs [ 1075 0 0 ] [ 0 0 0 1 0 ] 0
> Total time for which application threads were stopped: 0.0027895 seconds, Stopping threads took: 0.0004562 seconds
> [Concurrent update references 1948M->2170M(20G), 131.651 ms]
> Application time: 0.1317581 seconds
> [Pause Final Update RefsFree: 19793M, Regions: 2499 mutator, 0 collector
> , 1.593 ms]
> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
> 958.836: ShenandoahFinalUpdateRefs [ 1075 0 0 ] [ 0 0 0 1 1 ] 0
> Total time for which application threads were stopped: 0.0039250 seconds, Stopping threads took: 0.0004161 seconds
> [Concurrent cleanup 2170M->704M(20G), 4.958 ms]
>
>
>
> If you need more statistics or a rerun of the workload with different settings, don’t hesitate to ask.
>
>
> Petr
>
>
>
>> On 9 May 2018, at 09:36, Aleksey Shipilev <shade at redhat.com> wrote:
>>
>> On 05/09/2018 07:53 AM, Roman Kennke wrote:
>>> Would it be possible to run the same experiments with a newer build,
>>> e.g. one from here:
>>>
>>> https://builds.shipilev.net/
>>
>> More precisely, this one for 8u:
>> https://builds.shipilev.net/openjdk-shenandoah-jdk8/
>>
>>
>>>> I would suggest mentioning somewhere in the user guide the fact that the class unloading
>>>> feature can significantly prolong GC pauses. Also, the GC logging could mention that class
>>>> unloading is going on (yes, you can see it with class unloading logging on, but most people
>>>> usually don’t have it on when trying to fix GC issues).
>> It does mention it in recent builds, like this:
>>
>> [Pause Init Mark (unload classes), 3.463 ms]
>> [Concurrent marking (unload classes) 557M->557M(8192M), 53.294 ms]
>> [Pause Final Mark (unload classes), 15.750 ms]
>> [Concurrent cleanup 557M->543M(8192M), 0.288 ms]
>>
>> Thanks,
>> -Aleksey
>>
>