Feedback from experiments on production application
Petr Janouch
janoupe at gmail.com
Wed May 9 20:09:35 UTC 2018
Hi,
I have tried again with build 25.71-b228.
This build seems like an improvement over the one I used before: just from glancing at the logs, the GC pauses appear to have gotten smaller.
Regarding the two observations from my previous email:
1) Class unloading
Every fifth ShenandoahFinalMarkStartEvac pause is longer (around 14 ms) than the ones where the GC does not try to unload classes (7-8 ms).
As promised, the log now explicitly says when it tries to unload classes, so the longer pause can be attributed to it (a possible tuning sketch follows the excerpt):
Concurrent marking triggered. Free: 2249M, Free Threshold: 2252M; Allocated: 18009M, Alloc Threshold: 0M
Free set: Used: 17810M of 20060M, Regions: 298 mutator, 0 collector
Free set: Mutator view: Max regular: 8192K, Max humongous: 2048000K, External frag: 12%, Internal frag: 5%
Free set: Collector view: Max regular: 0K
Application time: 8.4427237 seconds
[Pause Init Mark (unload classes)Pacer for Mark. Used: 18226M, Free: 2249M, Non-Taxable: 224M, Alloc Tax Rate: 29.7x
, 3.064 ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1507.865: ShenandoahInitMark [ 1074 0 0 ] [ 0 0 0 1 3 ] 0
Total time for which application threads were stopped: 0.0054101 seconds, Stopping threads took: 0.0004040 seconds
[Concurrent marking (unload classes) 17G->17G(20G), 33.855 ms]
Application time: 0.0344424 seconds
[Pause Final Mark (unload classes)Adaptive CSet Selection. Target Free: 3276M, Actual Free: 17876M, Target CSet: [0M, 13407M]
Collectable Garbage: 2099M (11% of total), 3M CSet, 263 CSet regions
Immediate Garbage: 15868M (87% of total), 1984 regions
Free: 18046M, Regions: 2271 mutator, 0 collector
Pacer for Evacuation. CSet: 3M, Free: 18046M, Non-Taxable: 1804M, Alloc Tax Rate: 1.1x
, 11.950 ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1507.905: ShenandoahFinalMarkStartEvac [ 1074 1 4 ] [ 0 0 0 1 11 ] 0
Total time for which application threads were stopped: 0.0146809 seconds, Stopping threads took: 0.0006311 seconds
[Concurrent cleanup 17G->2442M(20G), 0.566 ms]
[Concurrent evacuation 2445M->2464M(20G), 3.280 ms]
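For reference, if the every-fifth-cycle unloading ever turns out to be too costly, these are the knobs I would experiment with. This is only a sketch: -XX:-ClassUnloading is a standard HotSpot flag, while ShenandoahUnloadClassesFrequency is my assumption about what this build exposes (guessed from the every-fifth-cycle pattern) and may need -XX:+UnlockExperimentalVMOptions:

  # Disable class unloading entirely (standard HotSpot flag):
  java -XX:+UseShenandoahGC -XX:-ClassUnloading ...

  # Or, assuming the build exposes this flag, unload less often than every 5th cycle:
  java -XX:+UseShenandoahGC -XX:ShenandoahUnloadClassesFrequency=20 ...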
2) Too closely spaced STW pauses
It is the same story as before, and it actually got worse. It seems that Shenandoah is doing almost too well on strongly generational workloads like my application's (application-scoped objects are created at the beginning, and almost everything created after that is request-scoped). This is especially extreme with default settings: the JVM picks 13 concurrent GC threads on the test machine, and then the log is full of pathological situations like this (the switch I mean by "concurrent threads" is sketched right after the excerpt):
1431.134: ShenandoahInitMark [ 1074 0 1 ] [ 0 0 0 1 2 ] 0
Total time for which application threads were stopped: 0.0052016 seconds, Stopping threads took: 0.0004466 seconds
[Concurrent marking (process refs) 17G->17G(20G), 32.542 ms]
[Concurrent precleaning 17G->17G(20G), 2.180 ms]
Application time: 0.0367779 seconds
[Pause Final Mark (process refs)Adaptive CSet Selection. Target Free: 3276M, Actual Free: 18102M, Target CSet: [0M, 13576M]
Collectable Garbage: 1915M (10% of total), 9M CSet, 241 CSet regions
Immediate Garbage: 16078M (88% of total), 2011 regions
Free: 18253M, Regions: 2296 mutator, 0 collector
Pacer for Evacuation. CSet: 9M, Free: 18253M, Non-Taxable: 1825M, Alloc Tax Rate: 1.1x
, 4.711 ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1431.176: ShenandoahFinalMarkStartEvac [ 1074 4 0 ] [ 0 0 0 1 4 ] 0
Total time for which application threads were stopped: 0.0073044 seconds, Stopping threads took: 0.0006198 seconds
[Concurrent cleanup 17G->2232M(20G), 0.627 ms]
[Concurrent evacuation 2232M->2274M(20G), 3.539 ms]
Application time: 0.0043597 seconds
[Pause Init Update RefsPacer for Update-Refs. Used: 2274M, Free: 18202M, Non-Taxable: 1820M, Alloc Tax Rate: 1.1x
, 0.272 ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1431.188: ShenandoahInitUpdateRefs [ 1074 0 1 ] [ 0 0 0 2 0 ] 0
Total time for which application threads were stopped: 0.0029428 seconds, Stopping threads took: 0.0003774 seconds
[Concurrent update references 2274M->2343M(20G), 29.363 ms]
Application time: 0.0294687 seconds
[Pause Final Update RefsFree: 20060M, Regions: 2524 mutator, 0 collector
, 1.745 ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1431.221: ShenandoahFinalUpdateRefs [ 1074 0 1 ] [ 0 0 0 2 1 ] 0
Total time for which application threads were stopped: 0.0048456 seconds, Stopping threads took: 0.0005142 seconds
[Concurrent cleanup 2344M->420M(20G), 1.251 ms]
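To be concrete about what I mean by "concurrent threads": I assume Shenandoah picks up the standard ConcGCThreads switch, so pinning the count instead of relying on the ergonomic 13 would look roughly like the sketch below (the heap size is only illustrative):

  java -XX:+UseShenandoahGC -Xms20g -Xmx20g -XX:ConcGCThreads=1 ...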
The effect of such an STW storm on the request tail latency observed by the test client is worse than that of one larger pause from G1.
Again, setting the concurrent threads to 1 helps a lot, but even with that, the concurrent phase following ShenandoahFinalMarkStartEvac in particular is often so short that its STW pause almost joins with the next one:
958.586: ShenandoahInitMark [ 1075 0 0 ] [ 0 0 0 1 2 ] 0
Total time for which application threads were stopped: 0.0049878 seconds, Stopping threads took: 0.0004109 seconds
[Concurrent marking 17G->17G(20G), 94.828 ms]
Application time: 0.0949659 seconds
[Pause Final MarkAdaptive CSet Selection. Target Free: 3276M, Actual Free: 18423M, Target CSet: [0M, 13817M]
Collectable Garbage: 1480M (8% of total), 3M CSet, 186 CSet regions
Immediate Garbage: 16511M (90% of total), 2064 regions
Free: 18597M, Regions: 2343 mutator, 0 collector
Pacer for Evacuation. CSet: 3M, Free: 18597M, Non-Taxable: 1859M, Alloc Tax Rate: 1.1x
, 5.164 ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
958.686: ShenandoahFinalMarkStartEvac [ 1075 0 0 ] [ 0 0 0 1 5 ] 0
Total time for which application threads were stopped: 0.0076722 seconds, Stopping threads took: 0.0004436 seconds
[Concurrent cleanup 17G->1896M(20G), 0.684 ms]
[Concurrent evacuation 1896M->1948M(20G), 7.096 ms]
Application time: 0.0081824 seconds
[Pause Init Update RefsPacer for Update-Refs. Used: 1948M, Free: 18526M, Non-Taxable: 1852M, Alloc Tax Rate: 1.1x
, 0.290 ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
958.702: ShenandoahInitUpdateRefs [ 1075 0 0 ] [ 0 0 0 1 0 ] 0
Total time for which application threads were stopped: 0.0027895 seconds, Stopping threads took: 0.0004562 seconds
[Concurrent update references 1948M->2170M(20G), 131.651 ms]
Application time: 0.1317581 seconds
[Pause Final Update RefsFree: 19793M, Regions: 2499 mutator, 0 collector
, 1.593 ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
958.836: ShenandoahFinalUpdateRefs [ 1075 0 0 ] [ 0 0 0 1 1 ] 0
Total time for which application threads were stopped: 0.0039250 seconds, Stopping threads took: 0.0004161 seconds
[Concurrent cleanup 2170M->704M(20G), 4.958 ms]
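In case it helps with cross-checking or rerunning: the excerpts above come from what I believe is the usual 8u combination of logging flags (this is my assumption about the exact set used):

  java -XX:+UseShenandoahGC \
       -XX:+PrintGCDetails \
       -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime \
       -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 ...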
If you need more statistics, or want me to rerun the workload with different settings, don't hesitate to ask.
Petr
> On 9 May 2018, at 09:36, Aleksey Shipilev <shade at redhat.com> wrote:
>
> On 05/09/2018 07:53 AM, Roman Kennke wrote:
>> Would it be possible to run the same experiments with a newer build,
>> e.g. one from here:
>>
>> https://builds.shipilev.net/
>
> More precisely, this one for 8u:
> https://builds.shipilev.net/openjdk-shenandoah-jdk8/
>
>
>>> I would suggest mentioning somewhere in the user guide the fact, that the class unloading
>>> feature can significantly prolong GC pauses. Also the GC logging could mention that the class
>>> unloading is going on (yes, you can see it with class unloading logging on, but most people
>>> usually don’t have it on, when trying to fix GC issues).
> It does mention it in recent builds, like this:
>
> [Pause Init Mark (unload classes), 3.463 ms]
> [Concurrent marking (unload classes) 557M->557M(8192M), 53.294 ms]
> [Pause Final Mark (unload classes), 15.750 ms]
> [Concurrent cleanup 557M->543M(8192M), 0.288 ms]
>
> Thanks,
> -Aleksey
>