Strange behaviour on half-occupied heap

Tue Aug 28 19:19:52 UTC 2018

> On 27 Aug 2018, at 11:35, Aleksey Shipilev <shade at redhat.com> wrote:
> 
> On 08/16/2018 11:30 AM, Aleksey Shipilev wrote:
>> On 08/10/2018 05:22 PM, Simon Ogorodnik wrote:
>>> Sometime after long run (8-9 hours with almost half-occupied heap) of application with Shenandoah GC, I observe strange behaviour
>>> I can describe it as continuous evacuation of whole GCLAB. 
>> 
>> Thanks for the verbose bug report!
>> 
>> I think this is a heuristics bug. Adaptive CSet selection decides how much to evacuate based on the
>> heuristics guidance how much free space we have to have by the end of the cycle ("Min Garbage"
>> below). The runaway happens when that target free threshold tanks significantly, so heuristics
>> decides to evacuate everything in reach.
>> 
>> I'll study why that happens. Meanwhile, you may want to cap the max at, say, 30% with
>> -XX:ShenandoahMaxFreeThreshold=30.
> 
> Current {dev, 11, 10, 8} builds have better "adaptive" heuristics that should be immune to this
> runway. Please try again?
> 
> Thanks,
> -Aleksey
> 

I faced some problems building JDK 8 for Win64 with latest changes
Problems: 
...\hotspot\src\share\vm\gc_implementation\shenandoah\shenandoahHeapRegion.cpp(566) : error C2220: warning treated as error - no 'object' file generated
...\hotspot\src\share\vm\gc_implementation\shenandoah\shenandoahHeapRegion.cpp(566) : warning C4334: '<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
...\hotspot\src\share\vm\gc_implementation\shenandoah\shenandoahPhaseTimings.cpp(34) : error C2220: warning treated as error - no 'object' file generated
...\hotspot\src\share\vm\gc_implementation\shenandoah\shenandoahPhaseTimings.cpp(34) : warning C4267: 'initializing' : conversion from 'size_t' to 'uint', possible loss of data
Here is my patch: https://drive.google.com/file/d/1zZ_w_KqdAtZf8TP0nCZ2f-2FC4jrVvV2/view?usp=sharing <https://drive.google.com/file/d/1zZ_w_KqdAtZf8TP0nCZ2f-2FC4jrVvV2/view?usp=sharing>

Looks like the original problem fixed, but new very possible related problem opens up with a fix.

After a bit of time heap fills with fragmented data. 
Which results in the increasing rate of collections, pacing delays and finally full GC, 
while actually there are 350 MB of live data it fills whole 2G heap

Also, during concurrent collections ‘Expected live’ value seems invalid (Maybe valid because of fragmentation) 

Visualizer view:
https://drive.google.com/file/d/1zXa4bSMLZjlrkQtkJQ1ogbkt-tT5hfer/view?usp=sharing <https://drive.google.com/file/d/1zXa4bSMLZjlrkQtkJQ1ogbkt-tT5hfer/view?usp=sharing>
Here I forced explicit GC to show that heap actually isn’t that occupied

The beginning of 1.current log contains other Full GC, caused by allocation failure 
Full logs:
https://drive.google.com/file/d/1OwcrCAO-sdHBtS2AGfgPntjXxwZ2Etkz/view?usp=sharing <https://drive.google.com/file/d/1OwcrCAO-sdHBtS2AGfgPntjXxwZ2Etkz/view?usp=sharing>

Recorded on:
https://github.com/JetBrains/jdk8u_hotspot/commit/ae73f3331494b1e35ca8e7d3c5f8f5e520b4521c <https://github.com/JetBrains/jdk8u_hotspot/commit/ae73f3331494b1e35ca8e7d3c5f8f5e520b4521c>
Which correspond to http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/8ecf76dd8c50 <http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/8ecf76dd8c50>

2018-08-28T20:54:37.217-0300: 1517.948: Total time for which application threads were stopped: 0.0038953 seconds, Stopping threads took: 0.0001144 seconds
2018-08-28T20:54:37.217-0300: 1517.948: [Concurrent cleanupUsing 8 of 8 workers for concurrent cleanup
 1751M->1697M(2048M), 2.351 ms]
Free set: Used: 0M of 313M, Regions: 455 mutator, 0 collector
Free set: Mutator view: Max regular: 1024K, Max humongous: 14336K, External frag: 96%, Internal frag: 31%
Free set: Collector view: Max regular: 0K
Pacer for Idle. Initial: 40M, Alloc Tax Rate: 1.0x
2018-08-28T20:54:37.242-0300: 1517.972: Application time: 0.0243478 seconds
………..[Application time for a bit]…………...
2018-08-28T20:54:39.100-0300: 1519.831: Total time for which application threads were stopped: 0.0003293 seconds, Stopping threads took: 0.0000320 seconds
Failed to allocate 25M
Cancelling GC: Allocation Failure
Trigger: Handle Allocation Failure
Free set: Used: 55M of 313M, Regions: 386 mutator, 0 collector
Free set: Mutator view: Max regular: 1024K, Max humongous: 14336K, External frag: 95%, Internal frag: 33%
Free set: Collector view: Max regular: 0K
2018-08-28T20:54:39.822-0300: 1520.553: Application time: 0.7222121 seconds
2018-08-28T20:54:39.822-0300: 1520.553: [Pause Degenerated GC (Outside of Cycle)Using 8 of 8 workers for stw degenerated gc
Pacer for Mark. Expected Live: 1161M, Free: 257M, Non-Taxable: 25M, Alloc Tax Rate: 16.5x
Adaptive CSet Selection. Target Free: 204M, Actual Free: 201M, Max CSet: 50M, Min Garbage: 3M
Collectable Garbage: 139M (23% of total), 7M CSet, 147 CSet regions
Immediate Garbage: 4M (0% of total), 4 regions
Free: 261M, Regions: 396 mutator, 0 collector
Pacer for Evacuation. Used CSet: 146M, Avail: 253M, Non-Taxable: 25M, Alloc Tax Rate: 1.4x
Pacer for Update Refs. Used: 1756M, Free: 253M, Non-Taxable: 25M, Alloc Tax Rate: 8.5x
Free: 400M, Regions: 536 mutator, 0 collector
Used: before: 1752M, after: 1610M
Internal frag: before: 2.2%, after: 2.3%
External frag: before: 94.9%, after: 96.6%
 1752M->1610M(2048M), 265.637 ms]
2018-08-28T20:54:40.088-0300: 1520.819: Total time for which application threads were stopped: 0.2660198 seconds, Stopping threads took: 0.0000413 seconds
Free set: Used: 0M of 400M, Regions: 536 mutator, 0 collector
Free set: Mutator view: Max regular: 1024K, Max humongous: 14336K, External frag: 97%, Internal frag: 25%
Free set: Collector view: Max regular: 0K
Failed to allocate 25M
Cancelling GC: Allocation Failure
Pacer for Idle. Initial: 40M, Alloc Tax Rate: 1.0x
Trigger: Handle Allocation Failure
Free set: Used: 0M of 400M, Regions: 536 mutator, 0 collector
Free set: Mutator view: Max regular: 1024K, Max humongous: 14336K, External frag: 97%, Internal frag: 25%
Free set: Collector view: Max regular: 0K
2018-08-28T20:54:40.090-0300: 1520.820: Application time: 0.0016944 seconds
2018-08-28T20:54:40.090-0300: 1520.821: [Pause Degenerated GC (Outside of Cycle)Using 8 of 8 workers for stw degenerated gc
Pacer for Mark. Expected Live: 1161M, Free: 400M, Non-Taxable: 40M, Alloc Tax Rate: 10.6x
Adaptive CSet Selection. Target Free: 204M, Actual Free: 339M, Max CSet: 84M, Min Garbage: 0M
Collectable Garbage: 0M (0% of total), 0M CSet, 0 CSet regions
Immediate Garbage: 0M (0% of total), 0 regions
Free: 400M, Regions: 536 mutator, 0 collector
Pacer for Evacuation. Used CSet: 0M, Avail: 400M, Non-Taxable: 40M, Alloc Tax Rate: 1.1x
Used: before: 1610M, after: 1610M
Internal frag: before: 2.3%, after: 2.3%
External frag: before: 96.6%, after: 96.6%
Not enough progress (0M, need 1M) after Degenerated GC
Not enough internal fragmentation improvement (0.0%, need 1.0%) after Degenerated GC
Not enough external fragmentation improvement (0.0%, need 1.0%) after Degenerated GC
Cancelling GC: Shenandoah Upgrade To Full GC
Free: 1550M, Regions: 1939 mutator, 0 collector
Used: before: 1610M, after: 493M
Internal frag: before: 2.3%, after: 0.8%
External frag: before: 96.6%, after: 55.4%
 1610M->493M(2048M), 763.914 ms]
2018-08-28T20:54:40.854-0300: 1521.585: Total time for which application threads were stopped: 0.7643449 seconds, Stopping threads took: 0.0000241 seconds
Free set: Used: 0M of 1550M, Regions: 1938 mutator, 0 collector
Free set: Mutator view: Max regular: 1024K, Max humongous: 710656K, External frag: 56%, Internal frag: 19%
Free set: Collector view: Max regular: 0K
Pacer for Idle. Initial: 40M, Alloc Tax Rate: 1.0x