Degenerated GC

Thu Jan 18 15:18:15 UTC 2018

http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/

This patch implements Degenerated GC: a better way to handle allocation failures. We have pushed
bits and pieces of the infrastructure needed for it over the past few weeks.

Our current scheme roughly approximates the same thing: if allocation failure is raised during the
concurrent mark or concurrent update-refs, we immediately STW and complete the phase under the
pause. There are major caveats in that scheme though: it only works reliably for the phases that
have final-STWs, it complicates the control code significantly, and it tries to continue the
concurrent cycle afterwards, even though we know something is fishy.

Degenerated GC is basically the STW continuation of the concurrent cycle. When the concurrent cycle
degenerates, we invoke a single VM operation ("dive into STW"), and complete the same cycle there.
In most cases, we degenerate at the end of the concurrent cycle, when the majority of work is
already done.
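
A rough model of that continuation, as a sketch only: the phase list and the degenerate_from
helper below are hypothetical and are not taken from the patch. It assumes the cycle is the linear
sequence mark, evacuation, update-refs, and shows which phases would be left to finish inside the
single STW operation.

```python
# Hypothetical model; these names do not come from the Shenandoah sources.
CYCLE = ["mark", "evacuation", "update-refs"]


def degenerate_from(point):
    """Return the phases still to be completed under STW, starting at the
    phase in which the concurrent cycle was cancelled."""
    if point not in CYCLE:
        raise ValueError(f"unknown phase: {point}")
    return CYCLE[CYCLE.index(point):]


if __name__ == "__main__":
    for phase in CYCLE:
        print(phase, "->", degenerate_from(phase))
```

The later the cancellation point, the shorter the returned list, which is the intuition behind
degenerating late being cheap.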

If Degenerated GC experiences a second allocation failure during that STW cycle (e.g. during evac),
it upgrades to Full GC. It stands to reason that Degenerated GC is cheaper than Full GC, but here is
how they compare most of the time:

# Degenerated at evacuation, upgraded to Full GC:
[46.755s][info][gc] GC(109) Cancelling concurrent GC: Allocation Failure
[46.755s][info][gc] GC(109) Cannot finish degeneration, upgrading to Full GC
[46.994s][info][gc] GC(109) Pause Degenerated GC (Evacuation) 4054M->527M(4096M) 239.331ms

# Degenerated at update-refs
[52.145s][info][gc] Cancelling concurrent GC: Allocation Failure
[52.147s][info][gc] GC(123) Concurrent update references 3360M->3946M(4096M) 218.713ms
[52.177s][info][gc] GC(124) Pause Degenerated GC (Update Refs) 3946M->1725M(4096M) 20.201ms

So, degeneration can be seen as a softer, graceful degradation step before the full-stop, full-heap,
full-moving Full GC.
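
To compare such pauses across a longer log, something like the following could pull the numbers out
of the "Pause Degenerated GC" lines. This is a minimal sketch: the regular expression is derived
only from the two pause lines quoted above, and the parse_pause helper is a made-up name, not part
of any JDK tooling.

```python
import re

# Matches pause lines shaped like the ones quoted above, e.g.
# "Pause Degenerated GC (Update Refs) 3946M->1725M(4096M) 20.201ms"
PAUSE = re.compile(
    r"Pause Degenerated GC \((?P<point>[^)]+)\) "
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<cap>\d+)M\) "
    r"(?P<ms>\d+(?:\.\d+)?)ms"
)


def parse_pause(line):
    """Return the degeneration point, reclaimed megabytes and pause time,
    or None if the line is not a Degenerated GC pause line."""
    m = PAUSE.search(line)
    if m is None:
        return None
    return {
        "point": m.group("point"),
        "reclaimed_mb": int(m.group("before")) - int(m.group("after")),
        "pause_ms": float(m.group("ms")),
    }


if __name__ == "__main__":
    sample = ("[52.177s][info][gc] GC(124) Pause Degenerated GC (Update Refs) "
              "3946M->1725M(4096M) 20.201ms")
    print(parse_pause(sample))
```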

Degenerated GC brings several major improvements over our current scheme:

 a) When allocation failure is raised, we stop *all* threads, not just the allocating thread. This
makes sense because it is very likely that other threads would experience the allocation failure
shortly. This is our failure mode, and the GC log would register the GC pause that correlates with
the actual stalls experienced by application threads.

 b) When the degenerated STW is running, it uses ParallelGCThreads threads, completing the cycle as
fast as it possibly can. Otherwise, if we kept the degenerated cycle concurrent, most mutator
threads would probably be stuck waiting for allocation to succeed, but the concurrent cycle would
still run with ConcGCThreads (which is realistically lower than ParallelGCThreads), wasting precious
wall time.

 c) It handles out-of-cycle allocation failure. When ShConcurrentThread cannot keep up with issuing
the GC cycles fast enough, or when the heuristics miss the allocation spike, our current code just
Full GCs. This change runs the Degenerated GC instead, in the hope that mark would identify enough
immediate garbage to proceed with the cycle. (This would get better once we give the GC a stash of
"reserved" regions for evacuation!)

 d) It allows easier future handling of partial, traversal, and evac degeneration: we are already at
STW, and we can do whatever at that point.
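
The escalation described so far can be summarized as a tiny decision function. This is only an
illustrative sketch: the state strings and the on_alloc_failure helper are hypothetical, and the
real control code in the patch is C++ and certainly more involved.

```python
# Hypothetical states: "idle" means no GC cycle is running; "degenerated"
# means we are already inside the STW Degenerated GC; anything else is
# taken to be the name of the concurrent phase in progress.
def on_alloc_failure(state):
    """Pick the next step after an allocation failure."""
    if state == "degenerated":
        # Second failure inside the STW cycle: give up and go Full GC.
        return "full-gc"
    if state == "idle":
        # Out-of-cycle failure: run the whole cycle as Degenerated GC.
        return "degenerated:outside-cycle"
    # Failure during a concurrent phase: continue that cycle under STW.
    return "degenerated:" + state


if __name__ == "__main__":
    for s in ("idle", "mark", "evacuation", "update-refs", "degenerated"):
        print(s, "->", on_alloc_failure(s))
```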

Degenerated GC seems to improve the survivability on densely populated heaps. This can be modeled
roughly by running a normal heavily-allocating and heavily-threaded workload with a very tight heap.
Current gc+stats then shows that most allocation failures are handled by Degenerated GCs:

-Xmx16g

[140.227s][info][gc,stats]   48 successful concurrent GCs
[140.227s][info][gc,stats]      0 invoked explicitly
[140.227s][info][gc,stats]
[140.227s][info][gc,stats]    2 Degenerated GCs
[140.227s][info][gc,stats]      2 caused by allocation failure
[140.227s][info][gc,stats]      0 upgraded to Full GC
[140.227s][info][gc,stats]
[140.227s][info][gc,stats]    0 Full GCs
[140.227s][info][gc,stats]      0 invoked explicitly
[140.227s][info][gc,stats]      0 caused by allocation failure
[140.227s][info][gc,stats]      0 upgraded from Degenerated GC

-Xmx2g

[197.491s][info][gc,stats]  379 successful concurrent GCs
[197.491s][info][gc,stats]      0 invoked explicitly
[197.491s][info][gc,stats]
[197.491s][info][gc,stats]  120 Degenerated GCs
[197.491s][info][gc,stats]    120 caused by allocation failure
[197.491s][info][gc,stats]     47 upgraded to Full GC
[197.491s][info][gc,stats]
[197.491s][info][gc,stats]   49 Full GCs
[197.491s][info][gc,stats]      0 invoked explicitly
[197.491s][info][gc,stats]      2 caused by allocation failure
[197.491s][info][gc,stats]     47 upgraded from Degenerated GC

(Full GC upgrades are from evac OOME-s, and alloc-failure Full GCs are the heuristics chickening out
from multiple back-to-back Degenerated GCs into Full GC).
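
For comparing runs, the counters can be scraped from the gc+stats lines. The sketch below assumes
the layout shown above (top-level lines end in "GCs", the lines under them are their breakdown);
parse_gc_stats is a made-up helper name, and the sample is copied from the -Xmx2g run.

```python
import re

# One counter per line: "[...][gc,stats]  <count> <label>"
STAT = re.compile(r"\[gc,stats\]\s+(\d+)\s+(.+?)\s*$")

LOG = """\
[197.491s][info][gc,stats]  379 successful concurrent GCs
[197.491s][info][gc,stats]      0 invoked explicitly
[197.491s][info][gc,stats]
[197.491s][info][gc,stats]  120 Degenerated GCs
[197.491s][info][gc,stats]    120 caused by allocation failure
[197.491s][info][gc,stats]     47 upgraded to Full GC
[197.491s][info][gc,stats]
[197.491s][info][gc,stats]   49 Full GCs
[197.491s][info][gc,stats]      0 invoked explicitly
[197.491s][info][gc,stats]      2 caused by allocation failure
[197.491s][info][gc,stats]     47 upgraded from Degenerated GC
""".splitlines()


def parse_gc_stats(lines):
    """Group counters under the most recent line whose label ends in 'GCs'."""
    stats, section = {}, None
    for line in lines:
        m = STAT.search(line)
        if m is None:
            continue  # separator lines carry no counter
        count, label = int(m.group(1)), m.group(2)
        if label.endswith("GCs"):
            section = label
            stats[section] = {"total": count}
        elif section is not None:
            stats[section][label] = count
    return stats


if __name__ == "__main__":
    degen = parse_gc_stats(LOG)["Degenerated GCs"]
    ratio = degen["upgraded to Full GC"] / degen["total"]
    print(f"upgraded to Full GC: {ratio:.1%} of Degenerated GCs")
```

On the -Xmx2g numbers this gives 47 out of 120, so roughly 39% of Degenerated GCs still ended up
as Full GC, and the rest completed without it.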

Still fully testing it, but early reviews are welcome.

Testing: hotspot_gc_shenandoah, benchmarks

Thanks,
-Aleksey