RFR: Traveral GC heuristics

Tue Jan 16 18:38:42 UTC 2018

This started out as a smallish partial-GC experiment, then into a clone 
of partial GC, and ended up as a standalone GC mode for Shenandoah, 
which is a frankensteinization of partial+concurrent-marking, with some 
goodies :-)

The idea is to do everything, marking+evacuation+update-refs, in one 
single phase. This is not very difficult to do: while traversing, 
evacuate objects that are in the Cset, and update references as we go. I 
chose to traverse the heap using an incremental-update approach, mostly 
because this is what partial GC does, and as said above, this started 
out as a clone of partial :-)

The tricky part is to choose the Cset: I made it such that each GC cycle 
collects liveness information, and bases the decision about Cset in the 
next cycle on that liveness information. Yes, this means the first cycle 
does not collect anything (except immediate garbage).

Advantages:
- obviously, touching all live objects only once means less time spent 
in GC. Measurements show that traversing the heap and doing everything 
is only slightly longer than Shenandoah's marking phase, and this might 
actually be because we also need to mark through newly allocated objects.
- Traversal-order evacuation gives us 10x increase in ordering-sensitive 
microbenchmark: https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/

- Simpler barriers: i-u style barriers don't need to load the pre-value, 
and can be optimized much better (hoisted out of hot paths, etc). Some 
of it is already done in this patch, but there are plenty of 
opportunities to make it even better.
- Possibly less floating garbage because we trace through newly 
allocated objects too, and don't treat it implicitely live.
- we don't need a keep-alive-barrier for Reference.get() which means we 
keep fewer referents alive just because they happen to be accessed 
during GC.
- MWF is only a switch away (if I understand MWF correctly): 
-XX:+ShenandoahMWF
- It does not need RBs in the WB fast-path, because outside of the 
single phase, nothing is ever forwarded.
- It does not need the membar stuff in the WBs because we turn on/off 
the phase during safepoint

Disadvantages:
- Store-value barrier needs to be a WB, RB is not sufficient. The 
storeval barrier is there to ensure only to-space values ever get 
written to fields during update-refs. 3-phase Shenandoah doesn't 
evacuate during update-refs, and therefore RB is enough. We need WB 
here. (I believe this is off-set by optimization opportunities, see above)
- Known I-U problem: mutators can outrun the GC with allocations and let 
us not terminate.
- It needs barriers for constants (need to check this).

Stuff left to do:
- Implement sane degeneration: if we hit OOM, we simply restart and go 
into full-GC.
- Depending on degen: make heuristics adaptive. Currently it requires 
manual tweaking of thresholds.

Relevant knobs:
- ShenandoahGarbageThreshold: regions with more garbage than this go 
into the Cset. Notice that this is based on the *previous* cycle, so we 
may actually have much more garbage (but not less).
- ShenandoahFreeThreshold: start GC when we have less than that much 
free heap.

I'll not go into all the details for now and give you the code:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/

Roman