RFR: Traversal GC heuristics

Wed Jan 17 14:37:55 UTC 2018

Testing turned up some regressions in non-traversal code, plus two issues 
that I introduced (or had not fixed) when the single-flag patch arrived.

The following now passes the hotspot_gc_shenandoah tests and SPECjvm runs 
on a fastdebug build with -XX:+ShenandoahVerify 
-XX:ShenandoahGCHeuristics=traversal, with each of -XX:TieredStopAtLevel=0, 1 and 4.

Differential:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/
Full:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.01/

Please review, test, comment, etc. :-)

Cheers, Roman

> This started out as a smallish partial-GC experiment, then turned into a clone 
> of partial GC, and ended up as a standalone GC mode for Shenandoah, 
> which is a frankensteinization of partial+concurrent-marking, with some 
> goodies :-)
> 
> The idea is to do everything, marking+evacuation+update-refs, in a 
> single phase. This is not very difficult to do: while traversing, 
> evacuate objects that are in the Cset, and update references as we go. I 
> chose to traverse the heap using an incremental-update approach, mostly 
> because this is what partial GC does, and as said above, this started 
> out as a clone of partial :-)
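To make the single-phase idea concrete, here is a toy, stop-the-world model in Python (not Shenandoah code): one worklist pass marks every reachable object, copies an object whose region is in the Cset the first time it is reached, and rewrites each scanned reference to point at the to-space copy. The names `Obj`, `traverse`, `cset` and `to_region` are hypothetical; mutator concurrency, barriers and the incremental-update bookkeeping are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing for heap objects
class Obj:
    region: int
    fields: list = field(default_factory=list)  # references to Obj or None


def traverse(roots, cset, to_region):
    """Toy single pass: mark + evacuate + update-refs.

    Assumes to_region is not itself in cset. Returns (new_roots, marked),
    where marked holds every live object at its final (to-space) location.
    """
    forwarding = {}  # from-space object -> its to-space copy
    marked = set()
    worklist = []

    def resolve(obj):
        if obj is None:
            return None
        if obj in forwarding:  # already evacuated: use the copy
            return forwarding[obj]
        if obj.region in cset:  # first encounter of a Cset object: evacuate
            # The copy still points at from-space objects; those references
            # are fixed when the copy itself is scanned below.
            copy = Obj(to_region, list(obj.fields))
            forwarding[obj] = copy
            obj = copy
        if obj not in marked:  # mark and queue for scanning exactly once
            marked.add(obj)
            worklist.append(obj)
        return obj

    new_roots = [resolve(r) for r in roots]
    while worklist:
        obj = worklist.pop()
        # Update-refs happens while scanning: every field gets the final location.
        obj.fields = [resolve(f) for f in obj.fields]
    return new_roots, marked
```

Each live object is pushed and scanned exactly once; the from-space originals are simply left behind for their regions to be reclaimed.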
> 
> The tricky part is choosing the Cset: each GC cycle collects liveness 
> information, and the Cset for the next cycle is chosen based on that 
> liveness information. Yes, this means the first cycle does not collect 
> anything (except immediate garbage).
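A minimal sketch of that one-cycle lag, again as a Python model with hypothetical names (`CsetChooser`, `choose_cset`, `finish_cycle`). The percentage threshold is modeled on ShenandoahGarbageThreshold, and treating a region with zero live bytes as immediate garbage is a simplification.

```python
class CsetChooser:
    """Toy model: the Cset for cycle N+1 is picked from liveness recorded in cycle N."""

    def __init__(self, region_size, garbage_threshold_pct):
        self.region_size = region_size
        self.garbage_threshold_pct = garbage_threshold_pct
        self.prev_live = None  # region id -> live bytes seen by the previous cycle

    def choose_cset(self):
        # First cycle: no liveness data yet, so nothing is selected.
        if self.prev_live is None:
            return set()
        cset = set()
        for region, live in self.prev_live.items():
            garbage = self.region_size - live
            # Select regions whose garbage share exceeds the threshold
            # (integer arithmetic, no rounding).
            if garbage * 100 > self.garbage_threshold_pct * self.region_size:
                cset.add(region)
        return cset

    def finish_cycle(self, live_by_region):
        # Regions in which the traversal found nothing live are immediate
        # garbage: reclaim them right away, no evacuation needed.
        immediate = {r for r, live in live_by_region.items() if live == 0}
        # Remember liveness of the surviving regions for the next cycle.
        self.prev_live = {r: live for r, live in live_by_region.items() if live > 0}
        return immediate
```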
> 
> Advantages:
> - obviously, touching all live objects only once means less time spent 
> in GC. Measurements show that traversing the heap and doing everything 
> is only slightly longer than Shenandoah's marking phase, and this might 
> actually be because we also need to mark through newly allocated objects.
> - Traversal-order evacuation gives us a 10x improvement in an ordering-sensitive 
> microbenchmark: 
> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/
> 
> - Simpler barriers: i-u style barriers don't need to load the pre-value, 
> and can be optimized much better (hoisted out of hot paths, etc). Some 
> of it is already done in this patch, but there are plenty of 
> opportunities to make it even better.
> - Possibly less floating garbage, because we trace through newly 
> allocated objects too, and don't treat them as implicitly live.
> - we don't need a keep-alive-barrier for Reference.get() which means we 
> keep fewer referents alive just because they happen to be accessed 
> during GC.
> - MWF is only a switch away (if I understand MWF correctly): 
> -XX:+ShenandoahMWF
> - It does not need RBs in the WB fast-path, because outside of the 
> single phase, nothing is ever forwarded.
> - It does not need the membar stuff in the WBs, because we turn the 
> phase on and off at a safepoint.
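The barrier difference can be illustrated with two toy store barriers (hypothetical Python, not the actual HotSpot barrier code): an SATB barrier has to load the field's previous value before the store, while an incremental-update barrier only looks at the value being stored. `Obj`, `GCState`, `satb_store` and `iu_store` are made-up names.

```python
class Obj:
    def __init__(self, fields=None):
        self.fields = list(fields or [])


class GCState:
    def __init__(self):
        self.marking_active = False
        self.satb_queue = []  # pre-values recorded by the SATB barrier
        self.iu_queue = []    # new values recorded by the I-U barrier


def satb_store(state, holder, idx, new_value):
    # SATB: load and enqueue the value being overwritten, so that the
    # snapshot taken at the start of marking stays reachable.
    if state.marking_active:
        pre_value = holder.fields[idx]
        if pre_value is not None:
            state.satb_queue.append(pre_value)
    holder.fields[idx] = new_value


def iu_store(state, holder, idx, new_value):
    # Incremental update: enqueue the value being written; the previous
    # field contents are never read.
    if state.marking_active and new_value is not None:
        state.iu_queue.append(new_value)
    holder.fields[idx] = new_value
```

Since `iu_store` depends only on `new_value`, there is no extra load on the store path, which is roughly why such barriers leave more room for optimization.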
> 
> Disadvantages:
> - The store-value barrier needs to be a WB; an RB is not sufficient. The 
> storeval barrier is there to ensure that only to-space values ever get 
> written to fields during update-refs. 3-phase Shenandoah doesn't 
> evacuate during update-refs, so an RB is enough there. We need a WB 
> here. (I believe this is offset by the optimization opportunities above.)
> - Known I-U problem: mutators can outrun the GC with allocations and 
> prevent us from terminating.
> - It needs barriers for constants (need to check this).
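A toy model of why the storeval barrier has to be a WB in a single-phase scheme (hypothetical Python; `Obj.forwardee` stands in for the forwarding pointer, and races between copying threads are ignored): an RB only follows a forwarding pointer that already exists, so a Cset object that has not been evacuated yet would be stored as a from-space reference. The WB evacuates on demand, so only to-space values reach the field.

```python
class Obj:
    def __init__(self, region, fields=None):
        self.region = region
        self.fields = list(fields or [])
        self.forwardee = None  # to-space copy, once evacuated


def read_barrier(obj):
    # RB: follow the forwarding pointer if one exists; never copies.
    return obj.forwardee if obj.forwardee is not None else obj


def write_barrier(obj, cset, to_region):
    # WB: a Cset object that has not been copied yet is evacuated now.
    if obj.region in cset and obj.forwardee is None:
        obj.forwardee = Obj(to_region, obj.fields)
    return read_barrier(obj)


def store_with_storeval_wb(holder, idx, value, cset, to_region):
    # Storeval barrier: resolve the value through the WB before the store,
    # so only to-space references are ever written into fields.
    if value is not None:
        value = write_barrier(value, cset, to_region)
    holder.fields[idx] = value
```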
> 
> Stuff left to do:
> - Implement sane degeneration: currently, if we hit OOM, we simply restart 
> and go into full GC.
> - Depending on degeneration: make the heuristics adaptive. Currently they 
> require manual tweaking of thresholds.
> 
> Relevant knobs:
> - ShenandoahGarbageThreshold: regions with more garbage than this go 
> into the Cset. Notice that this is based on the *previous* cycle, so we 
> may actually have much more garbage (but not less).
> - ShenandoahFreeThreshold: start GC when we have less than that much 
> free heap.
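The trigger side can be sketched as a single predicate, assuming ShenandoahFreeThreshold is expressed as a percentage of heap capacity; `should_start_cycle` is a hypothetical name.

```python
def should_start_cycle(free_bytes, capacity_bytes, free_threshold_pct):
    """Return True once free heap drops below free_threshold_pct of capacity."""
    # Integer arithmetic avoids rounding in the percentage comparison.
    return free_bytes * 100 < free_threshold_pct * capacity_bytes
```

With a 10% threshold on a 1000-byte heap, 50 free bytes triggers a cycle; 100 free bytes (exactly 10%) does not.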
> 
> I won't go into all the details for now; here is the code:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/
> 
> 
> Roman