RFR: Traversal GC heuristics

Wed Jan 17 20:54:19 UTC 2018

On 17.01.2018 at 18:10, Zhengyu Gu wrote:
> shenandoahOopClosures.hpp:
>    Missing string dedup version

I am not sure what needs to be done for strdedup. Add support for it in 
a followup patch?

> shenandoahSupport.cpp
> L#615 - 656
> L#3537 - 3556
> L#3981 - 4056
>    indent

Fixed.

> sharedRuntime.cpp
> 
>   213   assert(oopDesc::is_oop(orig, true /* ignore mark word */), 
> "Error");
>   214   // store the original value that was in the field reference
>   215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); }
>   216 return;
>   217   thread->satb_mark_queue().enqueue(orig);
>   218 JRT_END
> 
> L#216: does not look right. Should it be inside UseShenandoahGC block?

It's not needed and can go away.

You'll find the updated patch in reply to Aleksey's review that I'll 
post shortly (after testing).

Thanks, Roman

> Thanks,
> 
> -Zhengyu
> 
> 
> On 01/17/2018 09:37 AM, Roman Kennke wrote:
>> Testing turned up some regressions in non-traversal code and two 
>> issues that I introduced (or haven't fixed) when the single-flag patch 
>> arrived.
>>
>> The following now passes hotspot_gc_shenandoah tests and runs of 
>> specjvm with fastdebug with -XX:+ShenandoahVerify 
>> -XX:ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4
>>
>> Differential:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/
>> Full:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01/
>>
>> Please review, test, comment, etc. :-)
>>
>> Cheers, Roman
>>
>>> This started out as a smallish partial-GC experiment, then turned into a 
>>> clone of partial GC, and ended up as a standalone GC mode for 
>>> Shenandoah, which is a frankensteinization of 
>>> partial+concurrent-marking, with some goodies :-)
>>>
>>> The idea is to do everything, marking+evacuation+update-refs, in one 
>>> single phase. This is not very difficult to do: while traversing, 
>>> evacuate objects that are in the Cset, and update references as we 
>>> go. I chose to traverse the heap using an incremental-update 
>>> approach, mostly because this is what partial GC does, and as said 
>>> above, this started out as a clone of partial :-)
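A minimal sketch of the single-pass idea described above: while walking the object graph, each reference slot is resolved to its to-space copy (evacuating on first contact if the target is in the Cset), written back, and the target is marked. This is a sequential toy model, not the code in the webrev; `Obj`, `ToyHeap` and all member names are hypothetical, and there is no concurrency, no barriers and no real memory management here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy object model. All names are hypothetical, not HotSpot types.
struct Obj {
  int region = 0;             // index of the region the object lives in
  bool marked = false;        // set once the traversal has visited the object
  Obj* forwardee = nullptr;   // to-space copy, if the object has been evacuated
  std::vector<Obj*> fields;   // outgoing references
};

struct ToyHeap {
  std::vector<bool> in_cset;  // per region: is it part of the collection set?
  int to_region = 0;          // region that receives evacuated copies
  std::deque<Obj> to_space;   // storage for copies (deque keeps addresses stable)

  // Return the to-space copy of obj, evacuating it first if it sits in the Cset.
  Obj* resolve_and_evacuate(Obj* obj) {
    assert(obj->region >= 0 &&
           static_cast<std::size_t>(obj->region) < in_cset.size());
    if (!in_cset[obj->region]) return obj;
    if (obj->forwardee == nullptr) {
      to_space.push_back(*obj);          // copy the object, fields included
      Obj* copy = &to_space.back();
      copy->region = to_region;
      copy->marked = false;
      copy->forwardee = nullptr;
      obj->forwardee = copy;
    }
    return obj->forwardee;
  }

  // One pass over the live graph: evacuate, update references and mark.
  // Returns the number of live objects visited.
  std::size_t traverse(std::vector<Obj*>& roots) {
    std::vector<Obj*> work;
    std::size_t live = 0;
    auto visit = [&](Obj*& slot) {
      if (slot == nullptr) return;
      slot = resolve_and_evacuate(slot); // evacuate + update the reference
      if (!slot->marked) {               // mark and queue for scanning
        slot->marked = true;
        ++live;
        work.push_back(slot);
      }
    };
    for (Obj*& root : roots) visit(root);
    while (!work.empty()) {
      Obj* obj = work.back();
      work.pop_back();
      for (Obj*& field : obj->fields) visit(field);
    }
    return live;
  }
};
```

Every live object is touched once, and unreachable objects in Cset regions are simply never copied.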
>>>
>>> The tricky part is to choose the Cset: I made it such that each GC 
>>> cycle collects liveness information, and bases the decision about 
>>> Cset in the next cycle on that liveness information. Yes, this means 
>>> the first cycle does not collect anything (except immediate garbage).
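The Cset choice described above can be sketched as follows. This is an illustration only: `RegionInfo`, `choose_cset` and the interpretation of the threshold as a percentage of the region size are assumptions, not taken from the patch. A region without liveness data from a previous cycle is never selected, which is why the first cycle evacuates nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-region bookkeeping, not a HotSpot type.
struct RegionInfo {
  std::size_t used_bytes = 0;       // bytes currently allocated in the region
  std::size_t prev_live_bytes = 0;  // live bytes recorded by the previous cycle
  bool has_prev_liveness = false;   // false until a cycle has traversed the region
};

// Pick regions whose garbage, as estimated from the previous cycle, exceeds
// garbage_threshold_percent of the region size. Returns region indices.
std::vector<std::size_t> choose_cset(const std::vector<RegionInfo>& regions,
                                     std::size_t region_bytes,
                                     std::size_t garbage_threshold_percent) {
  assert(region_bytes > 0);
  std::vector<std::size_t> cset;
  for (std::size_t i = 0; i < regions.size(); ++i) {
    const RegionInfo& r = regions[i];
    if (!r.has_prev_liveness) continue;               // no data yet: skip
    if (r.prev_live_bytes > r.used_bytes) continue;   // inconsistent data: skip
    std::size_t garbage = r.used_bytes - r.prev_live_bytes;
    if (garbage * 100 > region_bytes * garbage_threshold_percent) {
      cset.push_back(i);
    }
  }
  return cset;
}
```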
>>>
>>> Advantages:
>>> - obviously, touching all live objects only once means less time 
>>> spent in GC. Measurements show that traversing the heap and doing 
>>> everything is only slightly longer than Shenandoah's marking phase, 
>>> and this might actually be because we also need to mark through newly 
>>> allocated objects.
>>> - Traversal-order evacuation gives us a 10x increase in an 
>>> ordering-sensitive microbenchmark: 
>>> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/
>>>
>>> - Simpler barriers: i-u style barriers don't need to load the 
>>> pre-value, and can be optimized much better (hoisted out of hot 
>>> paths, etc). Some of it is already done in this patch, but there are 
>>> plenty of opportunities to make it even better.
>>> - Possibly less floating garbage because we trace through newly 
>>> allocated objects too, and don't treat them as implicitly live.
>>> - we don't need a keep-alive-barrier for Reference.get() which means 
>>> we keep fewer referents alive just because they happen to be accessed 
>>> during GC.
>>> - MWF is only a switch away (if I understand MWF correctly): 
>>> -XX:+ShenandoahMWF
>>> - It does not need RBs in the WB fast-path, because outside of the 
>>> single phase, nothing is ever forwarded.
>>> - It does not need the membar stuff in the WBs because we turn on/off 
>>> the phase during safepoint
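To illustrate the "simpler barriers" point in the list above: a SATB-style barrier has to load the value being overwritten before the store, while an incremental-update (i-u) style barrier only records the value being stored. The sketch below is a toy model with hypothetical names (`MarkQueue`, `store_satb`, `store_iu`); it is not the barrier code in the patch and ignores evacuation and concurrency entirely.

```cpp
#include <cassert>
#include <vector>

struct Obj {
  Obj* field = nullptr;   // a single reference field is enough for the sketch
};

// Hypothetical marking queue; null entries are ignored.
struct MarkQueue {
  std::vector<const Obj*> entries;
  void enqueue(const Obj* obj) {
    if (obj != nullptr) entries.push_back(obj);
  }
};

// SATB style: the previous field value must be loaded and recorded
// before it is overwritten.
inline void store_satb(Obj* holder, Obj* new_val, MarkQueue& queue, bool marking) {
  assert(holder != nullptr);
  if (marking) {
    Obj* pre_val = holder->field;   // extra load of the pre-value
    queue.enqueue(pre_val);
  }
  holder->field = new_val;
}

// Incremental-update style: only the new value is recorded, so the
// barrier does not depend on the old contents of the field.
inline void store_iu(Obj* holder, Obj* new_val, MarkQueue& queue, bool marking) {
  assert(holder != nullptr);
  if (marking) {
    queue.enqueue(new_val);
  }
  holder->field = new_val;
}
```

Because `store_iu` never reads `holder->field`, the check depends only on the stored value, which is what makes hoisting it out of hot paths plausible.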
>>>
>>> Disadvantages:
>>> - Store-value barrier needs to be a WB, RB is not sufficient. The 
>>> storeval barrier is there to ensure only to-space values ever get 
>>> written to fields during update-refs. 3-phase Shenandoah doesn't 
>>> evacuate during update-refs, and therefore RB is enough. We need WB 
>>> here. (I believe this is offset by optimization opportunities, see 
>>> above)
>>> - Known I-U problem: mutators can outrun the GC with allocations and 
>>> keep the cycle from terminating.
>>> - It needs barriers for constants (need to check this).
>>>
>>> Stuff left to do:
>>> - Implement sane degeneration: if we hit OOM, we simply restart and 
>>> go into full-GC.
>>> - Depending on degen: make heuristics adaptive. Currently it requires 
>>> manual tweaking of thresholds.
>>>
>>> Relevant knobs:
>>> - ShenandoahGarbageThreshold: regions with more garbage than this go 
>>> into the Cset. Notice that this is based on the *previous* cycle, so 
>>> we may actually have much more garbage (but not less).
>>> - ShenandoahFreeThreshold: start GC when we have less than that much 
>>> free heap.
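A rough sketch of the trigger implied by ShenandoahFreeThreshold, assuming the knob is a percentage of total heap capacity (the unit is not stated above, so treat that as an assumption). `should_start_gc` is a hypothetical helper, not a function from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Start a cycle once free heap drops below free_threshold_percent of capacity.
inline bool should_start_gc(std::size_t free_bytes,
                            std::size_t capacity_bytes,
                            std::size_t free_threshold_percent) {
  assert(capacity_bytes > 0 && free_bytes <= capacity_bytes);
  return free_bytes * 100 < capacity_bytes * free_threshold_percent;
}
```

Since the Cset is chosen from the previous cycle's liveness data, starting too late leaves less headroom for evacuation, so this threshold currently has to be tuned by hand together with the garbage threshold.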
>>>
>>> I'll not go into all the details for now and give you the code:
>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/
>>>
>>>
>>> Roman
>>