RFR: Traversal GC heuristics

Wed Jan 17 20:56:45 UTC 2018

On 01/17/2018 03:54 PM, Roman Kennke wrote:
> On 17.01.2018 at 18:10, Zhengyu Gu wrote:
>> shenandoahOopClosures.hpp:
>>    Missing string dedup version
> 
> I am not sure what needs to be done for strdedup. Add support for it in 
> a followup patch?

Sure. I can add the support afterward.

Thanks,

-Zhengyu

> 
>> shenandoahSupport.cpp
>> L#615 - 656
>> L#3537 - 3556
>> L#3981 - 4056
>>    indent
> 
> Fixed.
> 
>> sharedRuntime.cpp
>>
>>   213   assert(oopDesc::is_oop(orig, true /* ignore mark word */), 
>> "Error");
>>   214   // store the original value that was in the field reference
>>   215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); }
>>   216 return;
>>   217   thread->satb_mark_queue().enqueue(orig);
>>   218 JRT_END
>>
>> L#216: does not look right. Should it be inside UseShenandoahGC block?
> 
> It's not needed and can go away.
> 
> You'll find the updated patch in reply to Aleksey's review that I'll 
> post shortly (after testing).
> 
> Thanks, Roman
> 
>> Thanks,
>>
>> -Zhengyu
>>
>>
>> On 01/17/2018 09:37 AM, Roman Kennke wrote:
>>> Testing turned up some regressions in non-traversal code and two 
>>> issues that I introduced (or haven't fixed) when the single-flag patch 
>>> arrived.
>>>
>>> The following now passes the hotspot_gc_shenandoah tests and runs of 
>>> specjvm with fastdebug with -XX:+ShenandoahVerify 
>>> -XX:ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4
>>>
>>> Differential:
>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/
>>> Full:
>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01/
>>>
>>> Please review, test, comment, etc. :-)
>>>
>>> Cheers, Roman
>>>
>>>> This started out as a smallish partial-GC experiment, then turned into a 
>>>> clone of partial GC, and ended up as a standalone GC mode for 
>>>> Shenandoah, which is a frankensteinization of 
>>>> partial+concurrent-marking, with some goodies :-)
>>>>
>>>> The idea is to do everything, marking+evacuation+update-refs, in one 
>>>> single phase. This is not very difficult to do: while traversing, 
>>>> evacuate objects that are in the Cset, and update references as we 
>>>> go. I chose to traverse the heap using an incremental-update 
>>>> approach, mostly because this is what partial GC does, and as said 
>>>> above, this started out as a clone of partial :-)
>>>>
>>>> The tricky part is to choose the Cset: I made it such that each GC 
>>>> cycle collects liveness information, and bases the decision about 
>>>> Cset in the next cycle on that liveness information. Yes, this means 
>>>> the first cycle does not collect anything (except immediate garbage).
>>>>
>>>> Advantages:
>>>> - obviously, touching all live objects only once means less time 
>>>> spent in GC. Measurements show that traversing the heap and doing 
>>>> everything is only slightly longer than Shenandoah's marking phase, 
>>>> and this might actually be because we also need to mark through 
>>>> newly allocated objects.
>>>> - Traversal-order evacuation gives us a 10x improvement in an 
>>>> ordering-sensitive microbenchmark: 
>>>> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/
>>>>
>>>> - Simpler barriers: i-u style barriers don't need to load the 
>>>> pre-value, and can be optimized much better (hoisted out of hot 
>>>> paths, etc). Some of it is already done in this patch, but there are 
>>>> plenty of opportunities to make it even better.
>>>> - Possibly less floating garbage because we trace through newly 
>>>> allocated objects too, and don't treat them as implicitly live.
>>>> - we don't need a keep-alive-barrier for Reference.get() which means 
>>>> we keep fewer referents alive just because they happen to be 
>>>> accessed during GC.
>>>> - MWF is only a switch away (if I understand MWF correctly): 
>>>> -XX:+ShenandoahMWF
>>>> - It does not need RBs in the WB fast-path, because outside of the 
>>>> single phase, nothing is ever forwarded.
>>>> - It does not need the membar stuff in the WBs because we turn 
>>>> on/off the phase during safepoint
>>>>
>>>> Disadvantages:
>>>> - Store-value barrier needs to be a WB; an RB is not sufficient. The 
>>>> storeval barrier is there to ensure only to-space values ever get 
>>>> written to fields during update-refs. 3-phase Shenandoah doesn't 
>>>> evacuate during update-refs, and therefore RB is enough. We need WB 
>>>> here. (I believe this is offset by optimization opportunities, see 
>>>> above)
>>>> - Known I-U problem: mutators can outrun the GC with allocations and 
>>>> keep the cycle from terminating.
>>>> - It needs barriers for constants (need to check this).
>>>>
>>>> Stuff left to do:
>>>> - Implement sane degeneration: currently, if we hit OOM, we simply 
>>>> restart and go into full-GC.
>>>> - Depending on degen: make heuristics adaptive. Currently it 
>>>> requires manual tweaking of thresholds.
>>>>
>>>> Relevant knobs:
>>>> - ShenandoahGarbageThreshold: regions with more garbage than this go 
>>>> into the Cset. Notice that this is based on the *previous* cycle, so 
>>>> we may actually have much more garbage (but not less).
>>>> - ShenandoahFreeThreshold: start GC when we have less than that much 
>>>> free heap.
>>>>
>>>> I'll not go into all the details for now and give you the code:
>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/
>>>>
>>>>
>>>> Roman
>>>
>
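
For readers following the thread, the single-phase idea from the original
message (mark, evacuate Cset objects and update references in one traversal,
then pick the next Cset from the liveness just collected) can be illustrated
with a toy model. This is only a sketch under stated assumptions, not the code
in the webrev: the Obj class, the region ids, REGION_SIZE, TO_REGION and the
percentage interpretation of the garbage threshold are all hypothetical, and
there are no concurrent mutators or barriers here.

```java
import java.util.*;

public class TraversalSketch {
    /** Toy heap object (hypothetical): lives in a region and holds references. */
    public static final class Obj {
        public final String name;
        public final int size;
        public final int region;
        public final List<Obj> fields = new ArrayList<>();
        public Obj forwardee; // set once this object has been evacuated
        public Obj(String name, int size, int region) {
            this.name = name; this.size = size; this.region = region;
        }
    }

    public static final int REGION_SIZE = 100; // assumed region capacity
    public static final int TO_REGION = -1;    // assumed id of the evacuation target

    /** Returns the to-space copy of o, evacuating it first if it sits in the Cset. */
    static Obj resolve(Obj o, Set<Integer> cset) {
        if (o.forwardee != null) return o.forwardee;
        if (!cset.contains(o.region)) return o;
        Obj copy = new Obj(o.name, o.size, TO_REGION);
        copy.fields.addAll(o.fields); // fields are fixed up when the copy is scanned
        o.forwardee = copy;
        return copy;
    }

    /** One pass: marks, evacuates, updates refs; returns live bytes per region. */
    public static Map<Integer, Integer> traverse(List<Obj> roots, Set<Integer> cset) {
        Map<Integer, Integer> live = new HashMap<>();
        Set<Obj> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> queue = new ArrayDeque<>();
        for (int i = 0; i < roots.size(); i++) {
            Obj o = resolve(roots.get(i), cset);
            roots.set(i, o);                      // update the root slot
            if (marked.add(o)) queue.add(o);
        }
        while (!queue.isEmpty()) {
            Obj o = queue.poll();
            live.merge(o.region, o.size, Integer::sum);
            for (int i = 0; i < o.fields.size(); i++) {
                Obj target = resolve(o.fields.get(i), cset);
                o.fields.set(i, target);          // update the reference as we go
                if (marked.add(target)) queue.add(target);
            }
        }
        return live;
    }

    /** Picks regions whose garbage, per the previous cycle, exceeds the threshold. */
    public static Set<Integer> chooseCset(Map<Integer, Integer> used,
                                          Map<Integer, Integer> live,
                                          int garbageThresholdPercent) {
        Set<Integer> cset = new TreeSet<>();
        for (Map.Entry<Integer, Integer> e : used.entrySet()) {
            int garbage = e.getValue() - live.getOrDefault(e.getKey(), 0);
            if (garbage * 100 / REGION_SIZE > garbageThresholdPercent) cset.add(e.getKey());
        }
        return cset;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a", 10, 0);
        Obj b = new Obj("b", 60, 1);
        a.fields.add(b);
        b.fields.add(a);
        List<Obj> roots = new ArrayList<>(List.of(a));
        Map<Integer, Integer> used = Map.of(0, 90, 1, 60); // region 0 also holds 80 dead bytes
        // Cycle 1: empty Cset, so nothing is evacuated; only liveness is collected.
        Map<Integer, Integer> live = traverse(roots, new HashSet<>());
        Set<Integer> cset = chooseCset(used, live, 60);
        // Cycle 2: the Cset chosen from cycle 1 is evacuated during the traversal.
        traverse(roots, cset);
        System.out.println("cset=" + cset + ", root region=" + roots.get(0).region);
    }
}
```

In this model the first call to traverse moves nothing, which mirrors the
remark that the first cycle collects no regions; only the second cycle,
using the Cset derived from the first, relocates objects.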