RFR(XL): 8224675: Late GC barrier insertion for ZGC

Nils Eliasson nils.eliasson at oracle.com
Fri Jun 7 12:53:54 UTC 2019


http://cr.openjdk.java.net/~neliasso/8224675/webrev.07/


On 2019-06-07 14:45, Nils Eliasson wrote:
> Hi Erik,
>
> I uploaded webrev with the friend declaration removed.
>
> I also changed back the shape of how the barrier_insertion is called 
> from Compile.cpp. Then that will call PhaseIdealLoop::optimize and use 
> it in the same way as Shenandoah does GC specific loop opts.
>
> I also added a clearing of major progress after the barrier insertion. 
> Otherwise some LoopLimit nodes won't be removed correctly in the 
> following macro expansion phase.
>
> Thank you!
>
> Regards,
>
> / Nils
>
> On 2019-06-06 21:06, Erik Österlund wrote:
>> Hi Nils,
>>
>> One final nit... could you please remove the now seemingly 
>> unnecessary "friend class ZBarrierSetC2" in PhaseIdealLoop? I don't 
>> need another webrev for that.
>>
>> This looks good. Great job Nils. This will improve the robustness of 
>> our barriers.
>>
>> /Erik
>>
>> On 2019-06-06 20:03, Nils Eliasson wrote:
>>>
>>> On 2019-06-05 10:20, Roland Westrelin wrote:
>>>>> Ah - I think I get it. You mean like this:
>>>>>
>>>>> void ZBarrierSetC2::barrier_insertion_phase(PhaseIterGVN& 
>>>>> igvn)const {
>>>>>     PhaseIdealLoop ideal_loop(igvn,LoopOptsNone);
>>>>>
>>>>>     // First make sure all loads between call and catch are moved 
>>>>> to the
>>>>> catch block clean_catch_blocks(&ideal_loop);
>>>>>
>>>>>     // Then expand barriers on all loads 
>>>>> insert_load_barriers(&ideal_loop);
>>>>>
>>>>>     // Handle all Unsafe that need barriers. 
>>>>> insert_barriers_on_unsafe(&ideal_loop);
>>>>>
>>>>>     // Cleanup any modified bits igvn.optimize();
>>>>>
>>>>>     igvn.C->clear_major_progress();
>>>>> }
>>>>>
>>>>> An excellent idea. Then I can remove the new 
>>>>> LoopOptsMode::BarrierInsertion.
>>>> I was thinking something like what we do in Shenandoah for barrier
>>>> expansion:
>>>>
>>>> PhaseIdealLoop ideal_loop(igvn, LoopOptsShenandoahExpand);
>>>>
>>>> ShenandoahBarrierSetC2::is_gc_specific_loop_opts_pass() returns 
>>>> true for
>>>> LoopOptsShenandoahExpand and ShenandoahBarrierSetC2::optimize_loops()
>>>> handles LoopOptsShenandoahExpand. So there's nothing shenandoah 
>>>> specific
>>>> in PhaseIdealLoop::build_and_optimize().
>>>
>>> I followed your example and changed my implementation inline with that.
>>>
>>>
>>>>
>>>>> I've been running some experiments with asserts on the clone code.
>>>>>
>>>>> 1) There can never be any control flow here - so now phis or such.
>>>>>
>>>>> 2) Stores have explicit control - and would never be scheduled 
>>>>> here either.
>>>>>
>>>>> 3) Loads - they end up here because they can float. They only 
>>>>> matter if
>>>>> there is a use dominated by the catch (after a merge of catch control
>>>>> flow), or uses in more than one catch-proj branch. The only nodes
>>>>> observed being cloned is LoadPNodes with barriers, BoolNodes, and 
>>>>> CmpP
>>>>> nodes. It's the same pattern of comparing a pointer. All other 
>>>>> load has
>>>>> it's control in the catch-projs.
>>>>>
>>>>> I will add asserts to the clone in fixup_uses_in_catch to reflect 
>>>>> this
>>>>> conclusion and make sure that I catch any change in behavior.
>>>> I was thinking of something like:
>>>>
>>>> try {
>>>>    non_inlined_call1();
>>>>    int v = some_object.object_field.int_field;
>>>>    non_inlined_call2(v, v);
>>>> } catch (..) {
>>>>    int v = some_object.object_field.int_field;
>>>>    // some use for v
>>>> }
>>>>
>>>> So there would be a LoadP, a load barrier and a LoadI right after the
>>>> call. The LoadI is the first to be cloned. It has 3 uses, so it's 
>>>> cloned
>>>> 3 times? Which would mean non_inlined_call2 is actually called with:
>>>>
>>>>    SomeObject object = some_object.object_field;
>>>>    non_inlined_call2(object.int_field, object.int_field);
>>>>
>>>> the field is reloaded and that code doesn't have the same effect as
>>>> above. Or am I missing something?
>>>
>>> The object_field usually gets a null-check which creates a control 
>>> flow that anchors the LoadI further down. I managed to find one case 
>>> in specjbb with a static call followed by a LoadL.
>>>
>>> I fixed this by calling the call_catch_cleanup recursively on these 
>>> load, when encountered. They will then be handled in the same way as 
>>> the LoadPs with a barrier.
>>>
>>> I also found an issue that I could get duplicate phis in blocks when 
>>> connecting the loads together. I fixed this by keeping track on 
>>> which regions has gotten a phi, and caching them.
>>>
>>> I've also added a number of small fixes after feedback from Erik Ö.
>>>
>>> New webrev: http://cr.openjdk.java.net/~neliasso/8224675/webrev.06/
>>>
>>> Regards,
>>>
>>> Nils
>>>
>>>
>>>>
>>>> Roland.


More information about the hotspot-compiler-dev mailing list