RFR(XL): 8224675: Late GC barrier insertion for ZGC

Fri Jun 7 14:27:40 UTC 2019

Looks good.

/Erik

> On 7 Jun 2019, at 14:53, Nils Eliasson <nils.eliasson at oracle.com> wrote:
> 
> 
> http://cr.openjdk.java.net/~neliasso/8224675/webrev.07/
> 
> 
>> On 2019-06-07 14:45, Nils Eliasson wrote:
>> Hi Erik,
>> 
>> I uploaded webrev with the friend declaration removed.
>> 
>> I also changed back the shape of how the barrier_insertion is called from Compile.cpp. Then that will call PhaseIdealLoop::optimize and use it in the same way as Shenandoah does GC specific loop opts.
>> 
>> I also added a clearing of major progress after the barrier insertion. Otherwise some LoopLimit nodes won't be removed correctly in the following macro expansion phase.
>> 
>> Thank you!
>> 
>> Regards,
>> 
>> / Nils
>> 
>>> On 2019-06-06 21:06, Erik Österlund wrote:
>>> Hi Nils,
>>> 
>>> One final nit... could you please remove the now seemingly unnecessary "friend class ZBarrierSetC2" in PhaseIdealLoop? I don't need another webrev for that.
>>> 
>>> This looks good. Great job Nils. This will improve the robustness of our barriers.
>>> 
>>> /Erik
>>> 
>>>> On 2019-06-06 20:03, Nils Eliasson wrote:
>>>> 
>>>> On 2019-06-05 10:20, Roland Westrelin wrote:
>>>>>> Ah - I think I get it. You mean like this:
>>>>>> 
>>>>>> void ZBarrierSetC2::barrier_insertion_phase(PhaseIterGVN& igvn)const {
>>>>>>     PhaseIdealLoop ideal_loop(igvn,LoopOptsNone);
>>>>>> 
>>>>>>     // First make sure all loads between call and catch are moved to the
>>>>>> catch block clean_catch_blocks(&ideal_loop);
>>>>>> 
>>>>>>     // Then expand barriers on all loads insert_load_barriers(&ideal_loop);
>>>>>> 
>>>>>>     // Handle all Unsafe that need barriers. insert_barriers_on_unsafe(&ideal_loop);
>>>>>> 
>>>>>>     // Cleanup any modified bits igvn.optimize();
>>>>>> 
>>>>>>     igvn.C->clear_major_progress();
>>>>>> }
>>>>>> 
>>>>>> An excellent idea. Then I can remove the new LoopOptsMode::BarrierInsertion.
>>>>> I was thinking something like what we do in Shenandoah for barrier
>>>>> expansion:
>>>>> 
>>>>> PhaseIdealLoop ideal_loop(igvn, LoopOptsShenandoahExpand);
>>>>> 
>>>>> ShenandoahBarrierSetC2::is_gc_specific_loop_opts_pass() returns true for
>>>>> LoopOptsShenandoahExpand and ShenandoahBarrierSetC2::optimize_loops()
>>>>> handles LoopOptsShenandoahExpand. So there's nothing shenandoah specific
>>>>> in PhaseIdealLoop::build_and_optimize().
>>>> 
>>>> I followed your example and changed my implementation inline with that.
>>>> 
>>>> 
>>>>> 
>>>>>> I've been running some experiments with asserts on the clone code.
>>>>>> 
>>>>>> 1) There can never be any control flow here - so now phis or such.
>>>>>> 
>>>>>> 2) Stores have explicit control - and would never be scheduled here either.
>>>>>> 
>>>>>> 3) Loads - they end up here because they can float. They only matter if
>>>>>> there is a use dominated by the catch (after a merge of catch control
>>>>>> flow), or uses in more than one catch-proj branch. The only nodes
>>>>>> observed being cloned is LoadPNodes with barriers, BoolNodes, and CmpP
>>>>>> nodes. It's the same pattern of comparing a pointer. All other load has
>>>>>> it's control in the catch-projs.
>>>>>> 
>>>>>> I will add asserts to the clone in fixup_uses_in_catch to reflect this
>>>>>> conclusion and make sure that I catch any change in behavior.
>>>>> I was thinking of something like:
>>>>> 
>>>>> try {
>>>>>    non_inlined_call1();
>>>>>    int v = some_object.object_field.int_field;
>>>>>    non_inlined_call2(v, v);
>>>>> } catch (..) {
>>>>>    int v = some_object.object_field.int_field;
>>>>>    // some use for v
>>>>> }
>>>>> 
>>>>> So there would be a LoadP, a load barrier and a LoadI right after the
>>>>> call. The LoadI is the first to be cloned. It has 3 uses, so it's cloned
>>>>> 3 times? Which would mean non_inlined_call2 is actually called with:
>>>>> 
>>>>>    SomeObject object = some_object.object_field;
>>>>>    non_inlined_call2(object.int_field, object.int_field);
>>>>> 
>>>>> the field is reloaded and that code doesn't have the same effect as
>>>>> above. Or am I missing something?
>>>> 
>>>> The object_field usually gets a null-check which creates a control flow that anchors the LoadI further down. I managed to find one case in specjbb with a static call followed by a LoadL.
>>>> 
>>>> I fixed this by calling the call_catch_cleanup recursively on these load, when encountered. They will then be handled in the same way as the LoadPs with a barrier.
>>>> 
>>>> I also found an issue that I could get duplicate phis in blocks when connecting the loads together. I fixed this by keeping track on which regions has gotten a phi, and caching them.
>>>> 
>>>> I've also added a number of small fixes after feedback from Erik Ö.
>>>> 
>>>> New webrev: http://cr.openjdk.java.net/~neliasso/8224675/webrev.06/
>>>> 
>>>> Regards,
>>>> 
>>>> Nils
>>>> 
>>>> 
>>>>> 
>>>>> Roland.