RFR(XL): 8224675: Late GC barrier insertion for ZGC

Thu Jun 6 18:03:29 UTC 2019

On 2019-06-05 10:20, Roland Westrelin wrote:
>> Ah - I think I get it. You mean like this:
>>
>> void ZBarrierSetC2::barrier_insertion_phase(PhaseIterGVN& igvn)const {
>>     PhaseIdealLoop ideal_loop(igvn,LoopOptsNone);
>>
>>     // First make sure all loads between call and catch are moved to the
>> catch block clean_catch_blocks(&ideal_loop);
>>
>>     // Then expand barriers on all loads insert_load_barriers(&ideal_loop);
>>
>>     // Handle all Unsafe that need barriers. insert_barriers_on_unsafe(&ideal_loop);
>>
>>     // Cleanup any modified bits igvn.optimize();
>>
>>     igvn.C->clear_major_progress();
>> }
>>
>> An excellent idea. Then I can remove the new LoopOptsMode::BarrierInsertion.
> I was thinking something like what we do in Shenandoah for barrier
> expansion:
>
> PhaseIdealLoop ideal_loop(igvn, LoopOptsShenandoahExpand);
>
> ShenandoahBarrierSetC2::is_gc_specific_loop_opts_pass() returns true for
> LoopOptsShenandoahExpand and ShenandoahBarrierSetC2::optimize_loops()
> handles LoopOptsShenandoahExpand. So there's nothing shenandoah specific
> in PhaseIdealLoop::build_and_optimize().

I followed your example and changed my implementation inline with that.

>
>> I've been running some experiments with asserts on the clone code.
>>
>> 1) There can never be any control flow here - so now phis or such.
>>
>> 2) Stores have explicit control - and would never be scheduled here either.
>>
>> 3) Loads - they end up here because they can float. They only matter if
>> there is a use dominated by the catch (after a merge of catch control
>> flow), or uses in more than one catch-proj branch. The only nodes
>> observed being cloned is LoadPNodes with barriers, BoolNodes, and CmpP
>> nodes. It's the same pattern of comparing a pointer. All other load has
>> it's control in the catch-projs.
>>
>> I will add asserts to the clone in fixup_uses_in_catch to reflect this
>> conclusion and make sure that I catch any change in behavior.
> I was thinking of something like:
>
> try {
>    non_inlined_call1();
>    int v = some_object.object_field.int_field;
>    non_inlined_call2(v, v);
> } catch (..) {
>    int v = some_object.object_field.int_field;
>    // some use for v
> }
>
> So there would be a LoadP, a load barrier and a LoadI right after the
> call. The LoadI is the first to be cloned. It has 3 uses, so it's cloned
> 3 times? Which would mean non_inlined_call2 is actually called with:
>
>    SomeObject object = some_object.object_field;
>    non_inlined_call2(object.int_field, object.int_field);
>
> the field is reloaded and that code doesn't have the same effect as
> above. Or am I missing something?

The object_field usually gets a null-check which creates a control flow 
that anchors the LoadI further down. I managed to find one case in 
specjbb with a static call followed by a LoadL.

I fixed this by calling the call_catch_cleanup recursively on these 
load, when encountered. They will then be handled in the same way as the 
LoadPs with a barrier.

I also found an issue that I could get duplicate phis in blocks when 
connecting the loads together. I fixed this by keeping track on which 
regions has gotten a phi, and caching them.

I've also added a number of small fixes after feedback from Erik Ö.

New webrev: http://cr.openjdk.java.net/~neliasso/8224675/webrev.06/

Regards,

Nils

>
> Roland.