RFR: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post() [v3]

Tobias Hartmann thartmann at openjdk.java.net
Wed May 19 07:40:03 UTC 2021


On Mon, 17 May 2021 08:12:46 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> Sinking data nodes out of a loop when all uses are out of a loop has
>> several issues that this attempts to fix.
>> 
>> 1- Only non control uses are considered which makes little sense (why
>>   not sink if the data node is an argument to a call or a returned
>>   value?)
>> 
>> 2- Sinking of Loads is broken because of the handling of
>>   anti-dependence: the get_late_ctrl(n, n_ctrl) call returns a control
>>   in the loop because it takes all uses into account.
>> 
>> 3- For data nodes for which a control edge can't be set, commoning of
>>   clones back in the loop is prevented with:
>>   _igvn._worklist.yank(x);
>>   which gives no guarantee
>> 
>> This patch tries to address all issues:
>> 
>> 1- it looks at all uses, not only non control uses
>> 
>> 2- anti-dependences are computed for each use independently
>> 
>> 3- Cast nodes are used to pin clones out of loop
>> 
>> 
>> 2- requires refactoring of the PhaseIdealLoop::get_late_ctrl()
>> logic. While working on this, I noticed a bug in anti-dependence
>> analysis: when the use is a cfg node, the code sometimes looks at uses
>> of the memory state of the cfg. The logic uses the use of the cfg
>> which is a projection of adr_type identical to the cfg. It should
>> instead look at the use of the memory projection.
>> 
>> The existing logic for sinking loads calls clear_dom_lca_tags() for
>> every load which seems like quite a waste. I added a
>> _dom_lca_tags_round variable that's or'ed with the tag_node's _idx. By
>> incrementing _dom_lca_tags_round, new tags that don't conflict with
>> existing ones are produced and there's no need for
>> clear_dom_lca_tags().
>> 
>> For anti-dependence analysis to return a correct result, early control
>> of the load is needed. The only way to get it at this stage, AFAICT,
>> is to compute it by following the load's input until a pinned node is
>> reached.
>> 
>> The existing logic pins cloned nodes next to their use. The logic I
>> propose pins them right out of the loop. This could possibly avoid
>> some redundant clones. It also makes some special handling for corner
>> cases with loop strip mining useless.
>> 
>> For 3-, I added extra Cast nodes for float types. If a chain of data
>> nodes are sunk, the new logic tries to keep a single Cast for the
>> entire chain rather than one Cast per node.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
> 
>  - Tobias' review
>  - Merge branch 'master' into JDK-8252372
>  - CastVV
>  - Merge branch 'master' into JDK-8252372
>  - extra comments
>  - fix

Yes, I've linked it to the bug.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3689


More information about the hotspot-compiler-dev mailing list