RFR: 8287087: C2: perform SLP reduction analysis on-demand [v2]

Roberto Castañeda Lozano rcastanedalo at openjdk.org
Fri Apr 14 12:47:44 UTC 2023


On Sun, 2 Apr 2023 05:52:17 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> src/hotspot/share/opto/superword.cpp line 504:
>> 
>>> 502:       // to the phi node following edge index 'input'.
>>> 503:       PathEnd path =
>>> 504:         find_in_path(
>> 
>> Hi @robcasloz,
>> `find_in_path` expects reduction nodes to be present at the same edge indices in the reduction chain, and it honors the `has_swapped_edge` flag during backward traversal.
>> However, there are still some ideal transforms, like the following, which may break the reduction chain and thereby prevent Min/Max reductions for the test case mentioned in JDK-8302673:
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L1147
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L1230
>
> One way to add fault tolerance to `find_in_path` could be to follow strict DFS semantics, where an alternate path is taken if a node's predicates are not satisfied. Currently we start all over again from the first node of the chain with a different `reduction_input`, which prevents inferring the reduction chain even though all the nodes in the chain are commutative, isomorphic operations.

Hi again @jatin-bhateja,

I have now studied [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673) in more detail, and my conclusion is that it is not a duplicate but rather orthogonal to this changeset. Even a perfect reduction analysis alone would not re-enable the missing vectorization, because the canonicalization transformations done by `MaxI/MinINode::Ideal()` inhibit SuperWord analysis at a later stage.

In light of this, I propose to re-open [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673) and address it by handling all four combinations of two-level inputs in `MaxI/MinINode::Ideal()` instead of canonicalizing `MaxI/MinI` chains. This solution is, in my opinion, more straightforward (and not necessarily more expensive). The main reason is that it separates concerns, making it possible to reason about input swapping, optimization of `MaxI/MinI` nodes, reduction analysis, and auto-vectorization separately. I have [a WIP prototype implementation](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-8302673) which seems to work fine for all of the discussed reduction analysis strategies. @jatin-bhateja, if you want, I can take over JDK-8302673 and submit it for review once I have polished it.
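For readers less familiar with these transforms: the two-level input handling ultimately rests on algebraic identities of min/max, which can be sketched at the Java level (a hypothetical illustration of the identities involved, not the actual `Ideal()` code):

```java
public class MaxFoldingSketch {
    public static void main(String[] args) {
        int x = 7;
        // Two-level folding: Max(Max(x, c1), c2) == Max(x, max(c1, c2)),
        // so nested MaxI nodes with constant inputs can be collapsed
        // without canonicalizing the whole chain.
        assert Math.max(Math.max(x, 10), 20) == Math.max(x, Math.max(10, 20));
        // Commutativity: swapped inputs yield the same value, which is why
        // reduction analysis must tolerate swapped edges.
        assert Math.max(x, 10) == Math.max(10, x);
        System.out.println("identities hold");
    }
}
```
(Run with `-ea` to enable the assertions.)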

Given that JDK-8302673 is orthogonal to this RFE, and that a solution to JDK-8302673 is available with which this RFE detects MaxI/MinI reductions correctly, I suggest moving on with this RFE as-is and filing a follow-up RFE to investigate generic search approaches.
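For context, the kind of Java loop whose vectorization is at stake is a simple MaxI reduction. The sketch below is a hypothetical example (not the exact test case from JDK-8302673) showing the shape of chain that the reduction analysis must recognize:

```java
public class MaxReduction {
    static int maxReduce(int[] a) {
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            // Each iteration appends one MaxI node to the cross-iteration
            // reduction chain; input canonicalization in MaxINode::Ideal()
            // can reorder or rewrite these nodes and break the chain.
            max = Math.max(max, a[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] data = {3, -7, 42, 0, 15};
        System.out.println(maxReduce(data)); // prints 42
    }
}
```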

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13120#discussion_r1166778814


More information about the hotspot-compiler-dev mailing list