RFR: 8342662: C2: Add new phase for backend-specific lowering [v2]

Mon Oct 28 21:44:10 UTC 2024

On Mon, 28 Oct 2024 03:54:06 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> @jaskarth thanks for exploring platform-specific lowering!
>> 
>> I briefly looked through the changes, but I didn't get a good understanding of its design goals. It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design.
>> 
>> Currently, there are multiple places in the code where IR lowering happens. In particular:
>>   * IGVN during post loop opts phase (guarded by `Compile::post_loop_opts_phase()`) (Ideal -> Ideal); 
>>   * macro expansion (Ideal -> Ideal);
>>   * ad-hoc passes (GC barrier expansion, `MacroLogicV`) (Ideal -> Ideal);
>>   * final graph reshaping (Ideal -> Ideal);
>>   * matcher (Ideal -> Mach).
>> 
>> I'd like to understand how the new pass is intended to interact with existing cases.
>> 
>> Only the last one is truly platform-specific, but there are some platform-specific cases exposes in other places (e.g., MacroLogicV pass, DivMod combining) guarded by some predicates on `Matcher`.
>> 
>> As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). But `MacroLogicV` case doesn't fit such model well.
>> 
>> I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes. Platform-specific Ideal nodes are declared in shared code, but then their usages are guarded by `Matcher::has_match_rule()` thus ensuring there's enough support on back-end side.
>> 
>> Some random observations:
>> * the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check);
>
> Thanks a lot for your analysis of the patch, @iwanowww! I hope to answer some of your questions here.
> 
>> It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design.
> 
> I think this is a very fair point. I was testing some cases before I made the PR, but I wanted to submit just the system in isolation to make it easier to review. I can make some example use cases separately to show what could be possible with the new system.
> 
>> I'd like to understand how the new pass is intended to interact with existing cases.
> 
> The overarching goal is to support new kinds of transforms on ideal nodes that are only relevant to a single hardware platform, which would otherwise be too narrow in scope to put in shared code but would be difficult to do in purely AD code. It can be helpful having GVN while transforming the IR into a more backend-specific form. @merykitty added some nice examples above that illustrate possible use-cases.
> 
>> As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). But MacroLogicV case doesn't fit such model well.
> 
> The lowering implementation works similarly to how an `Ideal()` call works, so it's possible to do many->1 (like `MacroLogicV`) and many->many transformations as well.
> 
>> I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes.
> 
> I was thinking if we're introducing nodes that only have functionality on specific platforms it might be nice to make those nodes only exist on those platforms as well, to reduce the size of shared code on platforms where the nodes aren't relevant. Since the lowering phase introduces new nodes that are specially known to the backend they should be supported by the backend too. However, it's not a necessary component of the lowering phase, just something that I thought could help with the implementation of lowered nodes.
> 
>> the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check);
> 
> This is true, my thought was since MacroLogicV currently also iterates across all live nodes doing it here as well would be alright. I think a way to collect lowering-spec...

@jaskarth

> I was testing some cases before I made the PR, but I wanted to submit just the system in isolation to make it easier to review. I can make some example use cases separately to show what could be possible with the new system.

Thanks. Primarily, I'm interested in how it fits existing use cases. For example, if somebody can port MacroLogicV and DivMod on top and publish them as dependent PRs, that would be very helpful to guide the discussion. 

> I was thinking if we're introducing nodes that only have functionality on specific platforms it might be nice to make those nodes only exist on those platforms as well, to reduce the size of shared code on platforms where the nodes aren't relevant.

It looks attractive at first, but the downside is subsequent passes may start to require platform-specific code as well (e.g., think of final graph reshaping which operates on Node opcodes). Also, total number of platform-specific Ideal nodes was low (especially, when compared to Mach nodes generated from AD files). So, keeping relevant code shared and guarding its usages with `Matcher::match_rule_supported()` seems appropriate.

> ... my thought was since MacroLogicV currently also iterates across all live nodes doing it here as well would be alright. 

MacroLogicV pass is guarded by `C->max_vector_size() > 0` and `Matcher::match_rule_supported(Op_MacroLogicV)` which (1) limits it to AVX512-capable hardware; and (2) ensures that some vector nodes were produced during compilation. It is a coarser-grained check than strictly required, but very effective at detecting when there are no optimization opportunities present.  

> I think a way to collect lowering-specific nodes would be difficult since the nodes that actually get lowered could change between backends.

It would definitely require a way to signal what Ideal nodes/IR patterns are interesting for a particular backend.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2442687335