RFR: 8342662: C2: Add new phase for backend-specific lowering [v3]

Wed Oct 30 04:51:09 UTC 2024

On Wed, 30 Oct 2024 04:43:55 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> Hi all,
>> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism.
>> 
>> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated!
>
> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into phase-lowering
>  - Remove platform-dependent node definitions, rework PhaseLowering implementation
>  - Address some changes from code review
>  - Implement PhaseLowering

Thanks everyone for the discussion. I've pushed a commit that restructures the pass, removing the backend-specific node definition and making the pass extend `PhaseIterGVN` so that nodes can do further idealizations during lowering without complicating the main lowering switch. I also added a shared component to lowering, to facilitate moving transforms that impact multiple backends like `DivMod` to it. Lowering is also now the final phase before final graph reshaping now, since late inlines could also use IGVN. Some more comments:

> It looks attractive at first, but the downside is subsequent passes may start to require platform-specific code as well (e.g., think of final graph reshaping which operates on Node opcodes).

This makes sense to me. I agree that the extra complexity required to deal with this change in other parts of the code isn't worth it. The new commit removes this part of the changeset.

> BTW it's not clear to me now what particular benefits IGVN brings. `DivMod` transformation doesn't use IGVN and after examining `MacroLogicV` code it can be rewritten to avoid IGVN as well.

The main benefits are being able to reuse node hashing to de-duplicate redundant nodes and being able to use the existing IGVN types that were calculated (which #21244 uses). Some examples where GVN could be useful in final graph reshaping is when reshaping shift nodes and `Op_CmpUL`, where new nodes are created to approximate existing nodes on platforms without support. While I think it is unlikely that any of the created nodes would common with existing nodes except the `ConNode`s, I think it would be nice to reduce the possibility of redundant nodes in the graph before matching. This would include `DivMod` in the cases where the backend doesn't support the `DivMod` node, as multiplication and subtraction is emitted instead. I'm working on refactoring these cases in my example patch. I think it would be nice to make lowering where these platform specific optimizations occur while final graph reshaping focuses on preparing the graph for matching.

> I'd say that if we want the lowering pass being discussed to be truly scalable, it's better to follow the same pattern. I have some doubts that platform-specific ad-hoc IR tweaks scale will scale well.

My main concern with the macro-expansion style is that with the proposed transforms unconditional expansion/lowering of nodes isn't always possible. For example, In final graph reshaping for `DivMod` it can be the case that we don't find a relevant `ModNode` to combine into a `DivMod`. So with that limitation we wouldn't be able to move it to this pass.

With regard to scalability, since lowering is the last step before final graph reshaping now, I think the only places where lowered nodes could interact with existing code is in final graph reshaping and final cleanup before matching. With final graph reshaping I think the impact is minimal, as any changes that would need to be done there could be done during lowering itself. During final cleanup before matching, we convert nodes with more than 2 inputs into a binary tree form for the platform matcher to consume. In this place it would be possible to process lowered nodes, but with lowered nodes being defined in shared code now this can be done purely in shared code still.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2445839113