RFR: 8342662: C2: Add new phase for backend-specific lowering [v6]

Thu Jan 9 06:36:51 UTC 2025

On Tue, 7 Jan 2025 22:47:58 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Implement apply_identity
>
> Were there any experiments conducted to port existing lowering transformations to the new pass? 
> 
> As we discussed before, there are multiple places in the code where lowering takes place. It is still not clear to me how much proposed solution unifies across existing use cases. What I'd really like to avoid is yet another peculiar way to perform lowering transformations in C2.

Hi @iwanowww, to share my thoughts on this, there are two places where we currently do lowering:

1. Macro transform:
- This phase lowers in a machine-independent manner, which makes it really awkward to lower anything that depends heavily on the exact architecture. For example, we want to lower a `MulVL` with a constant into `AddVL`s and `LShiftVL`s. On Arm, long vector multiplication can be done pretty efficiently, so we want to be conservative. On x86, however, long multiplication takes multiple uops and has a massive latency, so we want to be more aggressive with this transformation. Even worse, `vpmullq` is only available with AVX512, so for AVX2 we want to be even more aggressive, maybe even to the point of doing the transformation unconditionally.
- It still runs machine-independent idealisation on all the nodes, which works against the purpose of machine-dependent lowering. Idealisation tries to simplify the graph so that we can do analysis and transformation more easily, while lowering tries to complicate the graph so that the final code can get smaller. For example, consider an unsigned vector comparison. During idealisation, we want to keep it as is so that we have an easier time moving it around. However, if the machine does not support unsigned vector comparison, we want to break it down into `x + MIN_VALUE <=> y + MIN_VALUE`, i.e. a signed comparison of the biased operands.
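
To make the two transformations above concrete, here is a minimal scalar sketch in Java. It is not C2 code and not what the PR implements: the class and method names are hypothetical, the constant 10 is picked arbitrarily, and plain `long`/`int` arithmetic stands in for the `MulVL`, `AddVL`, `LShiftVL` and vector-compare nodes. It only shows that the lowered forms compute the same results as the originals.

```java
// Hypothetical scalar stand-ins for the two lowerings discussed above.
final class LoweringSketch {
    // Multiply by the constant 10 using only shifts and an add:
    // 10 = 8 + 2, so x * 10 == (x << 3) + (x << 1), including on overflow,
    // because both sides wrap modulo 2^64.
    static long mulBy10ShiftAdd(long x) {
        return (x << 3) + (x << 1);
    }

    // Unsigned "x < y" built from a signed comparison only.
    // Adding MIN_VALUE flips the sign bit of each operand, which maps the
    // unsigned ordering onto the signed ordering.
    static boolean unsignedLessThan(int x, int y) {
        return (x + Integer.MIN_VALUE) < (y + Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        long[] longs = {0L, 1L, -7L, 123456789L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long x : longs) {
            if (mulBy10ShiftAdd(x) != x * 10) {
                throw new AssertionError("shift/add form differs for " + x);
            }
        }

        int[] ints = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : ints) {
            for (int y : ints) {
                // Integer.compareUnsigned is the reference unsigned ordering.
                boolean expected = Integer.compareUnsigned(x, y) < 0;
                if (unsignedLessThan(x, y) != expected) {
                    throw new AssertionError("mismatch for " + x + ", " + y);
                }
            }
        }
        System.out.println("lowered forms agree with the originals");
    }
}
```

Whether the shift-and-add form is actually profitable is exactly the machine-dependent question: it trades one multiply for several cheaper operations, which matters more on targets where the multiply is slow or missing.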

2. Matching:
- This place does not do GVN, so we do not have much versatility here. Really, once `PhaseLowering` runs before it, matching should only lower nodes in a one-to-one manner.
- Even worse, the matcher uses a custom grammar, which makes it awkward to work with. This leads to some confusing constructs such as `Matcher::pd_clone_node` and `Matcher::pd_clone_address_expressions`.

Furthermore, as can be seen, several patches and to-do items that can benefit from this pass have already mentioned this PR. As a result, I think `PhaseLowering` is a beneficial and necessary addition.

Cheers,
Quan Anh

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2579276134