RFR: 8342662: C2: Add new phase for backend-specific lowering [v6]
Quan Anh Mai
qamai at openjdk.org
Thu Jan 9 17:38:42 UTC 2025
On Thu, 9 Jan 2025 07:27:12 GMT, erifan <duke at openjdk.org> wrote:
>> Hi @iwanowww, to share my thoughts on this, there are 2 places where we do lowering:
>>
>> 1. Macro transform:
>> - This place does lowering in a machine-independent manner, which makes it really awkward to lower something that depends heavily on the exact architecture. For example, we want to lower a `MulVL` with a constant into `AddVL`s and `LShiftVL`s. On Arm, long vector multiplication can be done pretty efficiently, so we want to be conservative. On x86, however, long multiplication takes multiple uops and has a massive latency, so we want to be more aggressive in this transformation. Even worse, `vpmullq` is only available on AVX512, so for AVX2 we want to be even more aggressive, maybe even to the point of doing the transformation unconditionally.
>> - It still does machine-independent idealisation on all the nodes, which works against the purpose of machine-dependent lowering. Idealisation tries to simplify the graph so that we can do analysis and transformation more easily, while lowering tries to complicate the graph so that the final code can get smaller. For example, consider an unsigned vector comparison. During idealisation, we want to keep it as is so that we have an easier time moving it around. However, if the machine does not support unsigned vector comparison, we want to break it down to `x + MIN_VALUE <=> y + MIN_VALUE`.
>>
>> 2. Matching:
>> - This place does not do GVN, so we do not have much versatility here. Really, it should only lower nodes in a one-to-one manner once we have `PhaseLowering` running before it.
>> - Even worse, the matcher uses a custom grammar, which makes it awkward to work with. This leads to some confusing constructs such as `Matcher::pd_clone_node` and `Matcher::pd_clone_address_expressions`.
>>
>> Furthermore, as can be seen, several patches and to-do items would benefit from this pass and have already mentioned this PR. As a result, I think `PhaseLowering` is a beneficial and necessary addition.
>>
>> Cheers,
>> Quan Anh
>
> Hi @merykitty, I noticed you mentioned optimizing vector multiplication into shifts and adds. Since I have been working on this recently, I'd like to ask whether you have any plans to do this, so that we avoid duplicating work.
@erifan No, I mentioned it because I read your PR :)
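
For readers following the thread, the two rewrites from the quoted message can be sketched in scalar Java. This only illustrates the arithmetic identities behind them; it is not how C2 or this PR implements the lowering, and the class name `LoweringIdentities`, the method names, and the constant 10 are all hypothetical choices made for the example.

```java
public class LoweringIdentities {
    // Multiplication by a constant as shifts and adds (hypothetical example
    // constant): 10 = 8 + 2, so x * 10 == (x << 3) + (x << 1).
    // The identity also holds under two's-complement wraparound.
    public static long mulBy10(long x) {
        return (x << 3) + (x << 1);
    }

    // Unsigned less-than built from a signed comparison: adding MIN_VALUE
    // flips the sign bit of both operands, which maps unsigned order onto
    // signed order.
    public static boolean unsignedLessThan(int x, int y) {
        return x + Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        long[] longs = {0L, 1L, -7L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long x : longs) {
            if (mulBy10(x) != x * 10) {
                throw new AssertionError("shift-add mismatch for " + x);
            }
        }

        int[] ints = {0, 1, -1, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : ints) {
            for (int y : ints) {
                boolean expected = Integer.compareUnsigned(x, y) < 0;
                if (unsignedLessThan(x, y) != expected) {
                    throw new AssertionError("unsigned compare mismatch for " + x + ", " + y);
                }
            }
        }
        System.out.println("ok");
    }
}
```

Whether either rewrite pays off depends on the target, which is the point made above: the shift-add form only helps where the multiply is expensive, and the biased comparison is only needed where the hardware lacks an unsigned compare.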
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2580894226