RFR: 8342662: C2: Add new phase for backend-specific lowering [v6]

Wed Jan 15 01:57:41 UTC 2025

On Tue, 14 Jan 2025 11:02:33 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> src/hotspot/share/opto/phaseX.cpp line 2301:
>> 
>>> 2299:   // that may undo the changes done during lowering.
>>> 2300: 
>>> 2301:   return k->LoweredIdeal(this);
>> 
>> I'm sorry that I still cannot understand well what this method is expected to do for a node. For example, if we need to add some architecture specific optimization for `MulNode` like AArch64, we can add the lowering code in `lower_node_platform` for AArch64, right? Do we also need to override the `LoweredIdeal()` for `MulNode` ? Thanks!
>
> `lower_node_transform` transforms a node that should not appear in matching to something that can appear there while `LoweredIdeal` transforms a node that may appear in matching to another based on the pattern of its input.
> 
> For example, consider this Java code:
> 
>     Int256Vector v1;
>     Int256Vector v2 = v1.withLane(4, x);
>     Int256Vector v3 = v2.withLane(5, y);
> 
> Before lowering we would have (pseudocode for the graph):
> 
>     vector<int,8> v1;
>     vector<int,8> v2 = VectorInsert(v1, x, 4);
>     vector<int,8> v3 = VectorInsert(v2, y, 5);
> 
> x86 does not know how to insert into a 256-bit vector, so we need to extract the 128-bit lane, insert the element into the lane, then insert the lane into the original vector. Currently, this is done during code emission. Suppose we want to do so during lowering; we will then have this:
> 
>     vector<int,8> v1; // [a, b, c, d, e, f, g, h]
>     vector<int,4> v4 = ExtractVector(v1, 1); // [e, f, g, h]
>     vector<int,4> v5 = VectorInsert(v4, x, 0); // [x, f, g, h]
>     vector<int,8> v2 = VectorInsert(v1, v5, 1); // [a, b, c, d, x, f, g, h]
>     vector<int,4> v6 = ExtractVector(v2, 1); // [x, f, g, h]
>     vector<int,4> v7 = VectorInsert(v6, y, 1); // [x, y, g, h]
>     vector<int,8> v3 = VectorInsert(v2, v7, 1); // [a, b, c, d, x, y, g, h]
> 
> Now, using `Identity`, we may be able to ensure that `v6 == v5`, which leaves us with:
> 
>     vector<int,8> v1; // [a, b, c, d, e, f, g, h]
>     vector<int,4> v4 = ExtractVector(v1, 1); // [e, f, g, h]
>     vector<int,4> v5 = VectorInsert(v4, x, 0); // [x, f, g, h]
>     vector<int,8> v2 = VectorInsert(v1, v5, 1); // [a, b, c, d, x, f, g, h]
>     vector<int,4> v7 = VectorInsert(v5, y, 1); // [x, y, g, h]
>     vector<int,8> v3 = VectorInsert(v2, v7, 1); // [a, b, c, d, x, y, g, h]
> 
> Ideally, we would want to transform `v3` into `VectorInsert(v1, v7, 1)` because then we can elide `v2`. This can be done using `LoweredIdeal`.
> 
> So to your question, I think `LoweredIdeal` would be a better choice, this aligns pretty well with our current method of doing it in `Ideal`, too.

Thanks for the reply!

> lower_node_transform transforms a node that should not appear in matching to something that can appear there while LoweredIdeal transforms a node that may appear in matching to another based on the pattern of its input.

So `lower_node_platform` may do lowering for nodes such as macro nodes, right? If so, adding `macro` to the function name would be clearer to me. Another question: why is this separation needed? For macro nodes, if `LoweredIdeal` cannot meet the lowering requirement, we can lower them during macro expansion, right?

> So to your question, I think LoweredIdeal would be a better choice, this aligns pretty well with our current method of doing it in Ideal, too.

This makes sense to me and is also what I expected. But considering that `LoweredIdeal` is a virtual method on each IR node, how can we do architecture-specific transformations in it, other than by adding an arch-specific hook in the matcher? Or do you have a reference change that handles the `VectorInsert` case above in `LoweredIdeal`? Thanks a lot!
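
To check that I read the `v3` example correctly, here is how I picture that rewrite, as a minimal standalone C++ sketch. It is a toy IR and not HotSpot code: `Node`, `Graph`, `InsertLane` and `fold_overwritten_insert` are hypothetical names that exist only for this illustration, with `InsertLane` standing in for the lane-level `VectorInsert(v, lane, idx)` from the example. It does not show how the real `LoweredIdeal` would be wired up or made platform-specific.

```cpp
#include <cassert>
#include <deque>

// Toy IR for illustration only; these are NOT HotSpot's Node classes.
enum class Op { Vector, InsertLane };

struct Node {
  Op op;
  Node* vec;   // InsertLane: the 256-bit vector being updated (else nullptr)
  Node* lane;  // InsertLane: the 128-bit lane being written (else nullptr)
  int idx;     // InsertLane: index of the lane that is overwritten
};

// Owns the nodes. std::deque keeps element addresses stable on push_back.
struct Graph {
  std::deque<Node> nodes;

  // An opaque vector value such as v1, v5 or v7 in the example.
  Node* vector() {
    nodes.push_back(Node{Op::Vector, nullptr, nullptr, 0});
    return &nodes.back();
  }

  Node* insert_lane(Node* vec, Node* lane, int idx) {
    assert(vec != nullptr && lane != nullptr);
    nodes.push_back(Node{Op::InsertLane, vec, lane, idx});
    return &nodes.back();
  }
};

// Hypothetical counterpart of a LoweredIdeal-style rewrite:
//   InsertLane(InsertLane(a, b, i), c, i)  =>  InsertLane(a, c, i)
// The outer insert overwrites exactly the lane the inner one wrote, so the
// inner insert can be bypassed. Returns true if n was changed.
bool fold_overwritten_insert(Node* n) {
  if (n->op != Op::InsertLane) return false;
  Node* inner = n->vec;
  if (inner->op != Op::InsertLane || inner->idx != n->idx) return false;
  n->vec = inner->vec;  // the inner insert becomes dead if it has no other use
  return true;
}
```

With `v2 = insert_lane(v1, v5, 1)` and `v3 = insert_lane(v2, v7, 1)`, calling `fold_overwritten_insert(v3)` repoints `v3` at `v1`, which is the `VectorInsert(v1, v7, 1)` form from the example. An insert at a different lane index is left alone.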

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1915845442