RFR: 8342662: C2: Add new phase for backend-specific lowering [v6]

Wed Jan 15 02:32:01 UTC 2025

On Wed, 15 Jan 2025 01:55:26 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> `lower_node_transform` transforms a node that should not appear in matching to something that can appear there while `LoweredIdeal` transforms a node that may appear in matching to another based on the pattern of its input.
>> 
>> For example, consider this Java code:
>> 
>>     Int256Vector v1;
>>     Int256Vector v2 = v1.withLane(4, x);
>>     Int256Vector v3 = v2.withLane(5, y);
>> 
>> Before lowering we would have (pseudocode for the graph):
>> 
>>     vector<int,8> v1;
>>     vector<int,8> v2 = VectorInsert(v1, x, 4);
>>     vector<int,8> v3 = VectorInsert(v2, y, 5);
>> 
>> x86 cannot insert an element directly into a 256-bit vector, so we need to extract the 128-bit lane, insert the element into that lane, then insert the lane back into the original vector. Currently, this is done during code emission. Suppose we want to do it during lowering instead; we would then have this:
>> 
>>     vector<int,8> v1; // [a, b, c, d, e, f, g, h]
>>     vector<int,4> v4 = ExtractVector(v1, 1); // [e, f, g, h]
>>     vector<int,4> v5 = VectorInsert(v4, x, 0); // [x, f, g, h]
>>     vector<int,8> v2 = VectorInsert(v1, v5, 1); // [a, b, c, d, x, f, g, h]
>>     vector<int,4> v6 = ExtractVector(v2, 1); // [x, f, g, h]
>>     vector<int,4> v7 = VectorInsert(v6, y, 1); // [x, y, g, h]
>>     vector<int,8> v3 = VectorInsert(v2, v7, 1); // [a, b, c, d, x, y, g, h]
>> 
>> Now, using `Identity`, we may be able to establish that `v6 == v5`, which leaves us with:
>> 
>>     vector<int,8> v1; // [a, b, c, d, e, f, g, h]
>>     vector<int,4> v4 = ExtractVector(v1, 1); // [e, f, g, h]
>>     vector<int,4> v5 = VectorInsert(v4, x, 0); // [x, f, g, h]
>>     vector<int,8> v2 = VectorInsert(v1, v5, 1); // [a, b, c, d, x, f, g, h]
>>     vector<int,4> v7 = VectorInsert(v5, y, 1); // [x, y, g, h]
>>     vector<int,8> v3 = VectorInsert(v2, v7, 1); // [a, b, c, d, x, y, g, h]
>> 
>> Ideally, we would want to transform `v3` into `VectorInsert(v1, v7, 1)` because then we can elide `v2`. This can be done using `LoweredIdeal`.
>> 
>> So to your question, I think `LoweredIdeal` would be a better choice; this aligns pretty well with our current method of doing it in `Ideal`, too.
>
> Thanks for the reply!
> 
>> lower_node_transform transforms a node that should not appear in matching to something that can appear there while LoweredIdeal transforms a node that may appear in matching to another based on the pattern of its input.
> 
> So `lower_node_platform` may do lowering for nodes like macro ones, right? If so, adding a `macro` keyword to the function name seems better to me. Another question: why is this separation needed? For macro nodes, if `LoweredIdeal` cannot meet the lowering requirement, we can lower them during macro expansion, right?
> 
>> So to your question, I think LoweredIdeal would be a better choice; this aligns pretty well with our current method of doing it in Ideal, too.
> 
> This makes sense to me and is also what I expected. But considering that `LoweredIdeal` is a virtual method on each IR node, how can we do architecture-specific transformations in it, other than by adding an arch-specific hook in the matcher? Or do you have a reference change for the above `VectorInsert` case in `LoweredIdeal`? Thanks a lot!

I would advise against using the term "macro", to avoid confusion with `PhaseMacroExpansion`.

Regarding the separation, your point about the architecture dependency is the reason why we need a separate `lower_node_transform`. Suppose you want to lower a machine-independent node on x86. With `lower_node_transform` we would only need to add a case to the function, while with `LoweredIdeal` we would need to implement it for all architectures. Note that we can also lower a machine-independent node to a machine-dependent one; for example, imagine a node `X86VectorRearrangeConst`. So we can write machine-dependent transformations without using machine-dependent code.
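
To make the "add a case to the function" point concrete, here is a rough, standalone sketch. It is not code from the PR: the `Node` struct, the `Arch` enum, the `ConShuffle` opcode, and the exact signatures are all made up for illustration, and only the names `lower_node_transform`, `VectorRearrange` and `X86VectorRearrangeConst` come from the discussion. The idea is that one backend gains a case while the others are left untouched.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a C2 IR node (hypothetical; the real Node class is far richer).
struct Node {
    std::string op;         // opcode name
    std::vector<Node*> in;  // inputs
};

// Hypothetical backend selector; in HotSpot the backend is chosen at build time.
enum class Arch { X86, AArch64 };

// Hypothetical x86 hook: each node the backend wants to lower is one more case here.
static Node* lower_node_transform_x86(Node* n) {
    // Assumed pattern: a rearrange whose shuffle input is a constant ("ConShuffle" is made up).
    if (n->op == "VectorRearrange" && n->in.size() == 2 && n->in[1]->op == "ConShuffle") {
        n->op = "X86VectorRearrangeConst";  // machine-dependent node imagined in the text
    }
    return n;
}

// Hypothetical dispatcher: backends with nothing to lower return the node unchanged.
Node* lower_node_transform(Arch arch, Node* n) {
    assert(n != nullptr);
    switch (arch) {
        case Arch::X86: return lower_node_transform_x86(n);
        default:        return n;
    }
}
```

In this sketch, calling `lower_node_transform(Arch::X86, n)` on a `VectorRearrange` with a constant shuffle input rewrites it in place, while the same call with `Arch::AArch64` leaves the node alone. A virtual `LoweredIdeal` on the node class would instead have to account for every architecture in one place.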

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1915864151