RFR: 8332163: C2 SuperWord: refactor PacksetGraph and SuperWord::output into VTransformGraph
Christian Hagedorn
chagedorn at openjdk.org
Tue Jul 2 11:27:33 UTC 2024
On Fri, 14 Jun 2024 11:28:39 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> The original PR was [here](https://github.com/openjdk/jdk/pull/19261), it got too chaotic.
>>
>> I added some extra tests for this in: https://github.com/openjdk/jdk/pull/19558
>> I extracted some refactorings to: https://github.com/openjdk/jdk/pull/19573
>>
>> We used to have:
>> - `PacksetGraph`: this detects cycles introduces by packs, and schedules/reorders the memops.
>> - `SuperWord::apply_vectorization`: creates `VectorNodes` directly from the `PackSet`.
>>
>> In my blog, I have published lots of ideas for SuperWord / AutoVectorization improvements:
>> https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html
>>
>> Many ideas are based on the "VectorTransform IR": cost-model, if-conversion, direct widening of scalars to vectors, additional optimizations/features with shuffle/pack/extract, handling more reduction patterns, etc.
>>
>> I now decided to name it `VTransform`, which is essencially a graph `VtransformGraph` of nodes `VTransformNodes` that resemble the C2 Node on purpose, because the `VTransform` models the C2 graph after vectorization. We can now model the transformation from scalar-loop to vectorized-loop without modifying the C2 graph yet.
>>
>> The new code has these steps:
>> - Given the `PackSet` from `SuperWord`, we create a `VTransformGraph` with `SuperWordVTransformBuilder`.
>> - [Not yet: all sorts of optimizations / checks on the `VTransformGraph`, in future RFE's]
>> - We then schedule the `VTransformGraph`, and check for cycles.
>> - Once we are ready to commit to vectorization, we call `VTransformGraph::apply_vectorization` which lets each individual `VTransformNode::apply` generate the new vectorized C2 nodes.
>>
>> **Testing**
>>
>> Regression testing passed.
>>
>> Performance testing: no significant change in performance (as expected).
>
> src/hotspot/share/opto/superword.cpp line 3181:
>
>> 3179: }
>> 3180:
>> 3181: VTransformNode* SuperWordVTransformBuilder::get_vtnode_vector_input_at_index(const Node_List* pack, const int index) {
>
> Note: analogue from old `SuperWord::vector_opd`.
"get" suggests that we only get something that already exists but we are also creating new nodes. Can you add a comment here for clarification what the method is about?
> src/hotspot/share/opto/superword.cpp line 3262:
>
>> 3260: vtn = new (_graph.arena()) VTransformInputScalarNode(_graph, n);
>> 3261: set_vtnode(n, vtn);
>> 3262: return vtn;
>
> Note: Since we want the loop-internal nodes to be able to reference all inputs as `vtnodes`, we must pack the inputs that are outside the loop also into special `VTransformInputScalarNode`s.
Can you add this explanation also to the `VTransformInputScalarNode` class?
> src/hotspot/share/opto/vectorization.hpp line 1449:
>
>> 1447:
>> 1448: bool schedule();
>> 1449: void apply();
>
> Note: these are the public methods. In the future, we can add more, like `cost()` from the cost-model, or `optimize()` if we have optimizations, or maybe `if_convert()`.
"graph.apply()" seems quite generic. Could you find a more specific name? Maybe `transform_vtnodes()`?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662272369
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662274683
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662194103
More information about the hotspot-compiler-dev
mailing list