RFR: 8332163: C2 SuperWord: refactor PacksetGraph and SuperWord::output into VTransformGraph

Tue Jul 2 11:27:33 UTC 2024

On Fri, 14 Jun 2024 11:28:39 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> The original PR was [here](https://github.com/openjdk/jdk/pull/19261), it got too chaotic.
>> 
>> I added some extra tests for this in: https://github.com/openjdk/jdk/pull/19558
>> I extracted some refactorings to: https://github.com/openjdk/jdk/pull/19573
>> 
>> We used to have:
>> - `PacksetGraph`: this detects cycles introduces by packs, and schedules/reorders the memops.
>> - `SuperWord::apply_vectorization`: creates `VectorNodes` directly from the `PackSet`.
>> 
>> In my blog, I have published lots of ideas for SuperWord / AutoVectorization improvements:
>> https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html
>> 
>> Many ideas are based on the "VectorTransform IR": cost-model, if-conversion, direct widening of scalars to vectors, additional optimizations/features with shuffle/pack/extract, handling more reduction patterns, etc.
>> 
>> I now decided to name it `VTransform`, which is essencially a graph `VtransformGraph` of nodes `VTransformNodes` that resemble the C2 Node on purpose, because the `VTransform` models the C2 graph after vectorization. We can now model the transformation from scalar-loop to vectorized-loop without modifying the C2 graph yet.
>> 
>> The new code has these steps:
>> - Given the `PackSet` from `SuperWord`, we create a `VTransformGraph` with `SuperWordVTransformBuilder`.
>> - [Not yet: all sorts of optimizations / checks on the `VTransformGraph`, in future RFE's]
>> - We then schedule the `VTransformGraph`, and check for cycles.
>> - Once we are ready to commit to vectorization, we call `VTransformGraph::apply_vectorization` which lets each individual `VTransformNode::apply` generate the new vectorized C2 nodes.
>> 
>> **Testing**
>> 
>> Regression testing passed.
>> 
>> Performance testing: no significant change in performance (as expected).
>
> src/hotspot/share/opto/superword.cpp line 3181:
> 
>> 3179: }
>> 3180: 
>> 3181: VTransformNode* SuperWordVTransformBuilder::get_vtnode_vector_input_at_index(const Node_List* pack, const int index) {
> 
> Note: analogue from old `SuperWord::vector_opd`.

"get" suggests that we only get something that already exists but we are also creating new nodes. Can you add a comment here for clarification what the method is about?

> src/hotspot/share/opto/superword.cpp line 3262:
> 
>> 3260:   vtn = new (_graph.arena()) VTransformInputScalarNode(_graph, n);
>> 3261:   set_vtnode(n, vtn);
>> 3262:   return vtn;
> 
> Note: Since we want the loop-internal nodes to be able to reference all inputs as `vtnodes`, we must pack the inputs that are outside the loop also into special `VTransformInputScalarNode`s.

Can you add this explanation also to the `VTransformInputScalarNode` class?

> src/hotspot/share/opto/vectorization.hpp line 1449:
> 
>> 1447: 
>> 1448:   bool schedule();
>> 1449:   void apply();
> 
> Note: these are the public methods. In the future, we can add more, like `cost()` from the cost-model, or `optimize()` if we have optimizations, or maybe `if_convert()`.

"graph.apply()" seems quite generic. Could you find a more specific name? Maybe `transform_vtnodes()`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662272369
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662274683
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662194103