RFR: 8332163: C2 SuperWord: refactor PacksetGraph and SuperWord::output into VTransformGraph

Emanuel Peter epeter at openjdk.org
Sun Jun 16 18:53:42 UTC 2024


On Fri, 14 Jun 2024 10:34:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> The original PR was [here](https://github.com/openjdk/jdk/pull/19261), it got too chaotic.
> 
> I added some extra tests for this in: https://github.com/openjdk/jdk/pull/19558
> I extracted some refactorings to: https://github.com/openjdk/jdk/pull/19573
> 
> We used to have:
> - `PacksetGraph`: this detects cycles introduces by packs, and schedules/reorders the memops.
> - `SuperWord::apply_vectorization`: creates `VectorNodes` directly from the `PackSet`.
> 
> In my blog, I have published lots of ideas for SuperWord / AutoVectorization improvements:
> https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html
> 
> Many ideas are based on the "VectorTransform IR": cost-model, if-conversion, direct widening of scalars to vectors, additional optimizations/features with shuffle/pack/extract, handling more reduction patterns, etc.
> 
> I now decided to name it `VTransform`, which is essencially a graph `VtransformGraph` of nodes `VTransformNodes` that resemble the C2 Node on purpose, because the `VTransform` models the C2 graph after vectorization. We can now model the transformation from scalar-loop to vectorized-loop without modifying the C2 graph yet.
> 
> The new code has these steps:
> - Given the `PackSet` from `SuperWord`, we create a `VTransformGraph` with `SuperWordVTransformBuilder`.
> - [Not yet: all sorts of optimizations / checks on the `VTransformGraph`, in future RFE's]
> - We then schedule the `VTransformGraph`, and check for cycles.
> - Once we are ready to commit to vectorization, we call `VTransformGraph::apply_vectorization` which lets each individual `VTransformNode::apply` generate the new vectorized C2 nodes.
> 
> **Testing**
> 
> Regression testing passed.
> 
> Performance testing running...

src/hotspot/share/opto/superword.cpp line 1878:

> 1876: //                                                         +--------+
> 1877: //
> 1878: class PacksetGraph {

Note: completely removed, replaced with `VTransformGraph`

src/hotspot/share/opto/superword.cpp line 1935:

> 1933: 
> 1934:   // Create nodes (from packs and scalar-nodes), and add edges, based on the dependency graph.
> 1935:   void build() {

Note: `build` is now done by `SuperWordVTransformBuilder::build_vtransform`.

src/hotspot/share/opto/superword.cpp line 1961:

> 1959:   //     in the Phi. Further, we replace uses of the old last store
> 1960:   //     with uses of the new last store (current_state).
> 1961:   GrowableArray<Node*> uses_after_loop;

Note: I wanted to add a `ResourceMark`. `Node_List` allocates from the `node_arena`, which I do not want. With `GrowableArray`, I can be sure that we allocate from the resource area.

src/hotspot/share/opto/superword.cpp line 2012:

> 2010:   // those to the memops_schedule. At the end, we return the memops_schedule, and
> 2011:   // note if topsort was successful.
> 2012:   Node_List schedule() {

Note: `schedule` here is done with topsort. But that is not great, because you need to edit the graph, i.e. remove edges. We now use a different algo in `VTransformGraph::schedule`.

src/hotspot/share/opto/superword.cpp line 2033:

> 2031:   }
> 2032: 
> 2033:   if (SuperWordLoopUnrollAnalysis) {

Note:
I don't yet fully understand this `SuperWordLoopUnrollAnalysis` code. Its purpose is to get super-unrolling. But I'm not sure it is correct in all cases - will investigate in a future RFE.

src/hotspot/share/opto/superword.cpp line 2133:

> 2131: }
> 2132: 
> 2133: bool SuperWord::apply(Node_List& memops_schedule) {

Note: renamed to `void VTransformGraph::apply()`

src/hotspot/share/opto/superword.cpp line 2224:

> 2222:   }
> 2223: 
> 2224:   return _packset.pack_input_at_index_or_null(u_pk, u_idx) != nullptr;

Note: I introduced this method for somewhere else, and decided to refactor here as well.

src/hotspot/share/opto/superword.cpp line 2291:

> 2289:         const TypePtr* atyp = n->adr_type();
> 2290:         vn = LoadVectorNode::make(opc, ctl, mem, adr, atyp, vlen, velt_basic_type(n), control_dependency(p));
> 2291:         vlen_in_bytes = vn->as_LoadVector()->memory_size();

Note:
Captured into `VTransformLoadVectorNode` by `SuperWordVTransformBuilder::make_vtnode_for_pack`. C2 nodes generated in `VTransformLoadVectorNode::apply`.

src/hotspot/share/opto/superword.cpp line 2306:

> 2304:         const TypePtr* atyp = n->adr_type();
> 2305:         vn = StoreVectorNode::make(opc, ctl, mem, adr, atyp, val, vlen);
> 2306:         vlen_in_bytes = vn->as_StoreVector()->memory_size();

Note:
Captured into `VTransformStoreVectorNode` by `SuperWordVTransformBuilder::make_vtnode_for_pack`. C2 nodes generated in `VTransformStoreVectorNode::apply`.

src/hotspot/share/opto/superword.cpp line 2334:

> 2332:         Node* one = vector_opd(p, 3);
> 2333:         vn = VectorNode::make(opc, in, zero, one, vlen, velt_basic_type(n));
> 2334:         vlen_in_bytes = vn->as_Vector()->length_in_bytes();

Note: all of the above 4 cases
Captured into `VTransformElementWiseVectorNode` by `SuperWordVTransformBuilder::make_vtnode_for_pack`. C2 nodes generated in `VTransformElementWiseVectorNode::apply`.

src/hotspot/share/opto/superword.cpp line 2376:

> 2374: 
> 2375:         // VectorBlend
> 2376:         vn = new VectorBlendNode(blend_in1, blend_in2, mask);

Note: `Cmp -> Bool -> CMove`
`Cmp` -> `VTransformElementWiseVectorNode` (but empty, i.e. no C2 node generated)
`Bool` -> `VTransformBoolVectorNode` (creates `VectorMaskCmpNode`)
`CMove` -> `VTransformElementWiseVectorNode` (creates `VectorBlendNode`)
Note: the `swap(blend_in1, blend_in2);` happens in `SuperWordVTransformBuilder::build_edges_for_vector_vtnodes`, where we look for the input `VTransformBoolVectorNode` and check if its test is negated. I'm not perfectly happy with this, but maybe I can improve this in the future with some kind of "IGVN" optimization on the `VTransformNodes`.

src/hotspot/share/opto/superword.cpp line 2418:

> 2416:           vn = VectorNode::make(opc, in1, in2, vlen, velt_basic_type(n));
> 2417:           vlen_in_bytes = vn->as_Vector()->length_in_bytes();
> 2418:         }

Note:
this is a sort of "catch all" case for the nodes with `req == 3`.

But I decided to spit away the reductions: `VTransformReductionVectorNode`.
All other cases are `VTransformElementWiseVectorNode`.

src/hotspot/share/opto/superword.cpp line 2445:

> 2443:         Node* in3 = vector_opd(p, 3);
> 2444:         vn = VectorNode::make(opc, in1, in2, in3, vlen, velt_basic_type(n));
> 2445:         vlen_in_bytes = vn->as_Vector()->length_in_bytes();

Note:
also these 4 cases are `VTransformElementWiseVectorNode`.

src/hotspot/share/opto/superword.cpp line 2466:

> 2464:           vn->as_StoreVector()->set_must_verify_alignment();
> 2465:         }
> 2466:       }

Note:
directly moved to `VTransformLoadVectorNode::apply` and `VTransformStoreVectorNode::apply`.

src/hotspot/share/opto/superword.cpp line 2518:

> 2516: //------------------------------vector_opd---------------------------
> 2517: // Create a vector operand for the nodes in pack p for operand: in(opd_idx)
> 2518: Node* SuperWord::vector_opd(Node_List* p, int opd_idx) {

Note:
This method used to find the vector/scalar inputs for packs. These inputs can have many forms:
- Already vectorized: `opd->is_Vector() || opd->is_LoadVector()`
- `PopulateIndexNode` pattern (not packed)
- `same_input`, i.e. most likely a `scalar2vector` aka `Replicate` pattern. Some nodes require such inputs to be specially manipulated, so that makes it a bit more complicated.

The analogue is now `SuperWordVTransformBuilder::get_vtnode_vector_input_at_index`.

src/hotspot/share/opto/superword.cpp line 3034:

> 3032: }
> 3033: 
> 3034: void SuperWordVTransformBuilder::build_vtransform() {

Note: used to be `PacksetGraph::build`. I also decided to refactor its content into smaller functions.

src/hotspot/share/opto/superword.cpp line 3064:

> 3062: }
> 3063: 
> 3064: void SuperWordVTransformBuilder::build_edges_for_vector_vtnodes(VectorSet& vtn_dependencies) {

Note: cases relate to old `SuperWord::apply_vectorization`.

src/hotspot/share/opto/superword.cpp line 3109:

> 3107: }
> 3108: 
> 3109: void SuperWordVTransformBuilder::build_edges_for_scalar_vtnodes(VectorSet& vtn_dependencies) {

Note: cases relate to old `SuperWord::apply_vectorization`.

src/hotspot/share/opto/superword.cpp line 3137:

> 3135: 
> 3136: // Create a vtnode for each pack. No in/out edges set yet.
> 3137: VTransformVectorNode* SuperWordVTransformBuilder::make_vtnode_for_pack(const Node_List* pack) const {

Note: cases relate to old `SuperWord::apply_vectorization`.

src/hotspot/share/opto/superword.cpp line 3181:

> 3179: }
> 3180: 
> 3181: VTransformNode* SuperWordVTransformBuilder::get_vtnode_vector_input_at_index(const Node_List* pack, const int index) {

Note: analogue from old `SuperWord::vector_opd`.

src/hotspot/share/opto/superword.cpp line 3262:

> 3260:   vtn = new (_graph.arena()) VTransformInputScalarNode(_graph, n);
> 3261:   set_vtnode(n, vtn);
> 3262:   return vtn;

Note: Since we want the loop-internal nodes to be able to reference all inputs as `vtnodes`, we must pack the inputs that are outside the loop also into special `VTransformInputScalarNode`s.

src/hotspot/share/opto/superword.cpp line 3292:

> 3290: 
> 3291: void SuperWordVTransformBuilder::add_dependencies_of_node_to_vtn(Node*n, VTransformNode* vtn, VectorSet& vtn_dependencies) {
> 3292:   for (VLoopDependencyGraph::PredsIterator preds(_vloop_analyzer.dependency_graph(), n); !preds.done(); preds.next()) {

Note: this `VLoopDependencyGraph` traversal used to be in `PacksetGraph::build`.

src/hotspot/share/opto/superword.hpp line 607:

> 605:   bool apply_vectorization();
> 606:   // Create a vector operand for the nodes in pack p for operand: in(opd_idx)
> 607:   Node* vector_opd(Node_List* p, int opd_idx);

Note: all now part of the `VTransformGraph`

src/hotspot/share/opto/superword.hpp line 634:

> 632: 
> 633: // Facility class that builds a VTransformGraph from a SuperWord PackSet.
> 634: class SuperWordVTransformBuilder : public StackObj {

Note: takes over work from old `PacksetGraph::build`

src/hotspot/share/opto/vectorization.cpp line 1806:

> 1804: //
> 1805: // We return "true" IFF we find no cycle, i.e. if the linearization succeeds.
> 1806: bool VTransformGraph::schedule() {

Note: used to be `PacksetGraph::schedule`. Now not with topsort, but with reverse-post-order. This works with the pre/post visited sets, like in other places of C2.

src/hotspot/share/opto/vectorization.cpp line 1960:

> 1958: }
> 1959: 
> 1960: VTransformApplyStatus VTransformElementWiseVectorNode::apply(const VLoopAnalyzer& vloop_analyzer, const GrowableArray<Node*>& vnode_idx_to_transformed_node) const {

Note: in the future, I could split up this `VTransformElementWiseVectorNode` into a number of sub-classes and thus avoid all the case distinctions below. But for now I keep it a bit closer to the old `SuperWord::apply_vectorization` with its case distinctions. There is also a lot of shared code between the cases, so I don't want to blow up the code size too much for now.

src/hotspot/share/opto/vectorization.hpp line 1449:

> 1447: 
> 1448:   bool schedule();
> 1449:   void apply();

Note: these are the public methods. In the future, we can add more, like `cost()` from the cost-model, or `optimize()` if we have optimizations, or maybe `if_convert()`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639642610
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639643501
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639648367
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639645641
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639674802
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639646578
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639680961
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639659803
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639660753
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639664332
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639669354
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639671251
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639672090
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639672947
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639679509
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639683167
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639684299
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639684810
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639685049
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639685887
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639687792
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639689170
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639690922
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639691293
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639692423
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639694836
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1639696872


More information about the hotspot-compiler-dev mailing list