RFR: 8332163: C2 SuperWord: refactor PacksetGraph and SuperWord::output into VTransformGraph

Tue Jul 2 11:27:31 UTC 2024

On Fri, 14 Jun 2024 10:34:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> The original PR was [here](https://github.com/openjdk/jdk/pull/19261), it got too chaotic.
> 
> I added some extra tests for this in: https://github.com/openjdk/jdk/pull/19558
> I extracted some refactorings to: https://github.com/openjdk/jdk/pull/19573
> 
> We used to have:
> - `PacksetGraph`: this detects cycles introduces by packs, and schedules/reorders the memops.
> - `SuperWord::apply_vectorization`: creates `VectorNodes` directly from the `PackSet`.
> 
> In my blog, I have published lots of ideas for SuperWord / AutoVectorization improvements:
> https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html
> 
> Many ideas are based on the "VectorTransform IR": cost-model, if-conversion, direct widening of scalars to vectors, additional optimizations/features with shuffle/pack/extract, handling more reduction patterns, etc.
> 
> I now decided to name it `VTransform`, which is essencially a graph `VtransformGraph` of nodes `VTransformNodes` that resemble the C2 Node on purpose, because the `VTransform` models the C2 graph after vectorization. We can now model the transformation from scalar-loop to vectorized-loop without modifying the C2 graph yet.
> 
> The new code has these steps:
> - Given the `PackSet` from `SuperWord`, we create a `VTransformGraph` with `SuperWordVTransformBuilder`.
> - [Not yet: all sorts of optimizations / checks on the `VTransformGraph`, in future RFE's]
> - We then schedule the `VTransformGraph`, and check for cycles.
> - Once we are ready to commit to vectorization, we call `VTransformGraph::apply_vectorization` which lets each individual `VTransformNode::apply` generate the new vectorized C2 nodes.
> 
> **Testing**
> 
> Regression testing passed.
> 
> Performance testing: no significant change in performance (as expected).

Nice refactoring. It was a little bit difficult to review. But I think it overall looks good. 

Here are some comments about a first pass. Feel free to also do some of the suggestions separately if you agree with them since the PR is already quite big.

src/hotspot/share/opto/superword.cpp line 1883:

> 1881: // Apply the vectorization, i.e. we irreversibly edit the C2 graph. At this point, all
> 1882: // correctness and profitability checks have passed, and the graph was successfully scheduled.
> 1883: void VTransformGraph::apply() {

Maybe also assert here that the schedule is not empty?

src/hotspot/share/opto/superword.cpp line 1896:

> 1894:   C->print_method(PHASE_AUTO_VECTORIZATION1_BEFORE_APPLY, 4, cl());
> 1895: 
> 1896:   apply_memops_reordering_with_schedule();

How about having a separate schedule class that creates the schedule and also offers this method? It could make the class `VTransformGraph` lighter.

src/hotspot/share/opto/superword.cpp line 1899:

> 1897:   C->print_method(PHASE_AUTO_VECTORIZATION2_AFTER_REORDER, 4, cl());
> 1898: 
> 1899:   adjust_pre_loop_limit_to_align_main_loop_vectors();

Same thought here: Could this be a separate class? It looks like adjusting the pre loop is not really something the graph should know about. We are rather using information about the graph (i.e. the graph could be passed to the schedule class).

src/hotspot/share/opto/superword.cpp line 1910:

> 1908: // ordered according to the _schedule. This means that all packed memops are consecutive
> 1909: // in the memory graph after the reordering.
> 1910: void VTransformGraph::apply_memops_reordering_with_schedule() const {

General thought here (was like that before and could also be done separately at some point): Steps (1) - (3) could probably be moved to separate methods.

src/hotspot/share/opto/superword.cpp line 2011:

> 2009:   // generated def (input) nodes when we are generating the use nodes in "apply".
> 2010:   int length = _vtnodes.length();
> 2011:   GrowableArray<Node*> vnode_idx_to_transformed_node(length, length, nullptr);

Also applies to parameter names of the `apply()` methods.
Suggestion:

  GrowableArray<Node*> vtnode_idx_to_transformed_node(length, length, nullptr);

src/hotspot/share/opto/superword.cpp line 3041:

> 3039:   VectorSet vtn_dependencies; // Shared, but cleared for every vtnode.
> 3040:   build_edges_for_vector_vtnodes(vtn_dependencies);
> 3041:   build_edges_for_scalar_vtnodes(vtn_dependencies);

This is a little confusing since you are not only adding edges but also new vtnodes. Maybe you find a better name. But I suggest to at least add some comments at these methods to mention that. Otherwise, it reads like: Build vector vtnodes, then scalar vtnodes, then connect them somehow (without new nodes being added).

src/hotspot/share/opto/superword.cpp line 3047:

> 3045:   for (int i = 0; i < _packset.length(); i++) {
> 3046:     Node_List* pack = _packset.at(i);
> 3047:     VTransformVectorNode* vtn = make_vtnode_for_pack(pack);

For consistency, you can name this function `make_vector_vtnode_for_pack()` since you are only calling it to create vector vtnodes.

src/hotspot/share/opto/superword.cpp line 3049:

> 3047:     VTransformVectorNode* vtn = make_vtnode_for_pack(pack);
> 3048:     for (uint k = 0; k < pack->size(); k++) {
> 3049:       set_vtnode(pack->at(k), vtn);

Maybe rename this to `map_node_to_vtnode()` to make it more explicit.

src/hotspot/share/opto/superword.cpp line 3143:

> 3141: 
> 3142:   if (p0->is_Load()) {
> 3143:     vtn = new (_graph.arena()) VTransformLoadVectorNode(_graph, pack_size);

You use `_graph.arena()` quite often. Maybe you want to create a new `_arena` field for easier access.

src/hotspot/share/opto/superword.cpp line 3290:

> 3288: }
> 3289: 
> 3290: void SuperWordVTransformBuilder::add_dependencies_of_node_to_vtn(Node*n, VTransformNode* vtn, VectorSet& vtn_dependencies) {

Suggestion:

void SuperWordVTransformBuilder::add_dependencies_of_node_to_vtnode(Node*n, VTransformNode* vtn, VectorSet& vtn_dependencies) {

src/hotspot/share/opto/superword.hpp line 644:

> 642: 
> 643: public:
> 644:   SuperWordVTransformBuilder(const PackSet& packset,

Maybe add a comment here that we pass in an empty graph and then start to build/initialize it.

src/hotspot/share/opto/superword.hpp line 651:

> 649:       _graph(graph)
> 650:   {
> 651:     build_vtransform();

Could also just be named `build()`

src/hotspot/share/opto/superword.hpp line 688:

> 686: 
> 687:   // Ensure that the main loop vectors are aligned by adjusting the pre loop limit.
> 688:   void determine_mem_ref_and_aw_for_main_loop_alignment();

This and the next line can be removed since they are defined in `VTransformGraph`.

src/hotspot/share/opto/vectorization.hpp line 1326:

> 1324: };
> 1325: 
> 1326: // VTransform

I'm wondering if all these class together with vtnodes should go to a separate vtransform file?

src/hotspot/share/opto/vectorization.hpp line 1356:

> 1354: 
> 1355: // Status output from a VTransformNode::apply
> 1356: class VTransformApplyStatus {

"Status" sounds like an error code or something like that. But it rather looks like a node transformation result. How about naming this something like `VTransformationResult`, `VNodeTransformationResult` or `VTNodeTransformation?

src/hotspot/share/opto/vectorization.hpp line 1387:

> 1385: };
> 1386: 
> 1387: class VTransformGraph : public StackObj {

I was looking for some summary/guidance here but only found it above on L1327 which is hard to find once this is in. How about putting the comments here instead?

src/hotspot/share/opto/vectorization.hpp line 1411:

> 1409:   bool _is_trace_align_vector;
> 1410:   bool _is_trace_info;
> 1411:   bool _is_trace_verbose;

Can these somehow be made `const`?

src/hotspot/share/opto/vectorization.hpp line 1457:

> 1455:   IdealLoopTree* lpt()        const { return _vloop.lpt(); }
> 1456:   CountedLoopNode* cl()       const { return _vloop.cl(); }
> 1457:   PhiNode* iv()               const { return _vloop.iv(); }

Unused and can be removed
Suggestion:

src/hotspot/share/opto/vectorization.hpp line 1461:

> 1459:   bool in_bb(const Node* n)   const { return _vloop.in_bb(n); }
> 1460: 
> 1461:   // VLoopVPointer accessors

Suggestion:

  // VLoopVPointers accessors

src/hotspot/share/opto/vectorization.hpp line 1541:

> 1539:   VTransformNode* out(int i) const { return _out.at(i); }
> 1540: 
> 1541:   bool has_req_or_dep() const {

To match `add_dependency`, I suggest to go with:
Suggestion:

  bool has_req_or_dependency() const {

src/hotspot/share/opto/vectorization.hpp line 1555:

> 1553:   virtual VTransformReductionVectorNode* isa_ReductionVector() { return nullptr; }
> 1554: 
> 1555:   virtual VTransformApplyStatus apply(const VLoopAnalyzer& vloop_analyzer,

"apply" is quite a generic name. We've discussed about this offline. What about naming it `transform()`? This would match the IGVN naming convention of transforming nodes. Moreover, you already have `find_transformed_input()` which would also support the name `transform()` over `apply()`.

src/hotspot/share/opto/vectorization.hpp line 1568:

> 1566: };
> 1567: 
> 1568: class VTransformScalarNode : public VTransformNode {

Not sure if you left out comments above the vtnode classes on purpose. But maybe a short summary about them could help readers of the code (if the class itself is not already self-explanatory).

src/hotspot/share/opto/vectorization.hpp line 1612:

> 1610: };
> 1611: 
> 1612: 

Suggestion:

src/hotspot/share/opto/vectorization.hpp line 1682:

> 1680:   const VTransformBoolTest _test;
> 1681: public:
> 1682:   VTransformBoolVectorNode(VTransformGraph& graph, int number_of_nodes, VTransformBoolTest test) :

`number_of_nodes` should be a `uint` to match the `VTransformVectorNode` constructor which takes a `uint`. Same for other classes.

-------------

PR Review: https://git.openjdk.org/jdk/pull/19719#pullrequestreview-2150845545
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662217177
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662233604
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662243983
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662246300
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1661095071
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662321846
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662160834
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662158298
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662317821
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662313657
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662153682
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1661045136
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662262324
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662175682
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662180451
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662186060
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1660725816
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662227527
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1660729291
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662330704
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1660866423
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662335535
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1662333257
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1660752641