RFR: 8332163: C2 SuperWord: refactor PacksetGraph and SuperWord::output into VTransformGraph [v11]

Christian Hagedorn chagedorn at openjdk.org
Thu Jul 4 09:35:35 UTC 2024


On Wed, 3 Jul 2024 14:21:03 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> The original PR was [here](https://github.com/openjdk/jdk/pull/19261), it got too chaotic.
>> 
>> I added some extra tests for this in: https://github.com/openjdk/jdk/pull/19558
>> I extracted some refactorings to: https://github.com/openjdk/jdk/pull/19573
>> 
>> We used to have:
>> - `PacksetGraph`: this detects cycles introduces by packs, and schedules/reorders the memops.
>> - `SuperWord::apply_vectorization`: creates `VectorNodes` directly from the `PackSet`.
>> 
>> In my blog, I have published lots of ideas for SuperWord / AutoVectorization improvements:
>> https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html
>> 
>> Many ideas are based on the "VectorTransform IR": cost-model, if-conversion, direct widening of scalars to vectors, additional optimizations/features with shuffle/pack/extract, handling more reduction patterns, etc.
>> 
>> I now decided to name it `VTransform`, which is essencially a graph `VtransformGraph` of nodes `VTransformNodes` that resemble the C2 Node on purpose, because the `VTransform` models the C2 graph after vectorization. We can now model the transformation from scalar-loop to vectorized-loop without modifying the C2 graph yet.
>> 
>> The new code has these steps:
>> - Given the `PackSet` from `SuperWord`, we create a `VTransformGraph` with `SuperWordVTransformBuilder`.
>> - [Not yet: all sorts of optimizations / checks on the `VTransformGraph`, in future RFE's]
>> - We then schedule the `VTransformGraph`, and check for cycles.
>> - Once we are ready to commit to vectorization, we call `VTransformGraph::apply_vectorization` which lets each individual `VTransformNode::apply` generate the new vectorized C2 nodes.
>> 
>> **Testing**
>> 
>> Regression testing passed.
>> 
>> Performance testing: no significant change in performance (as expected).
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   rm newline

Looks already very good now, thanks for the updates and the offline discussion! I like having the new files which better separates the code.

I only have a few other mostly minor comments.

src/hotspot/share/opto/superword.cpp line 1866:

> 1864:                         _mem_ref_for_main_loop_alignment,
> 1865:                         _aw_for_main_loop_alignment
> 1866:                         NOT_PRODUCT( COMMA trace)

Suggestion:

                        NOT_PRODUCT(COMMA trace)

src/hotspot/share/opto/superword.cpp line 2652:

> 2650:   DEBUG_ONLY(                           \
> 2651:     if (_trace._align_vector) {       \
> 2652:       tty->print("  " #node ": ");      \

Suggestion:

    if (_trace._align_vector) {         \
      tty->print("  " #node ": ");      \

src/hotspot/share/opto/superword.cpp line 2820:

> 2818: #ifdef ASSERT
> 2819:   if (_trace._align_vector) {
> 2820:     tty->print_cr("\nVLoopTransformGraph::adjust_pre_loop_limit_to_align_main_loop_vectors:");

Suggestion:

    tty->print_cr("\nVLoopTransform::adjust_pre_loop_limit_to_align_main_loop_vectors:");

src/hotspot/share/opto/superword.cpp line 3036:

> 3034: bool SuperWord::same_generation(Node* a, Node* b) const {
> 3035:   return a != nullptr && b != nullptr && _clone_map.same_gen(a->_idx, b->_idx);
> 3036: }

Just noticed this by coincidence: This method is dead and can be removed. Can probably also be done in the scope of this PR instead of doing an extra RFE just for that.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 180:

> 178: 
> 179: // Either get existing vtnode vector input (when input is a pack), or else make a
> 180: // new one vector vtnode for the input (e.g. for Replicate or PopulateIndex).

Suggestion:

// Either get the existing vtnode vector input (when input is a pack), or else make a
// new vector vtnode for the input (e.g. for Replicate or PopulateIndex).

src/hotspot/share/opto/superwordVTransformBuilder.hpp line 48:

> 46:       _vtransform(vtransform)
> 47:   {
> 48:     assert(_vtransform.is_empty(), "constructor is passed an empty vtransform");

How about naming this `has_graph()`?

src/hotspot/share/opto/vtransform.cpp line 79:

> 77:         if (post_visited.test(use->_idx)) { continue; }
> 78:         if (pre_visited.test(use->_idx)) {
> 79:           // Circle detected!

The Circle! Cycle?
Suggestion:

          // Cycle detected!

src/hotspot/share/opto/vtransform.cpp line 84:

> 82:           // and discover that use is also pre_visited but not post_visited. Thus, use
> 83:           // lies on that path from "root" to vtn, and the edge (vtn, use) closes a
> 84:           // circle.

Suggestion:

          // cycle.

src/hotspot/share/opto/vtransform.cpp line 97:

> 95:         _schedule.at_put_grow(rpo_idx--, vtn); // assign rpo_idx
> 96:       }
> 97: 

Suggestion:

src/hotspot/share/opto/vtransform.cpp line 253:

> 251: }
> 252: 
> 253: VTransformApplyResult VTransformBoolVectorNode::apply(const VLoopAnalyzer& vloop_analyzer, const GrowableArray<Node*>& vnode_idx_to_transformed_node) const {

Maybe wrap this long line:
Suggestion:

VTransformApplyResult VTransformBoolVectorNode::apply(const VLoopAnalyzer& vloop_analyzer, 
                                                      const GrowableArray<Node*>& vnode_idx_to_transformed_node) const {

Same at other apply methods.

src/hotspot/share/opto/vtransform.hpp line 33:

> 31: // - Models the transformation of the scalar loop to vectorized loop:
> 32: //   It is a "C2 subgraph" -> "C2 subgraph" mapping.
> 33: // - The VTransform contains a graph (VTransformGraph), which consists

Suggestion:

// - The VTransform contains a graph (VTransformGraph), which consists of

src/hotspot/share/opto/vtransform.hpp line 46:

> 44: //
> 45: // - Schedule:
> 46: //   - Compute linearization of the graph, into an order that respects

To make it explicit:
Suggestion:

//   - Compute linearization of the VTransformGraph, into an order that respects

src/hotspot/share/opto/vtransform.hpp line 99:

> 97:   uint vector_length() const { return _vector_length; }
> 98:   uint vector_width() const { return _vector_width; }
> 99:   NOT_PRODUCT( void trace(VTransformNode* vtn) const; )

I suggest to be explicit for parameters:
Suggestion:

  NOT_PRODUCT( void trace(VTransformNode* vtnode) const; )

src/hotspot/share/opto/vtransform.hpp line 103:

> 101: 
> 102: #ifndef PRODUCT
> 103: // Convenience method for tracing flags.

Suggestion:

// Convenience class for tracing flags.

src/hotspot/share/opto/vtransform.hpp line 233:

> 231:   CountedLoopNode* cl()       const { return _vloop.cl(); }
> 232:   int iv_stride()             const { return cl()->stride_con(); }
> 233:   bool in_bb(const Node* n)   const { return _vloop.in_bb(n); }

Seems to be dead now and can be removed
Suggestion:

src/hotspot/share/opto/vtransform.hpp line 266:

> 264:     _req(req),
> 265:     _in(vtransform.arena(),  req, req, nullptr),
> 266:     _out(vtransform.arena(), 4, 0, nullptr)

Just noticed this here and totally unrelated, but there are a lot of places where we create a `GrowableArray` without a filler. So, we just pass `nullptr` or 0 (latter should probably be replaced by `nullptr`). Maybe `GrowableArray` should provide a constructor without a filler. Might be worth to do in an RFE at some point.

src/hotspot/share/opto/vtransform.hpp line 336:

> 334:     VTransformNode(vtransform, n->req()), _node(n) {}
> 335:   Node* node() const { return _node; }
> 336:   virtual VTransformScalarNode* isa_Scalar() override { return this; }

General comment about vtnode classes: The IDE suggests here and at other places that:

Clang-Tidy: 'virtual' is redundant since the function is already declared 'override'

So, you could get rid of `virtual` wherever you have `override`.

src/hotspot/share/opto/vtransform.hpp line 345:

> 343: // Wrapper node for nodes outside the loop that are inputs to nodes in the loop.
> 344: // Since we want the loop-internal nodes to be able to reference all inputs as vtnodes,
> 345: // we must wrap the inputs that are outside the loop also into special vtnodes.

Suggestion:

// we must wrap the inputs that are outside the loop into special vtnodes, too.

src/hotspot/share/opto/vtransform.hpp line 377:

> 375: };
> 376: 
> 377: // Transform introduces a shift-count node, that truncates the shift count for a vector shift.

Suggestion:

// Transform introduces a shift-count node that truncates the shift count for a vector shift.

src/hotspot/share/opto/vtransform.hpp line 421:

> 419:   }
> 420: 
> 421:   const GrowableArray<Node*> nodes() const { return _nodes; }

Shouldn't this return a const reference?

-------------

PR Review: https://git.openjdk.org/jdk/pull/19719#pullrequestreview-2156605981
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665249564
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665399616
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665400589
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665402864
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665338136
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665321530
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665419275
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665420256
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665421545
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665424645
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1664284232
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665255316
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665263052
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665263650
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665326827
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665331427
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665359824
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665340856
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665361650
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665376985


More information about the hotspot-compiler-dev mailing list