RFR: 8332163: C2 SuperWord: refactor PacksetGraph and SuperWord::output into VTransformGraph [v15]

Christian Hagedorn chagedorn at openjdk.org
Thu Jul 4 10:58:30 UTC 2024


On Thu, 4 Jul 2024 10:35:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> The original PR was [here](https://github.com/openjdk/jdk/pull/19261), it got too chaotic.
>> 
>> I added some extra tests for this in: https://github.com/openjdk/jdk/pull/19558
>> I extracted some refactorings to: https://github.com/openjdk/jdk/pull/19573
>> 
>> We used to have:
>> - `PacksetGraph`: this detects cycles introduces by packs, and schedules/reorders the memops.
>> - `SuperWord::apply_vectorization`: creates `VectorNodes` directly from the `PackSet`.
>> 
>> In my blog, I have published lots of ideas for SuperWord / AutoVectorization improvements:
>> https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html
>> 
>> Many ideas are based on the "VectorTransform IR": cost-model, if-conversion, direct widening of scalars to vectors, additional optimizations/features with shuffle/pack/extract, handling more reduction patterns, etc.
>> 
>> I now decided to name it `VTransform`, which is essencially a graph `VtransformGraph` of nodes `VTransformNodes` that resemble the C2 Node on purpose, because the `VTransform` models the C2 graph after vectorization. We can now model the transformation from scalar-loop to vectorized-loop without modifying the C2 graph yet.
>> 
>> The new code has these steps:
>> - Given the `PackSet` from `SuperWord`, we create a `VTransformGraph` with `SuperWordVTransformBuilder`.
>> - [Not yet: all sorts of optimizations / checks on the `VTransformGraph`, in future RFE's]
>> - We then schedule the `VTransformGraph`, and check for cycles.
>> - Once we are ready to commit to vectorization, we call `VTransformGraph::apply_vectorization` which lets each individual `VTransformNode::apply` generate the new vectorized C2 nodes.
>> 
>> **Testing**
>> 
>> Regression testing passed.
>> 
>> Performance testing: no significant change in performance (as expected).
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8332163-VTransformGraph-v2
>  - More updates for Christian
>  - Apply suggestions from code review
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Apply suggestions from code review
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - rm newline
>  - tabs and includes
>  - vtransform comments
>  - move some code over to graph
>  - VTransformTrace
>  - split graph out from VTransform, wip
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/820f08f2...18f95141

That looks good! Nice refactoring :-)

src/hotspot/share/opto/superword.cpp line 2820:

> 2818: #ifdef ASSERT
> 2819:   if (_trace._align_vector) {
> 2820:     tty->print_cr("\nVLoopTransform::adjust_pre_loop_limit_to_align_main_loop_vectors:");

For some reason I suggested `VLoopTransform` before.
Suggestion:

    tty->print_cr("\nVTransform::adjust_pre_loop_limit_to_align_main_loop_vectors:");

src/hotspot/share/opto/vtransform.cpp line 144:

> 142:     _node->dump();
> 143:   }
> 144: }

Should match the now changed method declaration in the header file:
Suggestion:

void VTransformApplyResult::trace(VTransformNode* vtnode) const {
  tty->print("  apply: ");
  vtnode->print();
  tty->print("    ->   ");
  if (_node == nullptr) {
    tty->print_cr("nullptr");
  } else {
    _node->dump();
  }
}

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19719#pullrequestreview-2158625188
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665525863
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665533971


More information about the hotspot-compiler-dev mailing list