RFR: 8332163: C2 SuperWord: refactor PacksetGraph and SuperWord::output into VTransformGraph [v10]

Emanuel Peter epeter at openjdk.org
Wed Jul 3 14:10:29 UTC 2024


On Wed, 3 Jul 2024 13:54:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> The original PR was [here](https://github.com/openjdk/jdk/pull/19261), it got too chaotic.
>> 
>> I added some extra tests for this in: https://github.com/openjdk/jdk/pull/19558
>> I extracted some refactorings to: https://github.com/openjdk/jdk/pull/19573
>> 
>> We used to have:
>> - `PacksetGraph`: this detects cycles introduced by packs, and schedules/reorders the memops.
>> - `SuperWord::apply_vectorization`: creates `VectorNodes` directly from the `PackSet`.
>> 
>> In my blog, I have published lots of ideas for SuperWord / AutoVectorization improvements:
>> https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html
>> 
>> Many ideas are based on the "VectorTransform IR": cost-model, if-conversion, direct widening of scalars to vectors, additional optimizations/features with shuffle/pack/extract, handling more reduction patterns, etc.
>> 
>> I now decided to name it `VTransform`, which is essentially a graph `VTransformGraph` of nodes `VTransformNodes` that resemble the C2 Node on purpose, because the `VTransform` models the C2 graph after vectorization. We can now model the transformation from scalar-loop to vectorized-loop without modifying the C2 graph yet.
>> 
>> The new code has these steps:
>> - Given the `PackSet` from `SuperWord`, we create a `VTransformGraph` with `SuperWordVTransformBuilder`.
>> - [Not yet: all sorts of optimizations / checks on the `VTransformGraph`, in future RFEs]
>> - We then schedule the `VTransformGraph`, and check for cycles.
>> - Once we are ready to commit to vectorization, we call `VTransformGraph::apply_vectorization` which lets each individual `VTransformNode::apply` generate the new vectorized C2 nodes.
>> 
>> **Testing**
>> 
>> Regression testing passed.
>> 
>> Performance testing: no significant change in performance (as expected).
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   tabs and includes

src/hotspot/share/opto/superword.cpp line 40:

> 38: #include "opto/vectornode.hpp"
> 39: #include "opto/movenode.hpp"
> 40: #include "utilities/powerOfTwo.hpp"

Note: I was able to remove some includes. Some were probably already unnecessary before, and others became unnecessary because I removed/moved logic to other files.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 28:

> 26: #include "opto/vectornode.hpp"
> 27: 
> 28: void SuperWordVTransformBuilder::build() {

Note: used to be `PacksetGraph::build`

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 181:

> 179: // Either get existing vtnode vector input (when input is a pack), or else make a
> 180: // new one vector vtnode for the input (e.g. for Replicate or PopulateIndex).
> 181: VTransformNode* SuperWordVTransformBuilder::get_or_make_vtnode_vector_input_at_index(const Node_List* pack, const int index) {

Note: used to be `SuperWord::vector_opd`

src/hotspot/share/opto/superwordVTransformBuilder.hpp line 30:

> 28: #define SHARE_OPTO_SUPERWORD_VTRANSFORM_BUILDER_HPP
> 29: 
> 30: // Facility class that builds a VTransform from a SuperWord PackSet.

Note: this takes over a part of the old `PacksetGraph` and `SuperWord::output`.

src/hotspot/share/opto/vectorization.hpp line 1332:

> 1330:   VTransformBoolTest(const BoolTest::mask mask, bool is_negated) :
> 1331:     _mask(mask), _is_negated(is_negated) {}
> 1332: };

Note: moved to `vtransform.hpp`

src/hotspot/share/opto/vtransform.cpp line 46:

> 44: //
> 45: // We return "true" IFF we find no cycle, i.e. if the linearization succeeds.
> 46: bool VTransformGraph::schedule() {

Note: used to be `PacksetGraph::schedule`. It now uses a reverse-post-order algorithm; it used to be a topological sort.
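
For readers unfamiliar with the technique, here is a minimal standalone sketch of scheduling by reverse post-order with cycle detection. It is not the HotSpot code: `SketchGraph`, `out_edges` and `compute_schedule` are hypothetical names, and it uses `std::vector` rather than the HotSpot containers. It only illustrates the contract quoted above: return "true" iff no cycle is found, in which case the linearization places every def before its uses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a dependency graph; NOT the HotSpot VTransformGraph.
struct SketchGraph {
  // out_edges[i] lists the nodes that use (depend on) node i.
  std::vector<std::vector<int>> out_edges;
  // Filled on success: every def appears before all of its uses.
  std::vector<int> schedule;

  // Returns true iff the graph has no cycle, i.e. the linearization succeeds.
  bool compute_schedule() {
    enum State { Unvisited, PreVisited, PostVisited };
    const int n = static_cast<int>(out_edges.size());
    std::vector<State> state(n, Unvisited);
    std::vector<int> post_order;
    std::vector<int> stack;
    schedule.clear();

    for (int root = 0; root < n; root++) {
      if (state[root] != Unvisited) { continue; }
      stack.push_back(root);
      while (!stack.empty()) {
        const int v = stack.back();
        if (state[v] == Unvisited) {
          // Pre-visit: v is now on the current DFS path.
          state[v] = PreVisited;
          for (int use : out_edges[v]) {
            // An edge to a node on the current DFS path closes a cycle.
            if (state[use] == PreVisited) { return false; }
            if (state[use] == Unvisited) { stack.push_back(use); }
          }
        } else {
          stack.pop_back();
          if (state[v] == PreVisited) {
            // Post-visit: everything reachable from v is already done.
            state[v] = PostVisited;
            post_order.push_back(v);
          }
          // Otherwise v was a duplicate stack entry that is already done.
        }
      }
    }
    // Reversing the post-order puts each def before its uses.
    schedule.assign(post_order.rbegin(), post_order.rend());
    return true;
  }
};
```

In this sketch a node can sit on the stack more than once, so the post-visit step checks the state before recording it. The cycle check relies on the fact that the pre-visited nodes are exactly the ancestors on the current DFS path.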

src/hotspot/share/opto/vtransform.hpp line 59:

> 57: // - Pack/Unpack/Shuffle: introduce additional nodes not present in the scalar loop.
> 58: //                        This is difficult to do with the SuperWord packset approach.
> 59: // - If-conversion: convert predicated nodes into CFG.

Note: please read this description, it explains the basic idea of the `VTransform`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1664232667
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1664243632
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1664247162
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1664248990
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1664250172
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1664252012
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1664254914

