RFR: 8324890: C2 SuperWord: refactor out VLoop, make unrolling_analysis static, remove init/reset mechanism

Tue Jan 30 12:18:51 UTC 2024

On Tue, 30 Jan 2024 06:14:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Subtask of https://github.com/openjdk/jdk/pull/16620
> 
> 1. Move out the shared code between `SuperWord::SLP_extract` (where we do vectorization) and `SuperWord::unrolling_analysis`, and move it to a new class `VLoop`. This allows us to decouple `unrolling_analysis` from the SuperWord object, and we can make it static.
> 2. So far, SuperWord was reused for all loops in a compilation, and then "reset" (with `SuperWord::init`) for every loop. This is a bit of a nasty pattern. I now make a new `VLoop` and a new `SuperWord` object per loop.
> 3. Since we now make more `SuperWord` objects, we allocate the internal data structures more often. Therefore, I now pre-allocate/reserve sufficient space on initialization.
> 
> Side-note about https://github.com/openjdk/jdk/pull/17604:
> I would like to remove the use of `SuperWord::is_marked_reduction` from `SuperWord::unrolling_analysis`. For starters: it is not clear what it was ever good for. Second: it requires us to do reduction marking/analysis before `unrolling_analysis`, and hence makes the reduction marking shared between `unrolling_analysis` and vectorization. I could move the reduction marking to `VLoop` now. But the `_loop_reducitons` set would have to be put on an arena, and I would like to avoid creating an arena for the `unrolling_analysis`. Plus, it would just be nicer code, to have reduction analysis together with body analysis, type analysis, etc. and all of them in only in `SLP_extract`.

src/hotspot/share/opto/loopTransform.cpp line 1106:

> 1104:       VLoop vloop(this, true);
> 1105:       if (vloop.check_preconditions()) {
> 1106:         SuperWord::unrolling_analysis(vloop, _local_loop_unroll_factor);

Note: I made `unrolling_analysis` static, and only pass in `vloop`, not all the info in the `SuperWord` object.

src/hotspot/share/opto/loopnode.cpp line 4867:

> 4865:   // Auto-vectorize main-loop
> 4866:   if (C->do_superword() && C->has_loops() && !C->major_progress()) {
> 4867:     ResourceArea autovectorization_arena;

Note: this allows us to free up all the space used by `SuperWord`'s internal data structures between every processed loop. Previously, all internal data structures were on the `phase->C->comp_arena()`.

src/hotspot/share/opto/loopnode.cpp line 5988:

> 5986:     _pre_loop_end = pre_loop_end;
> 5987:   }
> 5988: 

Note: This should have never been cached in the node itself, but only during autovectorization.

I moved it now into `VLoop`, which I also pass into `VPointer`, which has to access the pre-loop for independence checks.

src/hotspot/share/opto/loopnode.hpp line 235:

> 233: 
> 234:   // Cached CountedLoopEndNode of pre loop for main loops
> 235:   CountedLoopEndNode* _pre_loop_end;

Note: this makes the node smaller, and does not cache something that may be invalid later. It was used only during SuperWord. Looks like a bad pattern.

src/hotspot/share/opto/loopopts.cpp line 4231:

> 4229:   }
> 4230: 
> 4231:   // This counted main-loop either failed preconditions, the analyzer

Suggestion:

  // This counted main-loop either failed preconditions,

src/hotspot/share/opto/superword.cpp line 59:

> 57:   _do_vector_loop(phase()->C->do_vector_loop()),            // whether to do vectorization/simd style
> 58:   _num_work_vecs(0),                                        // amount of vector work we have
> 59:   _num_reductions(0)                                        // amount of reduction work we have

Note:
Before this change, we used to only create SuperWord once, and use it for all loops in the compilation. Now that I ripped out the "init" method, and avoid reusing SuperWord this way, we want to make sure we do not re-allocate too much.

For some data structures I now pre-allocate memory for the maximum size they may ever reach. This is to avoid re-allocation when they grow.

src/hotspot/share/opto/superword.cpp line 67:

> 65:   CountedLoopNode* cl   = vloop.cl();
> 66:   Node* cl_exit         = vloop.cl_exit();
> 67:   PhaseIdealLoop* phase = vloop.phase();

Note: Made it static, and instead of the SuperWord object, we now only have access to the VLoop object.

src/hotspot/share/opto/superword.cpp line 75:

> 73: 
> 74: //------------------------------transform_loop---------------------------
> 75: bool SuperWord::transform_loop(IdealLoopTree* lpt, bool do_optimization) {

Note: code moved to `VLoop::check_preconditions_helper`

src/hotspot/share/opto/superword.cpp line 98:

> 96:   if (SuperWordReductions) {
> 97:     mark_reductions();
> 98:   }

Note: I now would like to move reduction marking to **after** precondition checking. Hence, I moved it to `SLP_extract`.

src/hotspot/share/opto/superword.cpp line 149:

> 147:   }
> 148: 
> 149:   init(); // initialize data structures

Note: this is the end of the "preconditions", and we used to set `_early_exit = false` inside `init()`

src/hotspot/share/opto/superword.cpp line 180:

> 178:     Node* n = lpt()->_body.at(i);
> 179:     if (n == cl->incr() ||
> 180:       is_marked_reduction(n) ||

This is the annoying reduction, which would require reduction marking for `unrolling_analysis`.
I'm hoping to remove it here: https://github.com/openjdk/jdk/pull/17604

src/hotspot/share/opto/superword.cpp line 414:

> 412: }
> 413: 
> 414: const char* SuperWord::transform_loop_helper() {

TODO remove the helper, we can do that later

src/hotspot/share/opto/superword.cpp line 469:

> 467:   if (SuperWordReductions) {
> 468:     mark_reductions();
> 469:   }

Note: it would be really nice to move reduction marking to all other analysis, like bb construction and velt computation, etc.

src/hotspot/share/opto/superword.cpp line 2323:

> 2321: // modified. We bail out, and retry without SuperWord.
> 2322: bool SuperWord::output() {
> 2323:   assert(!_packset.is_empty(), "packset must not be empty");

revert this!

src/hotspot/share/opto/superword.cpp line 3786:

> 3784:   _align_to_ref = nullptr;
> 3785:   _race_possible = 0;
> 3786:   _early_return = false;

remove init!

src/hotspot/share/opto/superword.hpp line 325:

> 323:   CountedLoopNode* _lp;            // Current CountedLoopNode
> 324:   VectorSet      _loop_reductions; // Reduction nodes in the current loop
> 325:   Node*          _bb;              // Current basic block

Note: always the same as `cl`

src/hotspot/share/opto/vectorization.cpp line 57:

> 55: }
> 56: 
> 57: const char* VLoop::check_preconditions_helper() {

Note: replaces most code from the old `SuperWord::transform_loop`

src/hotspot/share/opto/vectorization.hpp line 101:

> 99:     assert(head != nullptr, "must find head");
> 100:     return head;
> 101:   };

Note: before this patch, these two cache-accessors were in the `CounterLoopEndNode`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471075782
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471079385
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470844038
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470848311
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471081941
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470881385
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470949551
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470795774
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470798480
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470796778
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470644463
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470805244
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470968130
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470644015
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470643573
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471010024
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470795387
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471072996