RFR: 8324890: C2 SuperWord: refactor out VLoop, make unrolling_analysis static, remove init/reset mechanism
Emanuel Peter
epeter at openjdk.org
Tue Jan 30 12:18:51 UTC 2024
On Tue, 30 Jan 2024 06:14:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> Subtask of https://github.com/openjdk/jdk/pull/16620
>
> 1. Move out the shared code between `SuperWord::SLP_extract` (where we do vectorization) and `SuperWord::unrolling_analysis`, and move it to a new class `VLoop`. This allows us to decouple `unrolling_analysis` from the SuperWord object, and we can make it static.
> 2. So far, SuperWord was reused for all loops in a compilation, and then "reset" (with `SuperWord::init`) for every loop. This is a bit of a nasty pattern. I now make a new `VLoop` and a new `SuperWord` object per loop.
> 3. Since we now make more `SuperWord` objects, we allocate the internal data structures more often. Therefore, I now pre-allocate/reserve sufficient space on initialization.
>
> Side-note about https://github.com/openjdk/jdk/pull/17604:
> I would like to remove the use of `SuperWord::is_marked_reduction` from `SuperWord::unrolling_analysis`. For starters: it is not clear what it was ever good for. Second: it requires us to do reduction marking/analysis before `unrolling_analysis`, and hence makes the reduction marking shared between `unrolling_analysis` and vectorization. I could move the reduction marking to `VLoop` now. But the `_loop_reducitons` set would have to be put on an arena, and I would like to avoid creating an arena for the `unrolling_analysis`. Plus, it would just be nicer code, to have reduction analysis together with body analysis, type analysis, etc. and all of them in only in `SLP_extract`.
src/hotspot/share/opto/loopTransform.cpp line 1106:
> 1104: VLoop vloop(this, true);
> 1105: if (vloop.check_preconditions()) {
> 1106: SuperWord::unrolling_analysis(vloop, _local_loop_unroll_factor);
Note: I made `unrolling_analysis` static, and only pass in `vloop`, not all the info in the `SuperWord` object.
src/hotspot/share/opto/loopnode.cpp line 4867:
> 4865: // Auto-vectorize main-loop
> 4866: if (C->do_superword() && C->has_loops() && !C->major_progress()) {
> 4867: ResourceArea autovectorization_arena;
Note: this allows us to free up all the space used by `SuperWord`'s internal data structures between every processed loop. Previously, all internal data structures were on the `phase->C->comp_arena()`.
src/hotspot/share/opto/loopnode.cpp line 5988:
> 5986: _pre_loop_end = pre_loop_end;
> 5987: }
> 5988:
Note: This should have never been cached in the node itself, but only during autovectorization.
I moved it now into `VLoop`, which I also pass into `VPointer`, which has to access the pre-loop for independence checks.
src/hotspot/share/opto/loopnode.hpp line 235:
> 233:
> 234: // Cached CountedLoopEndNode of pre loop for main loops
> 235: CountedLoopEndNode* _pre_loop_end;
Note: this makes the node smaller, and does not cache something that may be invalid later. It was used only during SuperWord. Looks like a bad pattern.
src/hotspot/share/opto/loopopts.cpp line 4231:
> 4229: }
> 4230:
> 4231: // This counted main-loop either failed preconditions, the analyzer
Suggestion:
// This counted main-loop either failed preconditions,
src/hotspot/share/opto/superword.cpp line 59:
> 57: _do_vector_loop(phase()->C->do_vector_loop()), // whether to do vectorization/simd style
> 58: _num_work_vecs(0), // amount of vector work we have
> 59: _num_reductions(0) // amount of reduction work we have
Note:
Before this change, we used to only create SuperWord once, and use it for all loops in the compilation. Now that I ripped out the "init" method, and avoid reusing SuperWord this way, we want to make sure we do not re-allocate too much.
For some data structures I now pre-allocate memory for the maximum size they may ever reach. This is to avoid re-allocation when they grow.
src/hotspot/share/opto/superword.cpp line 67:
> 65: CountedLoopNode* cl = vloop.cl();
> 66: Node* cl_exit = vloop.cl_exit();
> 67: PhaseIdealLoop* phase = vloop.phase();
Note: Made it static, and instead of the SuperWord object, we now only have access to the VLoop object.
src/hotspot/share/opto/superword.cpp line 75:
> 73:
> 74: //------------------------------transform_loop---------------------------
> 75: bool SuperWord::transform_loop(IdealLoopTree* lpt, bool do_optimization) {
Note: code moved to `VLoop::check_preconditions_helper`
src/hotspot/share/opto/superword.cpp line 98:
> 96: if (SuperWordReductions) {
> 97: mark_reductions();
> 98: }
Note: I now would like to move reduction marking to **after** precondition checking. Hence, I moved it to `SLP_extract`.
src/hotspot/share/opto/superword.cpp line 149:
> 147: }
> 148:
> 149: init(); // initialize data structures
Note: this is the end of the "preconditions", and we used to set `_early_exit = false` inside `init()`
src/hotspot/share/opto/superword.cpp line 180:
> 178: Node* n = lpt()->_body.at(i);
> 179: if (n == cl->incr() ||
> 180: is_marked_reduction(n) ||
This is the annoying reduction, which would require reduction marking for `unrolling_analysis`.
I'm hoping to remove it here: https://github.com/openjdk/jdk/pull/17604
src/hotspot/share/opto/superword.cpp line 414:
> 412: }
> 413:
> 414: const char* SuperWord::transform_loop_helper() {
TODO remove the helper, we can do that later
src/hotspot/share/opto/superword.cpp line 469:
> 467: if (SuperWordReductions) {
> 468: mark_reductions();
> 469: }
Note: it would be really nice to move reduction marking to all other analysis, like bb construction and velt computation, etc.
src/hotspot/share/opto/superword.cpp line 2323:
> 2321: // modified. We bail out, and retry without SuperWord.
> 2322: bool SuperWord::output() {
> 2323: assert(!_packset.is_empty(), "packset must not be empty");
revert this!
src/hotspot/share/opto/superword.cpp line 3786:
> 3784: _align_to_ref = nullptr;
> 3785: _race_possible = 0;
> 3786: _early_return = false;
remove init!
src/hotspot/share/opto/superword.hpp line 325:
> 323: CountedLoopNode* _lp; // Current CountedLoopNode
> 324: VectorSet _loop_reductions; // Reduction nodes in the current loop
> 325: Node* _bb; // Current basic block
Note: always the same as `cl`
src/hotspot/share/opto/vectorization.cpp line 57:
> 55: }
> 56:
> 57: const char* VLoop::check_preconditions_helper() {
Note: replaces most code from the old `SuperWord::transform_loop`
src/hotspot/share/opto/vectorization.hpp line 101:
> 99: assert(head != nullptr, "must find head");
> 100: return head;
> 101: };
Note: before this patch, these two cache-accessors were in the `CounterLoopEndNode`.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471075782
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471079385
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470844038
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470848311
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471081941
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470881385
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470949551
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470795774
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470798480
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470796778
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470644463
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470805244
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470968130
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470644015
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470643573
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471010024
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470795387
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471072996
More information about the hotspot-compiler-dev
mailing list