RFR: 8324890: C2 SuperWord: refactor out VLoop, make unrolling_analysis static, remove init/reset mechanism [v8]
Vladimir Kozlov
kvn at openjdk.org
Thu Feb 8 18:36:06 UTC 2024
On Tue, 6 Feb 2024 09:15:06 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Subtask of https://github.com/openjdk/jdk/pull/16620
>> (The basic goal is to break SuperWord into different modules. This makes the code more maintainable and extensible. And eventually this allows some modules to be reused by other/new vectorizers.)
>>
>> 1. Move out the shared code between `SuperWord::SLP_extract` (where we do vectorization) and `SuperWord::unrolling_analysis`, and move it to a new class `VLoop`. This allows us to decouple `unrolling_analysis` from the SuperWord object, and we can make it static.
>> 2. So far, SuperWord was reused for all loops in a compilation, and then "reset" (with `SuperWord::init`) for every loop. This is a bit of a nasty pattern. I now make a new `VLoop` and a new `SuperWord` object per loop.
>> 3. Since we now make more `SuperWord` objects, we allocate the internal data structures more often. Therefore, I now pre-allocate/reserve sufficient space on initialization.
>>
>> Side-note about https://github.com/openjdk/jdk/pull/17604 (integrated, no need to read any more):
>> I would like to remove the use of `SuperWord::is_marked_reduction` from `SuperWord::unrolling_analysis`. For starters: it is not clear what it was ever good for. Second: it requires us to do reduction marking/analysis before `unrolling_analysis`, and hence makes the reduction marking shared between `unrolling_analysis` and vectorization. I could move the reduction marking to `VLoop` now. But the `_loop_reductions` set would have to be put on an arena, and I would like to avoid creating an arena for the `unrolling_analysis`. Plus, it would just be nicer code to have reduction analysis together with body analysis, type analysis, etc., and all of them only in `SLP_extract`.
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:
>
> - manual merge
> - Merge branch 'master' into JDK-8324890
> - add VSharedData class
> - manual merge
> - timing code from JDK-8325159
> - handle AutoVectorizeStatus::TriedAndFailed outside autovectorize
> - Merge branch 'master' into JDK-8324890
> - _vtrace is moved to VLoop
> - comment update
> - cosmetics
> - ... and 8 more: https://git.openjdk.org/jdk/compare/b02599d2...dcade5f3
A few comments.
src/hotspot/share/opto/loopnode.cpp line 4872:
> 4870: // Shared data structures for all AutoVectorizations, to reduce allocations
> 4871: // of large arrays.
> 4872: VSharedData vshared;
So it is local to each `build_and_optimize()` call, and its space will be freed by the destructor.
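To illustrate the lifetime being described, here is a minimal standalone sketch. It is not the HotSpot code: `SharedDataSketch`, `scratch()`, `live_instances` and `build_and_optimize_sketch()` are hypothetical names, and `std::vector` stands in for `GrowableArray` so that the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for VSharedData. std::vector replaces HotSpot's
// GrowableArray so that the sketch compiles on its own.
class SharedDataSketch {
public:
  static int live_instances;  // only here to make the lifetime observable

  SharedDataSketch() { ++live_instances; }
  // The vector member is destroyed (and its storage freed) right after
  // this destructor body runs.
  ~SharedDataSketch() { --live_instances; }
  SharedDataSketch(const SharedDataSketch&) = delete;
  SharedDataSketch& operator=(const SharedDataSketch&) = delete;

  std::vector<int>& scratch() { return _scratch; }

private:
  std::vector<int> _scratch;
};

int SharedDataSketch::live_instances = 0;

// Hypothetical stand-in for one build_and_optimize() call.
int build_and_optimize_sketch(int num_loops) {
  SharedDataSketch vshared;  // one instance per call, shared by all loops below
  int processed = 0;
  for (int i = 0; i < num_loops; i++) {
    vshared.scratch().clear();       // reuse the same storage for every loop
    vshared.scratch().push_back(i);
    assert(SharedDataSketch::live_instances == 1);
    processed++;
  }
  return processed;
}  // vshared goes out of scope here; its destructor releases the storage
```

No instance outlives the call, so nothing has to be released by hand.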
src/hotspot/share/opto/vectorization.hpp line 156:
> 154: GrowableArray<int>& node_idx_to_loop_body_idx() {
> 155: // Since this is a shared resource, we clear before every individual use.
> 156: _node_idx_to_loop_body_idx.clear();
I think there should be an explicit `VSharedData::clear()` method, called in `auto_vectorize()`. Otherwise, much later, someone will have a hard time finding the place where the space is cleared.
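A rough sketch of the shape I have in mind follows. This is an assumption about how it could look, not code from the PR: `VSharedDataSketch` and `auto_vectorize_sketch()` are hypothetical names, `std::vector` stands in for `GrowableArray`, and node indices are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for VSharedData, with std::vector instead of
// HotSpot's GrowableArray.
class VSharedDataSketch {
public:
  // Plain accessor: no hidden side effect.
  std::vector<int>& node_idx_to_loop_body_idx() {
    return _node_idx_to_loop_body_idx;
  }

  // Explicit reset, easy to find when searching for where the data is cleared.
  void clear() { _node_idx_to_loop_body_idx.clear(); }

private:
  std::vector<int> _node_idx_to_loop_body_idx;
};

// Hypothetical stand-in for auto_vectorize(): the caller clears the shared
// data once, up front, before any individual use.
std::size_t auto_vectorize_sketch(VSharedDataSketch& vshared,
                                  const std::vector<int>& body_node_indices) {
  vshared.clear();
  std::vector<int>& map = vshared.node_idx_to_loop_body_idx();
  assert(map.empty());

  // Map each node index to its position in the loop body; -1 means "not in body".
  for (std::size_t i = 0; i < body_node_indices.size(); i++) {
    std::size_t idx = static_cast<std::size_t>(body_node_indices[i]);
    if (map.size() <= idx) {
      map.resize(idx + 1, -1);
    }
    map[idx] = static_cast<int>(i);
  }
  return map.size();
}
```

The accessor stays side-effect free, and the single `clear()` call sits at the top of the per-loop entry point, where a reader would look for it. With `std::vector`, `clear()` leaves the capacity unchanged, so the storage is still reused across loops; whether `GrowableArray` is handled the same way in the PR is for the author to confirm.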
-------------
PR Review: https://git.openjdk.org/jdk/pull/17624#pullrequestreview-1870948784
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1483408513
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1483420419