RFR: 8324890: C2 SuperWord: refactor out VLoop, make unrolling_analysis static, remove init/reset mechanism [v4]
Vladimir Kozlov
kvn at openjdk.org
Fri Feb 2 18:36:06 UTC 2024
On Fri, 2 Feb 2024 10:19:36 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Subtask of https://github.com/openjdk/jdk/pull/16620
>> (The basic goal is to break SuperWord into different modules. This makes the code more maintainable and extensible. And eventually this allows some modules to be reused by other/new vectorizers.)
>>
>> 1. Move out the shared code between `SuperWord::SLP_extract` (where we do vectorization) and `SuperWord::unrolling_analysis`, and move it to a new class `VLoop`. This allows us to decouple `unrolling_analysis` from the SuperWord object, and we can make it static.
>> 2. So far, SuperWord was reused for all loops in a compilation, and then "reset" (with `SuperWord::init`) for every loop. This is a bit of a nasty pattern. I now make a new `VLoop` and a new `SuperWord` object per loop.
>> 3. Since we now make more `SuperWord` objects, we allocate the internal data structures more often. Therefore, I now pre-allocate/reserve sufficient space on initialization.
>>
>> Side-note about https://github.com/openjdk/jdk/pull/17604 (integrated, no need to read any more):
>> I would like to remove the use of `SuperWord::is_marked_reduction` from `SuperWord::unrolling_analysis`. For starters: it is not clear what it was ever good for. Second: it requires us to do reduction marking/analysis before `unrolling_analysis`, and hence makes the reduction marking shared between `unrolling_analysis` and vectorization. I could move the reduction marking to `VLoop` now. But the `_loop_reductions` set would have to be put on an arena, and I would like to avoid creating an arena for the `unrolling_analysis`. Plus, it would just be nicer code to have reduction analysis together with body analysis, type analysis, etc., and all of them only in `SLP_extract`.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> timing code from JDK-8325159
Thank you for running the timing testing. What about memory fragmentation? Does this code use the default chunks in the Arena (which can be reused), or does it allocate a new chunk (malloc) each time, which may lead to fragmentation?
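
To make the question concrete, here is a minimal, self-contained sketch of the difference between reusing pooled chunks and calling malloc for every new arena. It is not HotSpot's `Arena` code: `ToyChunkPool`, `ToyArena`, `kChunkSize` and `mallocs_for_loops` are hypothetical names made up for illustration, and the 4096-byte chunk size is an assumption. The sketch only shows that when a per-loop arena stays within default-size chunks, creating one arena per loop does not have to mean one malloc per loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical pool of fixed-size chunks (illustration only, not HotSpot code).
class ToyChunkPool {
 public:
  static constexpr std::size_t kChunkSize = 4096;  // assumed default chunk size

  ~ToyChunkPool() {
    for (void* c : _free) std::free(c);
  }

  // Hand out a cached chunk if one is available, otherwise fall back to malloc.
  void* acquire() {
    if (!_free.empty()) {
      void* c = _free.back();
      _free.pop_back();
      return c;
    }
    ++_mallocs;
    void* c = std::malloc(kChunkSize);
    if (c == nullptr) std::abort();
    return c;
  }

  // Return a chunk to the pool so a later arena can reuse it.
  void release(void* c) { _free.push_back(c); }

  std::size_t mallocs() const { return _mallocs; }

 private:
  std::vector<void*> _free;
  std::size_t _mallocs = 0;
};

// Hypothetical bump allocator that draws its chunks from the pool.
class ToyArena {
 public:
  explicit ToyArena(ToyChunkPool& pool) : _pool(pool) {}
  ~ToyArena() {
    for (void* c : _chunks) _pool.release(c);
  }
  ToyArena(const ToyArena&) = delete;
  ToyArena& operator=(const ToyArena&) = delete;

  // Requests larger than one chunk are not supported in this sketch.
  void* alloc(std::size_t bytes) {
    bytes = (bytes + 7) & ~std::size_t(7);  // round up to 8-byte alignment
    if (bytes > ToyChunkPool::kChunkSize) return nullptr;
    if (_chunks.empty() || _used + bytes > ToyChunkPool::kChunkSize) {
      _chunks.push_back(_pool.acquire());
      _used = 0;
    }
    void* p = static_cast<char*>(_chunks.back()) + _used;
    _used += bytes;
    return p;
  }

 private:
  ToyChunkPool& _pool;
  std::vector<void*> _chunks;
  std::size_t _used = 0;
};

// Simulates one short-lived arena per loop and reports how many times
// the pool had to call malloc over the whole run.
std::size_t mallocs_for_loops(int loops, std::size_t bytes_per_loop) {
  ToyChunkPool pool;
  for (int i = 0; i < loops; ++i) {
    ToyArena arena(pool);
    arena.alloc(bytes_per_loop);
  }
  return pool.mallocs();
}
```

In this toy model, `mallocs_for_loops(100, 1024)` performs a single malloc, because each arena returns its chunk to the pool before the next one is created. A request that does not fit a pooled chunk would need a separate malloc/free each time, which is the case the fragmentation question is about.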
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17624#issuecomment-1924463097