RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer

Wed Dec 6 11:01:40 UTC 2023

On Wed, 6 Dec 2023 08:35:51 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> This is a refactoring of `SuperWord`.
>> I intend to push it for JDK23, after [this bug fix](https://github.com/openjdk/jdk/pull/14785).
>> 
>> **Goals**
>> 
>> 1. Clean up `SuperWord`: disentangle different components, make them more **modular**.
>> 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)).
>> 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)).
>> 4. Improve tracing in the auto-vectorization by making it more systematic.
>> 
>> **Summary**
>> 
>> - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!):
>> https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177
>> - I moved many `SuperWord` components out to `VLoop` and its subclass `VLoopAnalyzer`, so that any vectorizer can use these facilities in the future. This also makes them more modular, which should make future changes easier. These components are:
>>   - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow).
>>     - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`.
>>   - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces the analysis part of `SuperWord::SLP_extract`.
>>   - Finding and marking reductions -> `VLoopReductions`
>>   - Detecting memory slices -> `VLoopMemorySlices`
>>   - Analyzing the body -> `VLoopBody`  (renamed `in_bb` -> `in_body`)
>>   - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes`
>>   - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components.
>> - New: CompileCommand option `TraceAutovectorization`
>>   - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description.
>>   - Replaces all printing that was guarded by the flags `TraceSuperWord` (and `Verbose`) and `VectorizeDebug`.
>>   - The advantage of a CompileCommand is that tracing can be applied selectively to a limited set of Java classes / methods.
>>   - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically.
>>   - ...
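The intended layering can be illustrated with a minimal sketch under heavy simplifying assumptions. `VLoopSketch`, `VLoopAnalyzerSketch` and `LoopInfoSketch` are hypothetical stand-ins, not the actual HotSpot classes, and every submodule is reduced to a boolean check. The sketch shows only the shape described above. Shared pre-condition checks live in the base class. The analyzer runs the submodules (reductions, memory slices, body, types, dependence graph) in order and bails out at the first failure.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model of the split described above.
// Class and step names echo the PR description; the bodies are placeholders,
// not HotSpot code.
struct LoopInfoSketch {
  bool has_unwanted_ctrl_flow;  // checked by the shared pre-conditions
  bool has_unsupported_type;    // checked by the "types" step
};

class VLoopSketch {
 public:
  explicit VLoopSketch(const LoopInfoSketch& loop) : _loop(loop) {}
  // Shared pre-condition check, usable by any vectorizer.
  bool check_preconditions() const { return !_loop.has_unwanted_ctrl_flow; }

 protected:
  const LoopInfoSketch& _loop;
};

class VLoopAnalyzerSketch : public VLoopSketch {
 public:
  explicit VLoopAnalyzerSketch(const LoopInfoSketch& loop) : VLoopSketch(loop) {}

  // Runs the pre-conditions, then each analysis submodule in a fixed order.
  // Stops at the first failing step; `trace` records which steps ran.
  bool analyze(std::vector<std::string>& trace) const {
    if (!check_preconditions()) {
      trace.push_back("preconditions: fail");
      return false;
    }
    const std::vector<std::pair<std::string, std::function<bool()>>> steps = {
        {"reductions",       [] { return true; }},
        {"memory slices",    [] { return true; }},
        {"body",             [] { return true; }},
        {"types",            [this] { return !_loop.has_unsupported_type; }},
        {"dependence graph", [] { return true; }},
    };
    for (const auto& [name, run] : steps) {
      bool ok = run();
      trace.push_back(name + (ok ? ": ok" : ": fail"));
      if (!ok) return false;
    }
    return true;
  }
};
```

With this layering, a second vectorizer (such as the post-loop vectorizer from JDK-8308994) could in principle reuse `analyze()` and add only its own transformation on top. In the real code, each step computes data (for example, the dependence graph or the element types) rather than returning only a flag.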
>
> src/hotspot/share/opto/superword.hpp line 282:
> 
>> 280:   bool is_trace_superword_adjacent_memops() const {
>> 281:     return vla().is_trace_superword_adjacent_memops();
>> 282:   }
> 
> How about redefining it as:
> 
> ```c++
>   bool is_trace_superword_adjacent_memops() const {
>     return TraceSuperWord || vla().is_trace_tag_active(TraceAutovectorizationTag::TAG_SW_ADJACENT_MEMOPS);
>   }
> ```
> 
> And add a query interface to `class VLoop`:
> 
> ```c++
>   bool is_trace_tag_active(TraceAutovectorizationTag tag) const {
>     return _trace_tags.at(tag);
>   }
> ```
> 
> Thus, we don't have to involve any `SuperWord`-specific names or options in the shared facility.

@fg1417 ok, I can do that :)
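The proposal above can be sketched as follows under a simplified model. The enum values, the `*Sketch` names, the comma-separated parsing and the use of `std::bitset` are all hypothetical. `std::bitset` stands in for whatever bitmap backs `_trace_tags` in HotSpot. The sketch illustrates the dependency direction: the shared class answers only generic tag queries, and the SuperWord-side wrapper ORs in its own legacy flag.

```cpp
#include <bitset>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical tag set; the real TraceAutovectorizationTag enum in HotSpot
// may use different names and has more entries.
enum TraceTagSketch { TAG_PRECONDITIONS, TAG_MEMORY_SLICES, TAG_SW_ADJACENT_MEMOPS, TAG_COUNT };

// Hypothetical stand-in for the TraceSuperWord flag.
static bool TraceSuperWordSketch = false;

// Shared facility: knows only generic tags, nothing SuperWord-specific.
class VLoopTraceSketch {
 public:
  // Parses a comma-separated tag list such as "PRECONDITIONS,SW_ADJACENT_MEMOPS".
  // Unknown names are ignored in this sketch.
  explicit VLoopTraceSketch(const std::string& tag_list) {
    std::stringstream ss(tag_list);
    std::string name;
    while (std::getline(ss, name, ',')) {
      if (name == "PRECONDITIONS")           _trace_tags.set(TAG_PRECONDITIONS);
      else if (name == "MEMORY_SLICES")      _trace_tags.set(TAG_MEMORY_SLICES);
      else if (name == "SW_ADJACENT_MEMOPS") _trace_tags.set(TAG_SW_ADJACENT_MEMOPS);
    }
  }

  // Generic query, analogous to the proposed VLoop::is_trace_tag_active.
  bool is_trace_tag_active(TraceTagSketch tag) const {
    assert(tag < TAG_COUNT);
    return _trace_tags.test(tag);
  }

 private:
  std::bitset<TAG_COUNT> _trace_tags;
};

// SuperWord side: combines its own legacy flag with the generic tag query.
class SuperWordSketch {
 public:
  explicit SuperWordSketch(const VLoopTraceSketch& vla) : _vla(vla) {}
  bool is_trace_superword_adjacent_memops() const {
    return TraceSuperWordSketch || _vla.is_trace_tag_active(TAG_SW_ADJACENT_MEMOPS);
  }

 private:
  const VLoopTraceSketch& _vla;
};
```

Keeping the SuperWord flag out of the shared class means another vectorizer can reuse the same tag query without depending on SuperWord options. The tag names that the CompileCommand actually accepts are listed by its `help` option, as described in the PR summary.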

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16620#discussion_r1417103516