RFR: 8315361: C2: Create a superclass of SuperWord [v2]

Emanuel Peter epeter at openjdk.org
Fri Nov 3 16:15:06 UTC 2023


On Thu, 2 Nov 2023 09:07:24 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> @fg1417 thanks for the answer. I think I'd be ok with doing only the part you did for now. We can do the `velt` part separately. But I would prefer that we do not integrate the post-loop-vectorizer until we have found a way to unify the `velt` logic.
>> 
>> I think it would be nice to find a solution to what you did in https://github.com/openjdk/jdk/pull/7954. I wonder if we can do some of the work already during IGVN, and simplify the graph (best option if possible). If not, then we would need some sort of pre-processing for the AutoVectorizer, and either modify the graph (maybe not such a good idea) or somehow have a separate mapping of the graph/copy (more overhead / complexity).
>> 
>> How does the post-loop-vectorizer deal with all of this in the current draft?
>
> @eme64 , thanks for your reply.
> 
>> How does the post-loop-vectorizer deal with all of this in the current draft?
> 
> Actually, the post-loop-vectorizer doesn’t support this kind of scenario either, for the same reason. As we all know, it’s not difficult in either vectorizer to determine the precise type for all candidate nodes, lifting the limitation [here](https://github.com/openjdk/jdk/blob/5207443b360cfe3ee9c53ece55da3464c13f6a9f/src/hotspot/share/opto/superword.cpp#L3400) and [here](https://github.com/openjdk/jdk/blob/bd1b939b21a23675fd072b91d4eab538ff6d2a7d/src/hotspot/share/opto/vmaskloop.cpp#L357). But the issue is how to guarantee the correctness of vector shift-left-then-right operations for subword types.
> 
> Post-loop-vectorizer collects all nodes for each statement. With the mapping info from statements to nodes, it would be easier **during the vectorization phase** to recognize and then remove the pattern LShiftI – RShiftI, which is generated by scalar width promotion, from the graph to guarantee the equivalent transformation from scalar to vector.
> 
> But, for SuperWord, all nodes in packsets are less organized. As you said, we may need to do the removal work in GVN **after vectorization** (maybe less maintainable) or collect related info mapping all nodes to different statements **before loop unrolling**, just like post-loop-vectorizer does (seems costly). Maybe it depends on whether C2 will in the future have a separate analysis phase for Auto-Vectorization, and where it is located.
> 
> Thanks.

Hi @fg1417, I will think a bit more about the "useless sign extension" myself (https://github.com/openjdk/jdk/pull/7954). It would probably be nice to have a large collection of shapes. Maybe there is a way to "ignore" these ops and mark them as such. We would then "ignore" them and look through them when computing the "velt", and when extending the packset. And we would have to somehow "sanitize" the graph later: remove the packs that are to be "ignored", and just rewire to their inputs.
Or we simply hack away those nodes in some pre-vectorization phase, or while determining the "velt" type. But that all sounds a bit hacky.
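
For readers following along, here is a minimal Java sketch of where the "useless sign extension" comes from. As far as I understand it, C2 expresses the int-to-byte narrowing (`i2b`) as the LShiftI/RShiftI pair by 24, so a byte loop carries that shift pair after every promoted int operation. The class and method names below are made up for illustration and are not HotSpot code.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class SubwordShiftDemo {
    // Narrowing as written in Java source.
    static int narrowWithCast(int v) {
        return (byte) v;
    }

    // The same narrowing written as the shift pair:
    // shift left by 24, then arithmetic shift right by 24.
    static int narrowWithShifts(int v) {
        return (v << 24) >> 24;
    }

    // A byte loop: Java promotes a[i] and b[i] to int for the add,
    // then narrows the int result back to byte.
    static void addBytes(byte[] a, byte[] b, byte[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = (byte) (a[i] + b[i]);
        }
    }

    public static void main(String[] args) {
        // The cast and the shift pair agree on every tested input.
        for (int v = -100_000; v <= 100_000; v++) {
            if (narrowWithCast(v) != narrowWithShifts(v)) {
                throw new AssertionError("mismatch at " + v);
            }
        }
        byte[] a = {100, -128, 127};
        byte[] b = {100, -1, 1};
        byte[] c = new byte[3];
        addBytes(a, b, c);
        System.out.println(Arrays.toString(c)); // prints [-56, 127, -128]
    }
}
```

With byte-wide vector lanes the add already wraps to 8 bits per lane, which is why the shift pair is redundant once the element type is known to be byte.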

---

 
I think we may have to choose **Composition** over the **SuperClass** idea, as @chhagedorn suggested.

We could make this new utility class a **LoopAnalyzer**, which can then be passed to different **Vectorizers**.
We can then run the LoopAnalyzer only once, and determine `velt`, `dependencies`, `loop membership` etc.
And then we can call multiple Vectorizers (maybe different strategies, or SLP vs widening directly), and each one of those can propose a "vectorization transformation plan". We then determine which one to choose with a cost model.
Having a SuperClass would mean that every Vectorizer needs to rerun the basic analysis, with runtime and memory overhead.
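
To make the composition idea concrete, here is a rough sketch in Java (the real code would be C++ inside C2). Everything in it is hypothetical: `LoopAnalysis`, `Plan`, `Vectorizer`, `VectorizerDriver`, the two example strategies, and the cost numbers are placeholders, not existing HotSpot classes or a real cost model. The point is only that the analysis is computed once and handed to each vectorizer, and that a cost comparison picks among the proposed plans.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical result of the shared analysis, computed once per loop.
final class LoopAnalysis {
    final int veltBits;                 // element width in bits, e.g. 8 for byte
    final boolean hasCarriedDependence; // stand-in for real dependency info

    private LoopAnalysis(int veltBits, boolean hasCarriedDependence) {
        this.veltBits = veltBits;
        this.hasCarriedDependence = hasCarriedDependence;
    }

    static LoopAnalysis analyze(int veltBits, boolean hasCarriedDependence) {
        return new LoopAnalysis(veltBits, hasCarriedDependence);
    }
}

// Hypothetical "vectorization transformation plan" with an estimated cost.
final class Plan {
    final String strategy;
    final double cost;

    Plan(String strategy, double cost) {
        this.strategy = strategy;
        this.cost = cost;
    }
}

// Each vectorizer reads the shared analysis; it does not redo it.
interface Vectorizer {
    Plan propose(LoopAnalysis analysis); // null means "cannot vectorize"
}

final class VectorizerDriver {
    // Placeholder strategies with made-up costs.
    static final Vectorizer SLP =
        a -> a.hasCarriedDependence ? null : new Plan("SLP", 4.0);
    static final Vectorizer WIDENING =
        a -> new Plan("widening", 6.0);
    static final List<Vectorizer> ALL = Arrays.asList(SLP, WIDENING);

    // Cost-model step: keep the cheapest plan that any vectorizer proposed.
    static Plan choose(LoopAnalysis analysis, List<Vectorizer> vectorizers) {
        Plan best = null;
        for (Vectorizer v : vectorizers) {
            Plan p = v.propose(analysis);
            if (p != null && (best == null || p.cost < best.cost)) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        LoopAnalysis analysis = LoopAnalysis.analyze(8, false); // run once
        System.out.println(choose(analysis, ALL).strategy);     // prints SLP
    }
}
```

With a superclass instead, each `Vectorizer` subclass would construct and run its own copy of the analysis state, which is the overhead described above.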

I know this would require a bigger refactoring, where we have to re-route many calls that now go to SuperWord directly to this new LoopAnalyzer. We can also do that in multiple steps; for example, you do one part and I do the other.

Here is one of my blog posts that gives more ideas about having multiple Vectorizers:
https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16353#issuecomment-1792727439

