RFR: 8315361: C2: Create a superclass of SuperWord [v2]

Fei Gao fgao at openjdk.org
Thu Nov 2 09:11:01 UTC 2023


On Wed, 1 Nov 2023 07:01:51 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> My question: should we not also move the `velt` (vector element type) and maybe even the dependence graph parts? If I remember right, then the post-loop-vectorizer also needs some way to determine element type (`_elem_bt` [here](https://github.com/openjdk/jdk/pull/14581)), but not sure about dependence graph. And also the memory slices may be helpful for all vectorizers. We could also have more classes that have different parts of these features (eg. `Vectorizer` with only basic functionality, and then a `VectorizerWithDependencyGraph`).
>> 
>> Hi @eme64, thanks for your review!
>> 
>> Yes, and both SuperWord and post-loop-vectorizer need to determine vector element type. There are still some deficiencies in current [`compute_vector_element_type()`](https://github.com/openjdk/jdk/blob/b3fec6b5f32c338ae1a84dd20bdcbd3d9b7186f3/src/hotspot/share/opto/superword.cpp#L3350) of SuperWord. For example, some loop cases of subword types, including byte and short, can't be recognized very well. See the description in https://github.com/openjdk/jdk/pull/7954. Maybe the main idea of [`find_vector_element_types()`](https://github.com/openjdk/jdk/blob/bd1b939b21a23675fd072b91d4eab538ff6d2a7d/src/hotspot/share/opto/vmaskloop.cpp#L318) proposed in post-loop-vectorizer can be improved to solve it?
>> 
>> About the dependence graph part, SuperWord works better than post-loop-vectorizer, because it can support more cases like reading-forward, after your great refactoring 👍 . But now, post-loop-vectorizer can’t support any cases with index offset.
>> 
>> So, maybe both two parts need more complex refactoring and improving, before they can be shared well by two vectorizers, not like code parts in this patch.
>> 
>> Thanks!
>
> @fg1417 thanks for the answer. I think I'd be ok with doing only the part you did for now. We can do the `velt` part separately. But I would prefer that we do not integrate the post-loop-vectorizer until we have found a way to unify the `velt` logic.
> 
> I think it would be nice to find a solution to what you did in https://github.com/openjdk/jdk/pull/7954. I wonder if we can do some of the work already during IGVN, and simplify the graph (best option if possible). If not, then we would need some sort of pre-processing for the AutoVectorizer, and either modify the graph (maybe not such a good idea) or somehow have a separate mapping of the graph/copy (more overhead / complexity).
> 
> How does the post-loop-vectorizer deal with all of this in the current draft?

@eme64 , thanks for your reply.

> How does the post-loop-vectorizer deal with all of this in the current draft?

Actually, the post-loop-vectorizer hasn’t supported this kind of scenario either, out of the same reason. As we all know, it’s not difficult in both two vectorizer to determine the precise type for all candidate nodes, releasing the limitation [here](https://github.com/openjdk/jdk/blob/5207443b360cfe3ee9c53ece55da3464c13f6a9f/src/hotspot/share/opto/superword.cpp#L3400) and [here](https://github.com/openjdk/jdk/blob/bd1b939b21a23675fd072b91d4eab538ff6d2a7d/src/hotspot/share/opto/vmaskloop.cpp#L357). But the issue is how to guarantee the correctness of vector shift left then right operations for subword types.

Post-loop-vectorizer collects all nodes for each statement. With the mapping info from statements to nodes, it would be easier **in the phase of vectorizing** to recognize and then remove the pattern LShiftI – RShiftI, which is generated by scalar width promotion,  from the graph to guarantee the equivalent transformation from scalar to vector.

But, for SuperWord, all nodes in packsets are less organized. As you said, we may need do the removing work in GVN **after vectorization** (maybe less maintainable) or collect related info mapping all nodes to different statements **before loop unrolling**, just like post-loop-vectorizer does (seems costly). Maybe it depends on if C2 in the future will have any separate analysis phase for Auto-Vectorization and where it locates.

Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16353#issuecomment-1790332169


More information about the hotspot-compiler-dev mailing list