RFR: 8298935: fix cyclic dependency bug in create_pack logic in SuperWord::find_adjacent_refs

Thu Feb 9 10:37:45 UTC 2023

On Fri, 3 Feb 2023 10:21:59 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> Sorry, to clarify:
>> 
>>  _if it holds `OFFSET * element_size_in_bytes % MaxVectorSize == 0`, vector load and vector store won't access overlapped memory within one vector execution._, which means vector load and vector store won't access **partially overlapped** memory within one vector execution. They're still allowed to access **completely overlapped** memory within one vector execution, e.g. `b[i] = a[i]`.
>
>> Generally, though, I am wondering about this: why do we force loads/stores of the same type to be `completely overlapped` (as @fg1417 calls it), i.e. require `memory_alignment(p1, p2) == 0` for all `p1`, `p2` of the same type? This seems more constrained than necessary. Why do we not just rely on packs being internally independent, i.e. `independent(s1, s2)` for all `s1`, `s2` in the same pack?
> 
> Yes, that's a good idea, and it would help vectorize more forward-reading cases. My concern is that we would need to call `independent(s1, s2)` for every pair of nodes in the same pack, so the number of calls grows quadratically with pack size as `MaxVectorSize` increases. For example, a byte pack has 64 nodes when `MaxVectorSize=64`. The current approach, based on `memory_alignment()`, has low complexity. Besides, we already partially support reading forward. For a case like
> 
> // int[] a, b;
> for (int i = start; i < limit; i++) {
>   b[i] = a[i+OFFSET];
> }
> 
> the loop can be vectorized as long as `OFFSET * element_size_in_bytes % MaxVectorSize == 0` holds, a condition that includes the `completely overlapped` case.
> 
> Thanks.
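To make the two arguments in the quoted exchange concrete, here is a minimal Java sketch. It is a hypothetical model, not C2's `SuperWord` code: it assumes `a` and `b` may alias (so the loop behaves like `a[i] = a[i + OFFSET]`), measures everything in elements within a single vector iteration, and ignores cross-iteration dependences. The names `OverlapSketch`, `classify`, `offsetConditionHolds`, and `pairChecks` are illustrative only.

```java
// Hypothetical model (not HotSpot code): a and b may alias, so
// `b[i] = a[i + OFFSET]` is treated as `a[i] = a[i + OFFSET]`.
// Ranges are in elements, relative to the current i, for ONE vector iteration.
public class OverlapSketch {

    enum Overlap { NONE, COMPLETE, PARTIAL }

    // Lanes per vector, e.g. MaxVectorSize = 64 bytes with 4-byte ints -> 16 lanes.
    static int lanes(int maxVectorSizeBytes, int elementSizeBytes) {
        return maxVectorSizeBytes / elementSizeBytes;
    }

    // The store touches [0, v) and the load touches [offset, offset + v);
    // classify by how many elements the two ranges share.
    static Overlap classify(int offset, int v) {
        int lo = Math.max(0, offset);
        int hi = Math.min(v, offset + v);
        int shared = Math.max(0, hi - lo);
        if (shared == 0) return Overlap.NONE;
        if (shared == v) return Overlap.COMPLETE;
        return Overlap.PARTIAL;
    }

    // The condition quoted in the thread, expressed in bytes.
    static boolean offsetConditionHolds(int offset, int elementSizeBytes, int maxVectorSizeBytes) {
        return (offset * elementSizeBytes) % maxVectorSizeBytes == 0;
    }

    // Number of independent(s1, s2) calls if every unordered pair in a pack of n nodes is checked.
    static long pairChecks(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        int elem = 4, maxVec = 64;          // int elements, MaxVectorSize = 64
        int v = lanes(maxVec, elem);        // 16 lanes
        // Whenever the condition holds, partial overlap must not occur.
        for (int offset = -40; offset <= 40; offset++) {
            if (offsetConditionHolds(offset, elem, maxVec) && classify(offset, v) == Overlap.PARTIAL) {
                throw new AssertionError("partial overlap at offset " + offset);
            }
        }
        System.out.println(classify(0, v));   // b[i] = a[i]: completely overlapped
        System.out.println(classify(16, v));  // condition holds, no overlap
        System.out.println(classify(3, v));   // condition fails, partial overlap
        System.out.println(classify(17, v));  // condition fails, yet no overlap
        System.out.println(pairChecks(64));   // byte pack with MaxVectorSize = 64
    }
}
```

Running it prints `COMPLETE`, `NONE`, `PARTIAL`, `NONE`, and `2016`. Offset 17 illustrates why the current support for reading forward is only partial: the condition is sufficient to rule out partial overlap, but not necessary. The last line shows the cost concern: checking every pair in a 64-node byte pack means 2016 `independent(s1, s2)` calls in this model.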

@fg1417 Would you be able to test this on arm32?

-------------

PR: https://git.openjdk.org/jdk/pull/12350