RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v43]

Fri Dec 15 10:50:24 UTC 2023

On Fri, 15 Dec 2023 08:28:51 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   renamings and proof improvement in adjust_pre_loop_limit_to_align_main_loop_vectors
>
> src/hotspot/share/opto/vectorization.cpp line 824:
> 
>> 822:   //   (C_const             + C_pre * pre_iter_C_const) % aw = 0                 (4c)
>> 823:   //
>> 824:   // We can only guarantee solutions to (4a) and (4b) if:
> 
> Should we also mention here that strengthening it with (4a-c) is the best we can do to prove (3) statically while we might miss some solutions for (3) where some of (4a-c) are false which, however, cannot be proven statically?

Maybe we can also put (5a-b) in words:
Suggestion:

  // We can only guarantee solutions to (4a) and (4b) if C_init and C_invar are zero or multiples of C_pre:

> src/hotspot/share/opto/vectorization.cpp line 940:
> 
>> 938:   //     hence we have to add a dependency for invar, and scale (pre_stride is the
>> 939:   //     same for all mem_refs in the loop). If there is no invariant, then we add
>> 940:   //     a dependency that there is no invariant.
> 
> I found it a little difficult to understand this part. I had to write it down to verify that we indeed are not missing a dependency. Maybe we can be more explicit here. How about something like that?
> 
> 
>  // Hence, the solution for pre_iter depends on:
>   //   - Always: pre_r and pre_q
>   //   - If an init is present, we have a dependency on it. But since init is fixed and given by the loop for all mem_refs
>   //     of all packs, we can drop it.
>   //   - If init is a constant, we do not have another dependency.
>   //   - If init is a non-constant, we have a dependency on scale since C_init = scale. We need to keep since it is not 
>   //     given by the loop and could vary from pack to pack. We do not have to add another dependency since (5a*):
>   //
>   //       C_init % abs(C_pre) = 0   <=>   C_init = C_pre * X
>   //
>   //     can only be satisfied if X is constant:
>   //
>   //       C_init  = C_pre * X
>   //       X = C_init / C_pre
>   //       X = scale / (scale * pre_stride)
>   //       X = 1 / pre_stride
>   //
>   //   - If an invariant is present, we have a dependency on it which we need to keep since it is not given by the loop
>   //     and could vary from pack to pack. We additionally have to add another dependency on scale since (5b*):
>   //
>   //       C_invar % abs(C_pre) = 0  <=>   C_invar = C_pre * Y
>   //
>   //     can only be satisfied if we have the following Y:
>   //
>   //       C_invar = C_pre * Y
>   //       Y = (C_invar / C_pre)
>   //       Y = abs(invar_factor) / (scale * pre_stride)
>   //
>   //     A dependency for pre_stride is not required as it is fixed and given by the loop for all mem_refs of all packs
>   //     (similar to init).

Another thought: Should we assert that if init is a non-constant that `abs(stride) = 1` if there is a solution?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1427694498
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1427819252