RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v50]
Christian Hagedorn
chagedorn at openjdk.org
Wed Dec 20 15:33:09 UTC 2023
On Wed, 20 Dec 2023 10:23:20 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> I want to push this in JDK23.
>> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620).
>>
>> To calm your nerves: most of the changes are in auto-generated tests, and tests in general.
>>
>> **Context**
>>
>> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs).
>>
>> Alignment is split into two tasks:
>> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`.
>> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width.
>>
>> **Problem**
>>
>> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190).
>> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine.
>> Thanks @fg1417 for confirming this!
>> Hence, we need to fix the alignment correctness checks.
>>
>> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly.
>>
>> **Problem Details**
>>
>> Reproducer:
>>
>>
>> static void test(short[] a, short[] b, short mask) {
>> for (int i = 0; i < RANGE; i+=8) {
>> // Problematic for AlignVector
>> b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0
>>
>> b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes
>> b[i+4] = (short)(a[i+4] & mask);
>> b[i+5] = (short)(a[i+5] & mask);
>> b[i+6] = (short)(a[i+6] & mask);
>> }
>> }
>>
>>
>> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all.
>>
>> This is problemati...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> necessary and sufficient (3) <-> (4)
src/hotspot/share/opto/vectorization.cpp line 1145:
> 1143:
> 1144: #ifdef ASSERT
> 1145: void print_icon_or_idx(const Node* n) {
Or coni:
Suggestion:
void print_con_or_idx(const Node* n) {
src/hotspot/share/opto/vectorization.cpp line 1252:
> 1250: tty->print_cr(" -> %s", state_to_str(eq4a_state()));
> 1251:
> 1252: tty->print_cr(" EQ(4a): (C_invar(%3d) * var_invar + C_pre(%d) * pre_iter_C_invar) %% aw(%d) = 0 (align invar term individually)",
Suggestion:
tty->print_cr(" EQ(4b): (C_invar(%3d) * var_invar + C_pre(%d) * pre_iter_C_invar) %% aw(%d) = 0 (align invar term individually)",
src/hotspot/share/opto/vectorization.cpp line 1256:
> 1254: tty->print_cr(" -> %s", state_to_str(eq4b_state()));
> 1255:
> 1256: tty->print_cr(" EQ(4a): (C_init( %3d) * var_init + C_pre(%d) * pre_iter_C_init ) %% aw(%d) = 0 (align init term individually)",
Suggestion:
tty->print_cr(" EQ(4c): (C_init( %3d) * var_init + C_pre(%d) * pre_iter_C_init ) %% aw(%d) = 0 (align init term individually)",
src/hotspot/share/opto/vectorization.hpp line 422:
> 420: // [- init / pre_stride ]
> 421: //
> 422: // Note: pre_stride and init are idential for all mem_refs in the loop.
Suggestion:
// Note: pre_stride and init are identical for all mem_refs in the loop.
src/hotspot/share/opto/vectorization.hpp line 425:
> 423: //
> 424: // The init alignment term either does not exist for both mem_refs, or exists identically
> 425: // for both. The init alignment term is thus triviall identical.
Suggestion:
// for both. The init alignment term is thus trivially identical.
src/hotspot/share/opto/vectorization.hpp line 459:
> 457: //
> 458: // Since q1 and q2 are both powers of 2, and q1 <= q2, we know there
> 459: // is an integer a: a * q1 = q1. Thus, it remains to check if there
Suggestion:
// is an integer a: a * q1 = q2. Thus, it remains to check if there
src/hotspot/share/opto/vectorization.hpp line 484:
> 482:
> 483: // When strict alignment is required (e.g. -XX:+AlignVector), then we must ensure
> 484: // that all vector memory accesses can be aligned. We acheive this alignment by
Suggestion:
// that all vector memory accesses can be aligned. We achieve this alignment by
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432806809
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432810574
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432810700
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432821557
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432821841
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432849340
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432862529
More information about the hotspot-compiler-dev
mailing list