RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v51]

Christian Hagedorn chagedorn at openjdk.org
Thu Dec 21 14:07:10 UTC 2023


On Wed, 20 Dec 2023 15:13:19 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I want to push this in JDK23.
>> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620).
>> 
>> To calm your nerves: most of the changes are in auto-generated tests, and tests in general.
>> 
>> **Context**
>> 
>> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs).
>> 
>> Alignment is split into two tasks:
>>  - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`.
>>  - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width.
>> 
>> **Problem**
>> 
>> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190).
>> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine.
>> Thanks @fg1417 for confirming this!
>> Hence, we need to fix the alignment correctness checks.
>> 
>> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly.
>> 
>> **Problem Details**
>> 
>> Reproducer:
>> 
>> 
>>     static void test(short[] a, short[] b, short mask) {
>>         for (int i = 0; i < RANGE; i+=8) {
>>             // Problematic for AlignVector
>>             b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0
>> 
>>             b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes
>>             b[i+4] = (short)(a[i+4] & mask);
>>             b[i+5] = (short)(a[i+5] & mask);
>>             b[i+6] = (short)(a[i+6] & mask);
>>         }
>>     }
>> 
>> 
>> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all.
>> 
>> This is problemati...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Apply suggestions from code review by Christian
>   
>   Thanks Christian
>   
>   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

src/hotspot/share/opto/superword.cpp line 3465:

> 3463: }
> 3464: 
> 3465: // Ensure that the main loop vectors are aligned by adjusting the pre loop limit. We memory align

Suggestion:

// Ensure that the main loop vectors are aligned by adjusting the pre loop limit. We memory-align

src/hotspot/share/opto/superword.cpp line 3491:

> 3489:   // For the main-loop, we want the address of align_to_ref to be memory aligned
> 3490:   // with some alignment width (aw, a power of 2). When we enter the main-loop,
> 3491:   // we know that iv is equals to the pre-loop limit. If we adjust the pre-loop

Suggestion:

  // we know that iv is equal to the pre-loop limit. If we adjust the pre-loop

src/hotspot/share/opto/superword.cpp line 3540:

> 3538:   // alignment is required (i.e. -XX:+AlignVector), this is guaranteed by the filtering
> 3539:   // done with the AlignmentSolver / AlignmentSolution. If strict alignment is not
> 3540:   // required, then alignment is still preferrable for performance, but not necessary.

Suggestion:

  // required, then alignment is still preferable for performance, but not necessary.

src/hotspot/share/opto/superword.cpp line 3557:

> 3555:   //   where: sign(scale) = scale / abs(scale) = (scale > 0 ? 1 : -1)
> 3556:   //
> 3557:   // Note, (9) allows for periodic solutons of adjust_pre_iter, with periodicity AW.

Suggestion:

  // Note, (9) allows for periodic solutions of adjust_pre_iter, with periodicity AW.

src/hotspot/share/opto/vectorization.cpp line 706:

> 704: }
> 705: #endif
> 706: 

Remove extra new line:
Suggestion:

src/hotspot/share/opto/vectorization.hpp line 370:

> 368:   const int _r = 0;
> 369:   const Node* _invar = nullptr;
> 370:   const int _scale = 0;

You don't need to set an initial value here as you always set the fields in the constructor.

src/hotspot/share/opto/vectorization.hpp line 516:

> 514: //
> 515: // For each vector memory access, we can find the set of pre_iter (number of pre-loop
> 516: // iterations) which would align its address. The AlignmentSolver finds such a

Suggestion:

// iterations) which would align its address. The AlignmentSolver finds such an

src/hotspot/share/opto/vectornode.hpp line 1042:

> 1040: };
> 1041: 
> 1042: // Verify that memory address (adr) is alignemd. The mask specifies the

Suggestion:

// Verify that memory address (adr) is aligned. The mask specifies the

src/hotspot/share/opto/vectornode.hpp line 1056:

> 1054: public:
> 1055:   VerifyVectorAlignmentNode(Node* adr, Node* mask) : Node(nullptr, adr, mask) {}
> 1056: public:

Repeated and can be removed:
Suggestion:

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1433802873
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1433804787
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1433884021
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1433942138
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1433783372
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1433789388
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1433794623
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1433799981
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1433800962


More information about the hotspot-compiler-dev mailing list