RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v49]

Christian Hagedorn chagedorn at openjdk.org
Wed Dec 20 14:39:05 UTC 2023


On Tue, 19 Dec 2023 14:31:17 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I want to push this in JDK23.
>> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620).
>> 
>> To calm your nerves: most of the changes are in auto-generated tests, and tests in general.
>> 
>> **Context**
>> 
>> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs).
>> 
>> Alignment is split into two tasks:
>>  - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`.
>>  - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width.
>> 
>> **Problem**
>> 
>> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190).
>> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine.
>> Thanks @fg1417 for confirming this!
>> Hence, we need to fix the alignment correctness checks.
>> 
>> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly.
>> 
>> **Problem Details**
>> 
>> Reproducer:
>> 
>> 
>>     static void test(short[] a, short[] b, short mask) {
>>         for (int i = 0; i < RANGE; i+=8) {
>>             // Problematic for AlignVector
>>             b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0
>> 
>>             b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes
>>             b[i+4] = (short)(a[i+4] & mask);
>>             b[i+5] = (short)(a[i+5] & mask);
>>             b[i+6] = (short)(a[i+6] & mask);
>>         }
>>     }
>> 
>> 
>> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all.
>> 
>> This is problemati...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   improve comments with fractional / integer question

src/hotspot/share/opto/vectorization.cpp line 741:

> 739:   // main_iter:   number of main-loop iterations (main_iter >= 0)
> 740:   //
> 741:   // In the following, we restate the simple form of the address expression, by first

You might want to have "Simple" in upper case as you introduce it above as a term: "The Simple form of the address.."
Suggestion:

  // In the following, we restate the Simple form of the address expression, by first

src/hotspot/share/opto/vectorization.cpp line 756:

> 754:   // We describe the 6 terms:
> 755:   //   1) The "base" of the address is the address of a Java object (e.g. array),
> 756:   //      and hence can be assumed to already be aw-aligned (base % aw = 0).

IIRC, you've had an explanation of why that is the case in an earlier version somewhere above. But now that you've moved the definition of `_aw` to the constructor, it might be good to restate here that objects are at least `ObjectAlignmentInBytes` aligned and that aw is a power of two and at most `ObjectAlignmentInBytes` by definition which implies `base % aw = 0`. It might not be evidently clear otherwise.

src/hotspot/share/opto/vectorization.cpp line 758:

> 756:   //      and hence can be assumed to already be aw-aligned (base % aw = 0).
> 757:   //   2) The "C_const" term is the sum of all constant terms. This is "offset",
> 758:   //      plus "init" if it is constant.

Should we also mention here `scale` in case `init` is constant?
Suggestion:

  //   2) The "C_const" term is the sum of all constant terms. This is "offset",
  //      plus "scale * init" if it is constant.

src/hotspot/share/opto/vectorization.cpp line 762:

> 760:   //      and variable term. If there is no invariant, then "C_invar" is zero.
> 761:   //
> 762:   //        invar = C_invar * var_invar                                                                          (FAC_INVAR)

I suggest to move `(FAC_INVAR)` and `(FAC_INIT`) a little bit closer to the left.

src/hotspot/share/opto/vectorization.cpp line 827:

> 825:   //
> 826:   // While we can now attribute the (fractional) amount of iterations required for the C_const,
> 827:   // invar and init terms, this does not give us a way to align these terms independendly.

Suggestion:

  // invar and init terms, this does not give us a way to align these terms independently.

src/hotspot/share/opto/vectorization.hpp line 103:

> 101: 
> 102:   // Biggest detectable factor of the invariant.
> 103:   int   invar_factor();

Was like that before but you could make them all `const`.

src/hotspot/share/opto/vectorization.hpp line 575:

> 573:       _mem_ref(           mem_ref),
> 574:       _vector_length(     vector_length),
> 575:       _element_size(      mem_ref->memory_size()),

You should assert here that `mem_ref` is non-null.

src/hotspot/share/opto/vectorization.hpp line 598:

> 596:   class EQ4 {
> 597:     private:
> 598:       const int _C_const;

Nit: Members should be indented by two spaces. I then usually indent the `private/public` keyword with one space.

src/hotspot/share/opto/vectorization.hpp line 630:

> 628:       int C_const_mod_abs_C_pre() const { return AlignmentSolution::mod(_C_const, abs(_C_pre)); }
> 629:       int C_invar_mod_abs_C_pre() const { return AlignmentSolution::mod(_C_invar, abs(_C_pre)); }
> 630:       int C_init__mod_abs_C_pre() const { return AlignmentSolution::mod(_C_init,  abs(_C_pre)); }

Two consecutive `_`:
Suggestion:

      int C_init_mod_aw() const        { return AlignmentSolution::mod(_C_init,  _aw); }
      int C_const_mod_abs_C_pre() const { return AlignmentSolution::mod(_C_const, abs(_C_pre)); }
      int C_invar_mod_abs_C_pre() const { return AlignmentSolution::mod(_C_invar, abs(_C_pre)); }
      int C_init_mod_abs_C_pre() const { return AlignmentSolution::mod(_C_init,  abs(_C_pre)); }

src/hotspot/share/opto/vectorization.hpp line 635:

> 633: #ifdef ASSERT
> 634:       void trace() const;
> 635:       const char* state_to_str(State s) const {

Could be made static

src/hotspot/share/opto/vectorization.hpp line 663:

> 661:                                   const int C_pre,
> 662:                                   const int q,
> 663:                                   const int r) const;

I think you can remove `const` from the declaration here in the header file as it only has an effect in the definition in the source file.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432382602
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432387018
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432392449
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432442598
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432443208
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432376331
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432367404
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432366482
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432369245
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432370031
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1432371196


More information about the hotspot-compiler-dev mailing list