RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v21]
Christian Hagedorn
chagedorn at openjdk.org
Wed Dec 6 12:40:20 UTC 2023
On Mon, 4 Dec 2023 12:42:35 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> I want to push this in JDK23.
>> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620).
>>
>> To calm your nerves: most of the changes are in auto-generated tests, and tests in general.
>>
>> **Context**
>>
>> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs).
>>
>> Alignment is split into two tasks:
>> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`.
>> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width.
>>
>> **Problem**
>>
>> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190).
>> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine.
>> Thanks @fg1417 for confirming this!
>> Hence, we need to fix the alignment correctness checks.
>>
>> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly.
>>
>> **Problem Details**
>>
>> Reproducer:
>>
>>
>> static void test(short[] a, short[] b, short mask) {
>> for (int i = 0; i < RANGE; i+=8) {
>> // Problematic for AlignVector
>> b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0
>>
>> b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes
>> b[i+4] = (short)(a[i+4] & mask);
>> b[i+5] = (short)(a[i+5] & mask);
>> b[i+6] = (short)(a[i+6] & mask);
>> }
>> }
>>
>>
>> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all.
>>
>> This is problemati...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> Suggestions by Christian for naming
src/hotspot/share/opto/superword.cpp line 1628:
> 1626: int element_size = mem_ref->memory_size();
> 1627: int vw = pack_size * element_size; // vector_width
> 1628: int aw = MIN2(vw, ObjectAlignmentInBytes); // alignment_width
Is there a specific reason to go with `vw` and `aw` instead of directly naming the variable `vector_width` and `alighment_width`?
src/hotspot/share/opto/superword.cpp line 1643:
> 1641: Node* base = mem_ref_p.base();
> 1642: Node* invar = mem_ref_p.invar();
> 1643: int invar_factor = mem_ref_p.invar_factor();
I suggest to declare all constants `const`. Same for the constants further down in this method.
src/hotspot/share/opto/superword.cpp line 1678:
> 1676: tty->print_cr(" + scale(%d) * iv", scale);
> 1677: }
> 1678: #endif
Not sure if you are planning to refactor the tracing code anyway but you might want to think about extracting any tracing code in this method to separate methods to not disrupt the readability of the code. You can derive some of the variables again (e.g. from VPointer etc.) to avoid having to pass many arguments to the tracing function.
It might even be cleaner if this entire method would be part of `AlignmentSolution`. Then you do not need to pass around all the information to separate tracing methods.
src/hotspot/share/opto/superword.cpp line 1684:
> 1682: return AlignmentSolution("non power-of-2 stride not supported");
> 1683: }
> 1684: assert(is_power_of_2(abs(pre_stride)), "pre_stride is power of 2");
Since you've just checked this condition with a bailout, you can probably get rid of this redundant assertion.
src/hotspot/share/opto/superword.cpp line 1693:
> 1691: }
> 1692:
> 1693: // We analyze the adress of the mem_ref. The idea is to disassemble it into a linear
Suggestion:
// We analyze the address of mem_ref. The idea is to disassemble it into a linear
src/hotspot/share/opto/superword.cpp line 1707:
> 1705: // init: value before pre-loop
> 1706: // pre_stride: increment per pre-loop iteration
> 1707: // pre_iter: number of pre-loop iterations (adjustible via pre-loop limit)
Suggestion:
// pre_iter: number of pre-loop iterations (adjustable via pre-loop limit)
src/hotspot/share/opto/superword.cpp line 1709:
> 1707: // pre_iter: number of pre-loop iterations (adjustible via pre-loop limit)
> 1708: // main_stride: increment per main-loop iteration (= pre_stride * unroll_factor)
> 1709: // j: number of main-loop iterations (j >= 0)
Could we also name `j` -> `main_iter` to be consistent with the naming for `pre_iter`?
src/hotspot/share/opto/superword.cpp line 1712:
> 1710: //
> 1711: // In the following, we restate the simple form of the address expression, by first
> 1712: // expanding the iv varialbe. In a second step, we reshape the expression again, and
Suggestion:
// expanding the iv variable. In a second step, we reshape the expression again, and
src/hotspot/share/opto/superword.cpp line 1720:
> 1718: // + offset + offset + C_const (sum of constant terms)
> 1719: // + invar + invar_factor * var_invar + C_invar * var_invar (term for variable init)
> 1720: // / + scale * init + C_init * var_init (term for invariant)
You flipped the comments for the variable init and invariant terms
src/hotspot/share/opto/superword.cpp line 1725:
> 1723: //
> 1724: // We describe the 6 terms:
> 1725: // 1) The "base" of the address is the address of a java object (e.g. array),
Suggestion:
// 1) The "base" of the address is the address of a Java object (e.g. array),
src/hotspot/share/opto/superword.cpp line 1736:
> 1734: // 5) The "C_pre * pre_iter" term represents how much the iv is incremented
> 1735: // during the "pre_iter" many pre-loop iterations. This term can be adjusted
> 1736: // by changing the pre-loop limit. This allows us to adjust the alignment
Maybe add for completeness: by changing the pre-loop limit which defines how many pre-loop iterations are executed.
src/hotspot/share/opto/superword.cpp line 1750:
> 1748: } else {
> 1749: C_init = scale;
> 1750: }
I suggest to make it implicit to better follow the logic:
Suggestion:
if (init_node->is_ConI()) {
C_const_init = init_node->as_ConI()->get_int();
C_init = 0;
} else {
C_const_init = 0
C_init = scale;
}
src/hotspot/share/opto/superword.cpp line 1791:
> 1789:
> 1790: // We must find a pre_iter, such that adr is aw aligned: adr % aw = 0.
> 1791: // Since "base mod aw = 0", we only need to ensure alignment of the other 5 terms:
Maybe you can directly use `%` everywhere instead of writing "mod/modulo".
src/hotspot/share/opto/superword.cpp line 1793:
> 1791: // Since "base mod aw = 0", we only need to ensure alignment of the other 5 terms:
> 1792: //
> 1793: // C_const + C_invar * var_invar + C_init * var_init + C_pre * pre_iter + C_main * j = 0 (modulo aw) (1)
Maybe state the modulo equation explicitly. Same for the other equations further down.
Suggestion:
// (C_const + C_invar * var_invar + C_init * var_init + C_pre * pre_iter + C_main * j) % aw = 0 (1)
src/hotspot/share/opto/superword.cpp line 1795:
> 1793: // C_const + C_invar * var_invar + C_init * var_init + C_pre * pre_iter + C_main * j = 0 (modulo aw) (1)
> 1794: //
> 1795: // Alignment must be maintained over all main-loop iterations, i.e for any j >= 0, we require:
Suggestion:
// Alignment must be maintained over all main-loop iterations, i.e. for any j >= 0, we require:
src/hotspot/share/opto/superword.cpp line 1797:
> 1795: // Alignment must be maintained over all main-loop iterations, i.e for any j >= 0, we require:
> 1796: //
> 1797: // C_main % aw = 0 (2*)
So, if (2*) then we require:
(C_const + C_invar * var_invar + C_init * var_init + C_pre * pre_iter) % aw = 0
to satisfy (1). Maybe you can add that for completeness.
src/hotspot/share/opto/superword.cpp line 1818:
> 1816: }
> 1817:
> 1818: // In what follows, me must ensure that the C_pre term can align the C_const, C_init and C_invar terms,
Suggestion:
// In what follows, we must ensure that the C_pre term can align the C_const, C_init and C_invar terms,
src/hotspot/share/opto/superword.cpp line 1847:
> 1845: // We must now show that the C_const term can be aligned.
> 1846: //
> 1847: // We can assume that abs(C_pre) is a power of 2.
As a reminder, you could also mention here that `aw` is a power of 2.
src/hotspot/share/opto/superword.cpp line 1848:
> 1846: //
> 1847: // We can assume that abs(C_pre) is a power of 2.
> 1848: // If abs(C_pre) >= aw, then for any pre_iter >= 0: C_pre * pre_iter = 0 (mod aw),
Suggestion:
// If abs(C_pre) >= aw, then for any pre_iter >= 0: C_pre * pre_iter % aw = 0,
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413834430
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413836369
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413839023
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1416879986
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413841272
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413862662
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413864497
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413863686
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413875167
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413887209
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1416912014
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413896281
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413936123
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413938614
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413941141
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1413944179
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1414047876
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1416887945
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1416839343
More information about the hotspot-compiler-dev
mailing list