RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v40]
Christian Hagedorn
chagedorn at openjdk.org
Fri Dec 15 10:50:24 UTC 2023
On Thu, 7 Dec 2023 15:15:13 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> I want to push this in JDK23.
>> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620).
>>
>> To calm your nerves: most of the changes are in auto-generated tests, and tests in general.
>>
>> **Context**
>>
>> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs).
>>
>> Alignment is split into two tasks:
>> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`.
>> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width.
>>
>> **Problem**
>>
>> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190).
>> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine.
>> Thanks @fg1417 for confirming this!
>> Hence, we need to fix the alignment correctness checks.
>>
>> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly.
>>
>> **Problem Details**
>>
>> Reproducer:
>>
>>
>> static void test(short[] a, short[] b, short mask) {
>> for (int i = 0; i < RANGE; i+=8) {
>> // Problematic for AlignVector
>> b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0
>>
>> b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes
>> b[i+4] = (short)(a[i+4] & mask);
>> b[i+5] = (short)(a[i+5] & mask);
>> b[i+6] = (short)(a[i+6] & mask);
>> }
>> }
>>
>>
>> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all.
>>
>> This is problemati...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> remove dead code
src/hotspot/share/opto/c2_globals.hpp line 97:
> 95: \
> 96: develop(bool, VerifyAlignVector, false, \
> 97: "Check that vector stores/loads are aligned if AlignVector is on.") \
Nit: Do we need to wrap the line here such that the `` is aligned again?
src/hotspot/share/opto/superword.cpp line 591:
> 589: // We can find adjacent memory references by comparing their relative
> 590: // alignment. If the final vectors can be aligned can not yet be determined,
> 591: // that is only done once all vectors are extended and combined.
Suggestion:
// alignment. Whether the final vectors can be aligned is determined later
// once all vectors are extended and combined.
src/hotspot/share/opto/vectorization.cpp line 772:
> 770: // during "main_iter" main-loop iterations.
> 771:
> 772: // Attribute init either to C_const or to C_init term.
Suggestion:
// Attribute init (i.e. _init_node) either to C_const or to C_init term.
src/hotspot/share/opto/vectorization.cpp line 779:
> 777: const int C_invar = (_invar == nullptr) ? 0 : abs(_invar_factor);
> 778:
> 779: const int C_const = _offset + C_const_init * _scale;
Nit: I suggest to move this line up to `C_const_init` and follow the structure in the comment above:
// Attribute init (i.e. _init_node) either to C_const or to C_init term.
const int C_const_init = _init_node->is_ConI() ? _init_node->as_ConI()->get_int() : 0;
const int C_const = _offset + C_const_init * _scale;
// Set C_invar depending on if invar is present
const int C_invar = (_invar == nullptr) ? 0 : abs(_invar_factor);
const int C_init = _init_node->is_ConI() ? 0 : _scale;
const int C_pre = _scale * _pre_stride;
const int C_main = _scale * _main_stride;
src/hotspot/share/opto/vectorization.cpp line 900:
> 898: // Otherwise, if abs(C_pre) < aw, we find all solutions for pre_iter_C_const in (4c).
> 899: // We state pre_iter_C_const in terms of the smallest possible pre_q and pre_r, such
> 900: // that pre_q >= 0 and 0 <= pre_r < pre_q:
I think you can remove the first inequation as it is implied by the second:
Suggestion:
// that 0 <= pre_r < pre_q:
src/hotspot/share/opto/vectorization.cpp line 906:
> 904: // We can now restate (4c) with (7):
> 905: //
> 906: // (C_const + C_pre * pre_r + C_pre * pre_q * m) % aw = 0 (8)
You might want to repeat 4c here to better follow the transformation. For example:
Suggestion:
// (C_const + C_pre * pre_iter_C_const) % aw = 0 (4c)
// (C_const + C_pre * (pre_r + pre_q * m) % aw = 0 (applying (7))
// (C_const + C_pre * pre_r + C_pre * pre_q * m) % aw = 0 (8)
src/hotspot/share/opto/vectorization.cpp line 915:
> 913: // Given that abs(C_pre) is a powers of 2, and abs(C_pre) < aw:
> 914: //
> 915: const int pre_q = _aw / abs(C_pre);
Suggestion:
const int pre_q = _aw / abs(C_pre);
src/hotspot/share/opto/vectorization.hpp line 572:
> 570:
> 571: private:
> 572: #ifdef ASSERT
Since the `Tracer` class in this file is `NOT_PRODUCT`, I suggest to also go with that here and where the trace methods are used.
Suggestion:
#ifndef PRODUCT
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1423754881
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1420583333
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1424172989
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1424166141
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1424281102
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1424285379
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1424287409
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1420602287
More information about the hotspot-compiler-dev
mailing list