RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59]
Christian Hagedorn
chagedorn at openjdk.org
Mon Jan 8 14:41:47 UTC 2024
On Thu, 4 Jan 2024 07:00:48 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> I want to push this in JDK23.
>> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620).
>>
>> To calm your nerves: most of the changes are in auto-generated tests, and tests in general.
>>
>> **Context**
>>
>> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs).
>>
>> Alignment is split into two tasks:
>> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`.
>> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width.
>>
>> **Problem**
>>
>> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190).
>> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine.
>> Thanks @fg1417 for confirming this!
>> Hence, we need to fix the alignment correctness checks.
>>
>> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly.
>>
>> **Problem Details**
>>
>> Reproducer:
>>
>>
>> static void test(short[] a, short[] b, short mask) {
>> for (int i = 0; i < RANGE; i+=8) {
>> // Problematic for AlignVector
>> b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0
>>
>> b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes
>> b[i+4] = (short)(a[i+4] & mask);
>> b[i+5] = (short)(a[i+5] & mask);
>> b[i+6] = (short)(a[i+6] & mask);
>> }
>> }
>>
>>
>> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all.
>>
>> This is problemati...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> some minor changes for Vladimir
Some last minor comments. Otherwise, looks good!
src/hotspot/share/opto/chaitin.cpp line 1794:
> 1792: Node* PhaseChaitin::find_base_for_derived(Node** derived_base_map, Node* derived, uint& maxlrg) {
> 1793: // See if already computed; if so return it
> 1794: if(derived_base_map[derived->_idx]) {
Suggestion:
if (derived_base_map[derived->_idx]) {
src/hotspot/share/opto/superword.cpp line 1620:
> 1618:
> 1619: const MemNode* mem_ref = pack->at(0)->as_Mem();
> 1620: VPointer mem_ref_p(mem_ref, phase(), lpt(), nullptr, false);
Since you renamed `p` -> `pack`, you should also rename this one to pack:
Suggestion:
VPointer mem_ref_pack(mem_ref, phase(), lpt(), nullptr, false);
src/hotspot/share/opto/superword.cpp line 1630:
> 1628: mem_ref_p.invar(),
> 1629: mem_ref_p.invar_factor(),
> 1630: mem_ref_p.scale_in_bytes(),
Suggestion:
AlignmentSolver solver(pack->at(0)->as_Mem(),
pack->size(),
mem_ref_pack.base(),
mem_ref_pack.offset_in_bytes(),
mem_ref_pack.invar(),
mem_ref_pack.invar_factor(),
mem_ref_pack.scale_in_bytes(),
src/hotspot/share/opto/superword.cpp line 1702:
> 1700: if (current->is_constrained()) {
> 1701: // Solution is constrained (not trivial)
> 1702: // -> must change pre-limit to acheive alignment
Suggestion:
// -> must change pre-limit to achieve alignment
src/hotspot/share/opto/vectorization.cpp line 756:
> 754: // We describe the 6 terms:
> 755: // 1) The "base" of the address is the address of a Java object (e.g. array),
> 756: // and as such ObjectAlignmentInBytes (a power of 2) aligned. We have
Suggestion:
// and as such ObjectAlignmentInBytes (a power of 2) aligned. We have
src/hotspot/share/opto/vectorization.cpp line 934:
> 932: //
> 933: // Hence, pre_iter_C_const has a non-trivial (because x > 1) periodic (periodicity x)
> 934: // solution, i.e it has a constrained solution.
Suggestion:
// solution, i.e. it has a constrained solution.
src/hotspot/share/opto/vectorization.cpp line 947:
> 945: // (C_const + C_pre * pre_iter_C_const) % aw != 0
> 946: //
> 947: // This is in constradiction with (4a), and therefore there cannot be any solution,
Suggestion:
// This is in contradiction with (4a), and therefore there cannot be any solution,
src/hotspot/share/opto/vectorization.cpp line 1038:
> 1036: // sign(C_pre) = C_pre / abs(C_pre) = (C_pre > 0) ? 1 : -1, (7)
> 1037: //
> 1038: // We know that abs(C_pre) as well as aw are a powers of 2, and since (5) we can define integer q:
Suggestion:
// We know that abs(C_pre) as well as aw are powers of 2, and since (5) we can define integer q:
-------------
Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14785#pullrequestreview-1809074856
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444615477
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444624179
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444624796
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444632226
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444680843
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444699344
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444700628
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444711822
More information about the hotspot-compiler-dev
mailing list