RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v11]
Emanuel Peter
epeter at openjdk.org
Wed Nov 15 08:25:38 UTC 2023
On Wed, 8 Nov 2023 14:20:44 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> To calm your nerves: most of the changes are in auto-generated tests, and tests in general.
>>
>> **Context**
>>
>> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs).
>>
>> Alignment is split into two tasks:
>> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`.
>> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width.
>>
>> **Problem**
>>
>> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190).
>> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine.
>> Thanks @fg1417 for confirming this!
>> Hence, we need to fix the alignment correctness checks.
>>
>> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly.
>>
>> **Problem Details**
>>
>> Reproducer:
>>
>>
>> static void test3(byte[] a, byte[] b, byte mask) {
>> for (int i = 0; i < RANGE; i+=8) {
>> // Problematic for AlignVector
>> b[i+0] = (byte)(a[i+0] & mask); // best_memref, align 0
>>
>> b[i+3] = (byte)(a[i+3] & mask); // pack at offset 6 bytes
>> b[i+4] = (byte)(a[i+4] & mask);
>> b[i+5] = (byte)(a[i+5] & mask);
>> b[i+6] = (byte)(a[i+6] & mask);
>> }
>> }
>>
>>
>> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all.
>>
>> This is problematic as shown in this example. We have references at index offset `0, 3, 4, 5, 6`, and byte offset `0, 6, 8, 10, 12`. While the reference at ...
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 62 commits:
>
> - Merge branch 'master' into JDK-8311586
> - fix flags register in VerifyAlignVector
> - Merge branch 'master' into JDK-8311586
> - small fix
> - Merge branch 'master' into JDK-8311586
> - add comments for must_verify_alignment
> - fixed up verify_vector_alignment in ad file
> - rename VerifyAlignment with VerifyVectorAlignment
> - replace ifdef with Matcher::has_match_rule
> - fix whitespace
> - ... and 52 more: https://git.openjdk.org/jdk/compare/7c7f8ea3...6e7aed27
@bulasevich @snazarkin Can you please run this on arm32 and tell me if I broke anything? @fg1417 already kindly ran it, and found some failures. I don't have access to such a machine. This fix here is also for arm32 (my test case SIGBUs-es on arm32 with master). @fg1417 thought it might well be a backend issue. It would be good to confirm that, or else I have to figure out what I'm doing wrong in this patch.
It seems all other platforms are passing.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14785#issuecomment-1811999391
More information about the hotspot-compiler-dev
mailing list