RFR: 8300258: C2: vectorization fails on simple ByteBuffer loop [v2]
Vladimir Kozlov
kvn at openjdk.org
Thu Feb 23 19:57:20 UTC 2023
On Tue, 21 Feb 2023 08:26:59 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> The loop that doesn't vectorize is:
>>
>>
>> public static void testByteLong4(byte[] dest, long[] src, int start, int stop) {
>>     for (int i = start; i < stop; i++) {
>>         UNSAFE.putLongUnaligned(dest, 8 * i + baseOffset, src[i]);
>>     }
>> }
>>
>>
>> It's from a micro-benchmark in the panama
>> repo. `SuperWord::find_adjacent_refs()` prevents it from vectorizing
>> because it finds that it cannot properly align the loop and, per the
>> comment in the code, that:
>>
>>
>> // Can't allow vectorization of unaligned memory accesses with the
>> // same type since it could be overlapped accesses to the same array.
>>
>>
>> The test for "same type" is implemented by looking at the memory
>> operation type, which in this case is overly conservative: the loop
>> above reads and writes with long loads/stores, but from and to
>> arrays of different types that can't overlap. Actually, with such
>> mismatched accesses, it's also likely an incorrect test (reading and
>> writing could be to the same array with loads/stores that use
>> different operand sizes), even though I couldn't write a test case that
>> would trigger an incorrect execution.
>>
>> As a fix, I propose implementing the "same type" test by looking at
>> memory aliases instead.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
> - comments
> - extra test
> - more
> - Merge branch 'master' into JDK-8300258
> - review
> - more
> - fix & test
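The difference between the two versions of the "same type" test can be sketched with a toy Java model. Everything in it (the `MemOp` record, the string type tags, and both predicate methods) is hypothetical and is not HotSpot code; it only assumes, as described above, that the old test compares access types while the proposed one compares memory alias classes, and that `long[]` and `byte[]` fall into different alias classes.

```java
public class AliasCheckSketch {
    // Hypothetical model of a memory access: the basic type of the load/store
    // and the alias class (here simply the array type) of the memory it touches.
    record MemOp(String basicType, String aliasClass) {}

    // Old-style test: accesses count as "the same type" when their access types match.
    static boolean sameTypeByMemoryType(MemOp a, MemOp b) {
        return a.basicType().equals(b.basicType());
    }

    // Proposed test: accesses count as "the same type" only when they share an
    // alias class, i.e. when they could actually touch the same array.
    static boolean sameTypeByAlias(MemOp a, MemOp b) {
        return a.aliasClass().equals(b.aliasClass());
    }

    public static void main(String[] args) {
        // testByteLong4: long load from long[] src, long store (via Unsafe) into byte[] dest.
        MemOp load = new MemOp("T_LONG", "long[]");
        MemOp store = new MemOp("T_LONG", "byte[]");
        // Type-based test: same access type, so the pair is treated as possibly overlapping.
        System.out.println(sameTypeByMemoryType(load, store)); // true
        // Alias-based test: two different arrays, which cannot overlap.
        System.out.println(sameTypeByAlias(load, store));      // false
    }
}
```

For this loop the type-based test reports a match and blocks vectorization, while the alias-based test does not, which is the distinction the proposed fix relies on.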
And:
> I found another problem. My new regression test TestShortVect failed in the overlapping case
> (load from and store to the same array with an offset). The SuperWord code thinks a store into a short[] array
> is a char store, different from a short load from short[], so it allows overlapping. This happens
> because C2 has only one StoreC node for 2-byte stores but separate LoadS and LoadUS
> nodes for loads, and these load nodes have different memory types: T_SHORT and T_CHAR.
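Why the overlapping case must not be vectorized can be illustrated with a plain Java loop over a single short[] that stores at a fixed offset from where it loads. This is a minimal sketch, assuming an offset of 1 and a vector length of 4 (both hypothetical); `naiveVector` only models what the loop would compute if every group of 4 loads were done before the matching stores, and is not how C2 actually emits code.

```java
import java.util.Arrays;

public class OverlapSketch {
    static final int OFFSET = 1; // hypothetical distance between the load and the store

    // Scalar semantics of: for (i...) a[i + OFFSET] = a[i];
    // Each iteration reads the value written by the previous one.
    static short[] scalar(short[] in) {
        short[] a = in.clone();
        for (int i = 0; i + OFFSET < a.length; i++) {
            a[i + OFFSET] = a[i];
        }
        return a;
    }

    // Result if the loop were vectorized without an overlap check:
    // load vlen elements first, then store all of them, then move on.
    static short[] naiveVector(short[] in, int vlen) {
        short[] a = in.clone();
        int n = a.length - OFFSET; // number of loop iterations
        int i = 0;
        for (; i + vlen <= n; i += vlen) {
            short[] lanes = Arrays.copyOfRange(a, i, i + vlen); // "vector load"
            System.arraycopy(lanes, 0, a, i + OFFSET, vlen);     // "vector store"
        }
        for (; i < n; i++) { // scalar post-loop for the remainder
            a[i + OFFSET] = a[i];
        }
        return a;
    }

    public static void main(String[] args) {
        short[] in = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(Arrays.toString(scalar(in)));         // [1, 1, 1, 1, 1, 1, 1, 1, 1]
        System.out.println(Arrays.toString(naiveVector(in, 4))); // [1, 1, 2, 3, 4, 4, 6, 7, 8]
    }
}
```

The two results differ, so a dependence check that misclassifies the store and the load as touching different types can produce wrong output here.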
I found that I can't use memory_type() because some ideal transformations (in mulnode.cpp)
may replace a load from char[] with a short load and vice versa. So instead of comparing vector types,
I added a new method, same_velt_type(n1, n2), which compares element sizes for integer types.
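A rough Java model of the comparison described above, plus the load equivalence that makes it necessary. The `ElemType` enum and `sameVeltType` method are hypothetical stand-ins, not the HotSpot implementation; the sketch assumes, beyond what is stated here, that non-integer types must match exactly. The second method checks, over every short value, that a signed 16-bit read masked with 0xFFFF equals the same bits read as an unsigned char, which illustrates why a rewrite between the two kinds of load can preserve the result.

```java
public class SameVeltTypeSketch {
    // Hypothetical element types with their sizes in bytes and whether they are integral.
    enum ElemType {
        BYTE(1, true), CHAR(2, true), SHORT(2, true), INT(4, true), LONG(8, true),
        FLOAT(4, false), DOUBLE(8, false);

        final int size;
        final boolean integral;

        ElemType(int size, boolean integral) {
            this.size = size;
            this.integral = integral;
        }
    }

    // Integer element types are treated as the same when their sizes match
    // (so CHAR and SHORT match). Other types are assumed to need an exact match.
    static boolean sameVeltType(ElemType t1, ElemType t2) {
        if (t1.integral && t2.integral) {
            return t1.size == t2.size;
        }
        return t1 == t2;
    }

    // Checks every 16-bit value: masking a signed short with 0xFFFF yields the
    // same int as reinterpreting its bits as an unsigned char.
    static boolean maskedShortEqualsChar() {
        for (int v = Short.MIN_VALUE; v <= Short.MAX_VALUE; v++) {
            short s = (short) v;
            if ((s & 0xFFFF) != (char) s) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameVeltType(ElemType.CHAR, ElemType.SHORT)); // true
        System.out.println(sameVeltType(ElemType.SHORT, ElemType.INT));  // false
        System.out.println(maskedShortEqualsChar());                     // true
    }
}
```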
Also, there is no difference in vector instructions for Char and Short types, so I removed the
Char-related vector nodes and used Short vectors for the Char type.
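One reason a single 16-bit vector kind can serve both types is that wrap-around arithmetic on 16-bit lanes produces the same bit pattern whether the lanes are read as signed shorts or unsigned chars. The sketch below samples operand pairs (the step of 257 is an arbitrary choice) and compares add, subtract and multiply results bit for bit. It deliberately covers only these three operations; operations whose result depends on signedness (right shifts, comparisons, min/max, widening conversions) are outside this sketch, and the message does not say how C2 handles them.

```java
public class ShortCharLaneSketch {
    // Returns true if add, subtract and multiply give identical 16-bit patterns
    // whether the operands are interpreted as signed shorts or unsigned chars.
    // Every step-th value of each operand is sampled to keep the run short.
    static boolean sameBitsForAll(int step) {
        for (int x = Short.MIN_VALUE; x <= Short.MAX_VALUE; x += step) {
            for (int y = Short.MIN_VALUE; y <= Short.MAX_VALUE; y += step) {
                short sx = (short) x, sy = (short) y;
                char cx = (char) sx, cy = (char) sy; // same bits, unsigned view
                if ((char) (short) (sx + sy) != (char) (cx + cy)) return false;
                if ((char) (short) (sx - sy) != (char) (cx - cy)) return false;
                if ((char) (short) (sx * sy) != (char) (cx * cy)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameBitsForAll(257)); // true
    }
}
```

This holds because the low 16 bits of a sum, difference or product depend only on the low 16 bits of the operands, and sign extension versus zero extension changes only the higher bits.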
-------------
PR: https://git.openjdk.org/jdk/pull/12440