RFR: 8300258: C2: vectorization fails on simple ByteBuffer loop
Emanuel Peter
epeter at openjdk.org
Wed Feb 8 11:52:44 UTC 2023
On Mon, 6 Feb 2023 14:15:19 GMT, Roland Westrelin <roland at openjdk.org> wrote:
> The loop that doesn't vectorize is:
>
>
> public static void testByteLong4(byte[] dest, long[] src, int start, int stop) {
> for (int i = start; i < stop; i++) {
> UNSAFE.putLongUnaligned(dest, 8 * i + baseOffset, src[i]);
> }
> }
>
>
> It's from a micro-benchmark in the panama
> repo. `SuperWord::find_adjacent_refs() `prevents it from vectorizing
> because it finds it cannot properly align the loop and, from the
> comment in the code, that:
>
>
> // Can't allow vectorization of unaligned memory accesses with the
> // same type since it could be overlapped accesses to the same array.
>
>
> The test for "same type" is implemented by looking at the memory
> operation type which in this case is overly conservative as the loop
> above is reading and writing with long loads/stores but from and to
> arrays of different types that can't overlap. Actually, with such
> mismatched accesses, it's also likely an incorrect test (reading and
> writing could be to the same array with loads/stores that use
> different operand size) eventhough I couldn't write a test case that
> would trigger an incorrect execution.
>
> As a fix, I propose implementing the "same type" test by looking at
> memory aliases instead.
@rwestrel is there a reason why you only replaced these two instances of `same_velt_type`?
The logic of "same type" is also used later on in the code of `SuperWord::find_adjacent_refs()`.
For example when we have `create_pack == false`, we remove all memops and packs of the "same type". Do you not want to change that too?
For example here:
https://github.com/openjdk/jdk/blob/8f195ff236000d9c019f8beb2b13355083e211b5/src/hotspot/share/opto/superword.cpp#L710-L738
But maybe also this should now be adapted:
https://github.com/openjdk/jdk/blob/8f195ff236000d9c019f8beb2b13355083e211b5/src/hotspot/share/opto/superword.cpp#L1268-L1271
src/hotspot/share/opto/superword.cpp line 665:
> 663: }
> 664: } else {
> 665: if (_phase->C->get_alias_index(mem_ref->adr_type()) == _phase->C->get_alias_index(best_align_to_mem_ref->adr_type())) {
Would it not be nicer to wrap this in a separate function?
`same_memory_slice(mem_ref, best_align_to_mem_ref)`
You can then reuse it everywhere, makes it more readable.
You should also change the comments below, and elsewhere, as it is now not about "same type", but "same memory slice".
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java line 39:
> 37: */
> 38:
> 39: public class TestVectorizationMismatchedAccess {
Could you have a much simpler example, just with two int-arrays, where one allows vectorization and one does not. We still want there to be some vectorization. Maybe like this, maybe you have to play around with it a bit:
// make sure arr1 and arr2 are in different slices
for (int i ...) {
arr1[i] = arr1[i-2] + 5; // do not vectorize, cyclic dependency
arr2[i] = arr2[i] + 5; // should vectorize
}
-------------
Changes requested by epeter (Committer).
PR: https://git.openjdk.org/jdk/pull/12440
More information about the hotspot-compiler-dev
mailing list