RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6]
Vladimir Kozlov
kvn at openjdk.org
Tue Jun 20 16:10:15 UTC 2023
On Wed, 14 Jun 2023 11:13:24 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler.
>>
>> As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior.
>>
>> This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`.
>>
>> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses.
>>
>> Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below).
>>
>> **Changes to Tests**
>> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag.
>>
>> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expeced that there is no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for...
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8308606
> - bench000: add other type examples
> - bench100: added versions for more types (misaligned load store)
> - Add vm.flagless back in for LoopArrayIndexComputeTest.java
> - removed AlignVector from IR framework again, do that in RFE
> - IR whitelist AlignVector, require it false in the newly added tests
> - Merge branch 'master' into JDK-8308606
> - Merge branch 'master' into JDK-8308606
> - remove some outdated comments
> - Benchmark VectorAlignment
> - ... and 4 more: https://git.openjdk.org/jdk/compare/b7339abc...0740b7bc
Nice re-write. Looks good to me.
-------------
Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14096#pullrequestreview-1488512253
More information about the hotspot-compiler-dev
mailing list