RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v2]
Emanuel Peter
epeter at openjdk.org
Thu Jun 8 08:56:19 UTC 2023
> This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler.
>
> As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior.
>
> This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`.
>
> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses.
>
> Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below).
>
> **Changes to Tests**
> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag.
>
> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expeced that there is no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for now, until we add such opti...
Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision:
- IR whitelist AlignVector, require it false in the newly added tests
- Merge branch 'master' into JDK-8308606
- Merge branch 'master' into JDK-8308606
- remove some outdated comments
- Benchmark VectorAlignment
- Merge branch 'master' into JDK-8308606
- remove dead code and add offset printing
- fix typo
- 8308606: C2 SuperWord: remove alignment checks where not required
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/14096/files
- new: https://git.openjdk.org/jdk/pull/14096/files/7333d115..b00a7760
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=00-01
Stats: 60053 lines in 911 files changed: 47816 ins; 7685 del; 4552 mod
Patch: https://git.openjdk.org/jdk/pull/14096.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14096/head:pull/14096
PR: https://git.openjdk.org/jdk/pull/14096
More information about the hotspot-compiler-dev
mailing list