RFR: 8298935: fix cyclic dependency bug in create_pack logic in SuperWord::find_adjacent_refs
Tobias Hartmann
thartmann at openjdk.org
Fri Feb 17 07:57:22 UTC 2023
On Tue, 31 Jan 2023 18:26:52 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> Cyclic dependencies are not handled correctly in all cases. Three examples:
>
> https://github.com/openjdk/jdk/blob/0a834cd991a2f94b784ee4abde06825486fcb97f/test/hotspot/jtreg/compiler/loopopts/superword/TestCyclicDependency.java#L270-L277
>
> And this, compiled with `-XX:CompileCommand=option,compiler.vectorization.TestOptionVectorizeIR::test*,Vectorize`:
> https://github.com/openjdk/jdk/blob/0a834cd991a2f94b784ee4abde06825486fcb97f/test/hotspot/jtreg/compiler/vectorization/TestOptionVectorizeIR.java#L173-L180
>
> And for `vmIntrinsics::_forEachRemaining` compile option `Vectorize` is always enabled:
> https://github.com/openjdk/jdk/blob/0a834cd991a2f94b784ee4abde06825486fcb97f/test/hotspot/jtreg/compiler/vectorization/TestForEachRem.java#L69-L73
>
> All of these examples are vectorized, despite the cyclic dependency of distance 2. The cyclic dependency is dropped, instead the emitted vector code implements a shift by 2, instead of repeating the same 2 values.
>
> **Analysis**
>
> The `create_pack` logic in `SuperWord::find_adjacent_refs` is broken in two ways:
>
> - When the compile directive `Vectorize` is on, or we compile `vmIntrinsics::_forEachRemaining` we have `_do_vector_loop == true`. When that is the case, we blindly trust that there is no cyclic dependency larger than distance 1. Distance 1 would already be detected by the `independence(s1, s2)` checks we do for all adjacent memops. But for larger distances, we rely on `memory_alignment == 0`. But the compile directive avoids these checks.
> - If `best_align_to_mem_ref` is of a different type, and we have `memory_alignment(mem_ref, best_align_to_mem_ref) == 0`, we do not check if `mem_ref` has `memory_alignment == 0` for all other refs of the same type. In the example `TestCyclicDependency::test2`, we have `best_align_to_mem_ref` as the `StoreF`. Then we assess the `StoreI`, which is not aligned with it, but it is of a different type, so we accept it too. Finally, we look at `LoadI`, which has perfect alignment with the `StoreF`, so we accept it too (even though it is in conflict with the `StoreI`).
>
> Generally, the nested if-statements are confusing and buggy. I propose to fix and refactor the code.
>
> I also propose to only allow the compile directive `Vectorize` only if `vectors_should_be_aligned() == false`. If all vector operations have to be `vector_width` aligned, then they also have to be mutually aligned, and we cannot have patterns like `v[i] = v[i] + v[i+1]` for which the compile directive was introduced in the first place https://github.com/openjdk/jdk/commit/c7d33de202203b6da544f2e0f9a13952381b32dd.
>
> **Solution**
>
> First, I implemented `SuperWord::verify_packs` which catches cyclic dependencies just before scheduling. The idea is to reassess every pack, and check if all memops in it are mutually independent. Turns out that per vector pack, it suffices to do a single BFS over the nodes in the block (see `SuperWord::find_dependence`). With this verification in place we at least get an assert instead of wrong execution.
>
> I then refactored and fixed the `create_pack` code, and put the logic all in `SuperWord::is_mem_ref_alignment_ok`. With the added comments, I hope the logic is more straight forward and readable. If `_do_vector_loop == true`, then I filter the vector packs again in `SuperWord::combine_packs`, since we are at that point not sure that the packs are actually independent, we only know that adjacient memops are independent.
>
> Another change I have made:
> Disallow `extend_packlist` from adding `MemNodes` back in. Because if we have rejected some memops, we do not want them to be added back in later.
>
> **Testing**
>
> I added a few more regression tests, and am running tier1-3, plus some stress testing.
>
> However, I need help from someone who can test this on **ARM32** and **PPC**, basically machines that have `vectors_should_be_aligned() == false`. I would love to have additional testing on those machine, and some reviews.
>
> **Discussion / Future Work**
>
> I wonder if we should have `_do_vector_loop == true` by default, since it allows more vectorization. With the added filtering, we are sure that we do not schedule packs with cyclic dependencies. We would have to evaluate performance and other side-effects of course. What do you think?
The new code in `SuperWord::is_mem_ref_alignment_ok` is much easier to read!
> I also propose to only allow the compile directive Vectorize only if vectors_should_be_aligned() == false. If all vector operations have to be vector_width aligned, then they also have to be mutually aligned, and we cannot have patterns like v[i] = v[i] + v[i+1] for which the compile directive was introduced in the first place https://github.com/openjdk/jdk/commit/c7d33de202203b6da544f2e0f9a13952381b32dd.
Couldn't this lead to cases where `vmIntrinsics::_forEachRemaining` is not vectorized anymore on these platforms although it would be legal?
> I wonder if we should have _do_vector_loop == true by default, since it allows more vectorization. With the added filtering, we are sure that we do not schedule packs with cyclic dependencies. We would have to evaluate performance and other side-effects of course. What do you think?
I think it's a good idea to investigate this separately. Please file a RFE.
Please run a full set of performance testing to make sure these changes don't accidentally introduce a regression.
src/hotspot/share/opto/superword.cpp line 1849:
> 1847: for (int i = 0; i < _packset.length(); i++) {
> 1848: Node_List* p = _packset.at(i);
> 1849: if (p != NULL) {
There are some more occurences in your changes.
Suggestion:
if (p != nullptr) {
test/hotspot/jtreg/compiler/loopopts/superword/TestCyclicDependency.java line 260:
> 258: static void test1(int[] dataI, float[] dataF) {
> 259: for (int i = 0; i < RANGE - 1; i++) {
> 260: // dataI has cyclid dependency of distance 1, cannot vectorize
Suggestion:
// dataI has cyclic dependency of distance 1, cannot vectorize
test/hotspot/jtreg/compiler/loopopts/superword/TestCyclicDependency.java line 272:
> 270: static void test2(int[] dataI, float[] dataF) {
> 271: for (int i = 0; i < RANGE - 2; i++) {
> 272: // dataI has cyclid dependency of distance 2, cannot vectorize
Suggestion:
// dataI has cyclic dependency of distance 2, cannot vectorize
test/hotspot/jtreg/compiler/loopopts/superword/TestCyclicDependency.java line 284:
> 282: static void test3(int[] dataI, float[] dataF) {
> 283: for (int i = 0; i < RANGE - 3; i++) {
> 284: // dataI has cyclid dependency of distance 3, cannot vectorize
Suggestion:
// dataI has cyclic dependency of distance 3, cannot vectorize
test/hotspot/jtreg/compiler/loopopts/superword/TestCyclicDependency.java line 296:
> 294: static void test4(int[] dataI, float[] dataF) {
> 295: for (int i = 1; i < RANGE - 1; i++) {
> 296: // dataI has cyclid dependency of distance 2, cannot vectorize
Suggestion:
// dataI has cyclic dependency of distance 2, cannot vectorize
test/hotspot/jtreg/compiler/vectorization/TestOptionVectorizeIR.java line 184:
> 182: @Test
> 183: @IR(counts = {IRNode.LOAD_VECTOR, "= 0"})
> 184: @IR(counts = {IRNode.STORE_VECTOR, "= 0"})
All of these (also in other tests) should be replaced by something like:
Suggestion:
@IR(failOn = {IRNode.LOAD_VECTOR, IRNode.STORE_VECTOR})
-------------
Changes requested by thartmann (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12350
More information about the hotspot-compiler-dev
mailing list