RFR: 8298935: fix independence bug in create_pack logic in SuperWord::find_adjacent_refs [v27]

Emanuel Peter epeter at openjdk.org
Wed Mar 15 14:06:51 UTC 2023


On Mon, 13 Mar 2023 10:37:32 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> **List of important things below**
>> 
>> - 3 Bugs I fixed + regression tests https://github.com/openjdk/jdk/pull/12350#issuecomment-1460323523
>> - Conversation with @jatin-bhateja about script-generated regression test: https://github.com/openjdk/jdk/pull/12350#discussion_r1115317152
>> - My [blog-article](https://eme64.github.io/blog/2023/02/23/SuperWord-Introduction.html) about SuperWord
>> - Explanation of `dependency_graph` and `DepPreds` https://github.com/openjdk/jdk/pull/12350#issuecomment-1461498252
>> - Explanation of my new `find_dependency`. Arguments about `independence` of packs and cyclic dependencies between packs https://github.com/openjdk/jdk/pull/12350#issuecomment-1461681129
>> - I found that 4rth bug, where we have independent packs, but the packs have a cyclic dependency. I will fix it in a separate Bug. https://github.com/openjdk/jdk/pull/12350#issuecomment-1465860465
>> 
>> **Original RFE description:**
>> Cyclic dependencies are not handled correctly in all cases. Three examples:
>> 
>> https://github.com/openjdk/jdk/blob/0a834cd991a2f94b784ee4abde06825486fcb97f/test/hotspot/jtreg/compiler/loopopts/superword/TestCyclicDependency.java#L270-L277
>> 
>> And this, compiled with `-XX:CompileCommand=option,compiler.vectorization.TestOptionVectorizeIR::test*,Vectorize`:
>> https://github.com/openjdk/jdk/blob/0a834cd991a2f94b784ee4abde06825486fcb97f/test/hotspot/jtreg/compiler/vectorization/TestOptionVectorizeIR.java#L173-L180
>> 
>> And for `vmIntrinsics::_forEachRemaining` compile option `Vectorize` is always enabled:
>> https://github.com/openjdk/jdk/blob/0a834cd991a2f94b784ee4abde06825486fcb97f/test/hotspot/jtreg/compiler/vectorization/TestForEachRem.java#L69-L73
>> 
>> All of these examples are vectorized, despite the cyclic dependency of distance 2. The cyclic dependency is dropped, instead the emitted vector code implements a shift by 2, instead of repeating the same 2 values.
>> 
>> **Analysis**
>> 
>> The `create_pack` logic in `SuperWord::find_adjacent_refs` is broken in two ways:
>> 
>> - When the compile directive `Vectorize` is on, or we compile `vmIntrinsics::_forEachRemaining` we have `_do_vector_loop == true`. When that is the case, we blindly trust that there is no cyclic dependency larger than distance 1. Distance 1 would already be detected by the `independence(s1, s2)` checks we do for all adjacent memops. But for larger distances, we rely on `memory_alignment == 0`. But the compile directive avoids these checks.
>> - If `best_align_to_mem_ref` is of a different type, and we have `memory_alignment(mem_ref, best_align_to_mem_ref) == 0`, we do not check if `mem_ref` has `memory_alignment == 0` for all other refs of the same type. In the example `TestCyclicDependency::test2`, we have `best_align_to_mem_ref` as the `StoreF`. Then we assess the `StoreI`, which is not aligned with it, but it is of a different type, so we accept it too. Finally, we look at `LoadI`, which has perfect alignment with the `StoreF`, so we accept it too (even though it is in conflict with the `StoreI`).
>> 
>> Generally, the nested if-statements are confusing and buggy. I propose to fix and refactor the code.
>> 
>> I also propose to only allow the compile directive `Vectorize` only if `vectors_should_be_aligned() == false`. If all vector operations have to be `vector_width` aligned, then they also have to be mutually aligned, and we cannot have patterns like `v[i] = v[i] + v[i+1]` for which the compile directive was introduced in the first place https://github.com/openjdk/jdk/commit/c7d33de202203b6da544f2e0f9a13952381b32dd.
>> **Update**: I found a **Test.java** that lead to a crash (`SIGBUS`) on a ARM32 on master. The example bypassed the alignment requirement because of `_do_vector_loop`, and allowed unaligned vector loads to be generated, on a platform that requires alignment. Thanks @fg1417 for running that test for me!
>> 
>> **Solution**
>> 
>> First, I implemented `SuperWord::verify_packs` which catches cyclic dependencies just before scheduling. The idea is to reassess every pack, and check if all memops in it are mutually independent. Turns out that per vector pack, it suffices to do a single BFS over the nodes in the block (see `SuperWord::find_dependence`). With this verification in place we at least get an assert instead of wrong execution.
>> 
>> I then refactored and fixed the `create_pack` code, and put the logic all in `SuperWord::is_mem_ref_alignment_ok`. With the added comments, I hope the logic is more straight forward and readable. If `_do_vector_loop == true`, then I filter the vector packs again in `SuperWord::combine_packs`, since we are at that point not sure that the packs are actually independent, we only know that adjacient memops are independent.
>> 
>> Another change I have made:
>> Disallow `extend_packlist` from adding `MemNodes` back in. Because if we have rejected some memops, we do not want them to be added back in later.
>> 
>> **Testing**
>> 
>> I added a few more regression tests, and am running tier1-3, plus some stress testing.
>> 
>> However, I need help from someone who can test this on **ARM32** and **PPC**, basically machines that have `vectors_should_be_aligned() == true`. I would love to have additional testing on those machine, and some reviews.
>> **Update:** @fg1417 did testing on ARM32, @reinrich did testing on PPC.
>> 
>> **Discussion / Future Work**
>> 
>> I wonder if we should have `_do_vector_loop == true` by default, since it allows more vectorization. With the added filtering, we are sure that we do not schedule packs with cyclic dependencies. We would have to evaluate performance and other side-effects of course. What do you think? [JDK-8303113](https://bugs.openjdk.org/browse/JDK-8303113)
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits:
> 
>  - merge master: resolved conflict in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java
>  - Merge master after NULL -> nullptr conversion
>  - Fixed wording from last commit
>  - A little renaming and improved comments
>  - resolve merge conflict after Roland's fix
>  - TestDependencyOffsets.java: add vanilla run
>  - TestDependencyOffsets.java: parallelize it + various AVX settings
>  - TestOptionVectorizeIR.java: removed PopulateIndex IR rule - fails on x86 32bit - see Matcher::match_rule_supported
>  - Merge branch 'master' into JDK-8298935
>  - Reworked TestDependencyOffsets.java
>  - ... and 31 more: https://git.openjdk.org/jdk/compare/25e7ac22...ff0850e6

Ok, great.
Thanks everybody for the help!
@vnkozlov @jatin-bhateja thanks for the reviews!
@fg1417 @reinrich thanks for the testing and feedback!
@TobiHartmann thanks for the review suggestions!

I have the follwing follow-up RFE's:

[JDK-8303113](https://bugs.openjdk.org/browse/JDK-8303113) [SuperWord] investigate if enabling _do_vector_loop by default creates speedup
I want to see if we can vectorize more, using the `Vectorize` approach: given `-AlignVector`, do no alignment checks, create packs optimistically. Filter out packs that are not `independent` later. This will remedy lots of the "collateral damage" of this Bug-fix here.

[JDK-8260943](https://bugs.openjdk.org/browse/JDK-8260943) Revisit vectorization optimization added by 8076284
This is an old one. But we should either delete the dead code that is not hard-coded to be `false`, or fix it. This is related to `_do_vector_loop` (JDK-8076284 first introduced it).

[JDK-8303827](https://bugs.openjdk.org/browse/JDK-8303827) C2 SuperWord: allow more fine grained alignment for +AlignVector
We should fix the "collateral damage" for the `+AlignVector` case. We can do that by relaxing the strict alignment requirement a bit, to 4/8-byte. The `vector_width` alignment was required on SPARC. But even there, the vectors were not longer than 8 bytes (@vnkozlov ). Some collateral damage will happen, for example some conversions will not be vectorized after this fix here. That will for example affect `TestVectorizeTypeConversion.java` on some platforms with `+AlignVector`.
**But**: I will not do this, and probably nobody from Oracle, as all our machines have `-AlignVector`. If anybody is interested in fixing this, you are free to take over the bug!

[JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042) C2 SuperWord: schedule must remove packs with cyclic dependencies
The **Bug 4** mentioned above, where we have cyclic dependencies on the `packs`, even when all `packs` are `independent`.

Thanks again to all the involved!

-------------

PR: https://git.openjdk.org/jdk/pull/12350


More information about the hotspot-compiler-dev mailing list