RFR: 8298935: fix cyclic dependency bug in create_pack logic in SuperWord::find_adjacent_refs [v15]

Emanuel Peter epeter at openjdk.org
Mon Mar 6 10:35:26 UTC 2023


On Mon, 6 Mar 2023 10:21:42 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> @jatin-bhateja
>>> 
>>> > Thanks, even though newly added test now passes at all AVX and SSE level can you kindly investigate why should following be vectorized with un-aligned accesses when it carries a cross iteration true dependency with distance 4.
>>> 
>>> The cyclic dependency is at a distance of 3, not 4 in this example. Ints are 4 bytes. Thus, the `byte_offset` is 12 bytes. So if `MaxVectorSize <= 12`, we cannot ever have a cyclic dependency within a vector.
>>> 
>>> See my explanations at the beginning of the test file, for example:
>>> 
>>> https://github.com/openjdk/jdk/blob/fb7f6dd9fbc2a6086d2ad36e0681fbc9eff6c9a7/test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java#L49-L57
>> 
>> My bad, just pasted a wrong example, wanted to refer 
>> 
>> https://github.com/openjdk/jdk/pull/12350#discussion_r1125924921
>> 
>> do you really see any value in generating tests for synthetic vector sizes where MaxVectorSize is a non-power of two. Lets remove them to reduce noise ?
>
>> With +AlignVector behavior with and without Vectorize,true pragma should match.
> 
> This was about example with `fArr[i + 4] = fArr[i];` in the loop. `byte_offset = 4 * 4 = 16`.
> 
> @jatin-bhateja I am not sure what you are trying to say, what do you mean by `should match`?
> 
> If you mean to say "should vectorize": I think it should **not** vectorize, and your output shows that there must be a bug (with master, before my fix):
> `LoadVector  === ... #vectory[8]:{float}`
> You have a cyclic dependency with float-distance 4 (`byte_distance = 16`). But you have 8 floats in the vector. That will lead to wrong results. It should only vectorize if `MaxVectorSize <= 16`. See conditions for `testIntP4` which I quoted above.
> 
> I made a full test with it, and pasted it below.
> I run it with these command lines:
> 
> `./java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=Vectorize,Test::test,true -XX:+TraceNewVectors -XX:+AlignVector Test.java`
> 
> 1. On `master`, with `+AlignVector` and `Vectorize true`: Vectorizes, wrong results, asserts!
> 2. On `master`, with `-AlignVector` and `Vectorize true`: Vectorizes, wrong results, asserts!
> 3. On `master`, with `-AlignVector` and `Vectorize false`: Does not vectorize. Detects the cyclic dependency (`LoadF` and `StoreF` have `memory_alignment != 0`).
> 4. On `master`, with `+AlignVector` and `Vectorize false`: same as for 4.
> 
> As you can see, here the flag `AlignVector` is not even relevant.
> 
> Why do we get wrong results? We bypass the `memory_alignment == 0` check when we have `_do_vector_loop == true`. That bypasses the alignment analysis which is critical. Without it, we only ever check `independence` at distance 1 (for the pairs), and not for all elements in a vector! Relevant section on `master`:
> https://github.com/openjdk/jdk/blob/8f195ff236000d9c019f8beb2b13355083e211b5/src/hotspot/share/opto/superword.cpp#L646
> 
> With `my patch`: all of the command-lines from above will not vectorize. Except if you set `-XX:MaxVectorSize=16` or smaller, where the cyclic dependency cannot manifest within one vector.
> 
> @jatin-bhateja does this answer you question? Or did I misunderstand your question?
> 
> **PS**: I have found the "alignment analysis" and "independence" checks rather confusing. And obviously our code-base was changed without properly testing it, and I think also without properly understanding it. In Larsen's [paper](https://groups.csail.mit.edu/cag/slp/SLP-PLDI-2000.pdf), on which the `SuperWord` implementation is based, they only ever explicitly test `independent(s1, s2)` for the elements of a pair. But in their definitions they definie not just `pairs` to be `independent`, but also `packs`. But how do you get from `independence` of `pairs` to `independence` of `packs`? The best I could find was this sentence in the paper:
> 
> Since the adjacent memory identification phase uses alignment information,
> it will never create pairs of memory accesses that cross an alignment boundary.
> 
> It is not further described in the paper unfortunately. But the idea is that you have "alignment boundies", and that pairs are not supposed to cross them. I think that is exactly why we require all `mem_ref`'s of the same type (memory slice) to be aligned (`memory_alignment == 0`). That ensures that no pairs cross the alignment boundary of any other `pack` of the same type (memory slice).
> 
> But of course requiring this strict alignment is quite restrictive. So that is why the CompileCommand `Vectorize` was introduced. But it was never properly tested it seems. And it just trusts the programmer that there are no cyclic dependencies. That is why I now added the verification and filtering. It prevents vectorization when cyclic dependencies are detected by my new `SuperWord::find_dependence`.
> 
> 
> public class Test {
>     static int N = 100;
> 
>     public static void main(String[] strArr) {
>         float[] gold = new float[N];
>         float[] data = new float[N];
>         init(gold);
>         test(gold);
>         for (int i = 0; i < 10_000; i++){
>             init(data);
>             test(data);
>             verify(data, gold);
>         }
>         System.out.println("success.");
>     }
> 
>     static void test(float[] data) {
>          for (int i = 0; i < N - 4; i++) {
>              data[i + 4] = data[i];
>          }
>     }
> 
>     static void init(float[] data) {
>         for (int j = 0; j < N; j++) {
>             data[j] = j;
>         }
>     }
> 
>     static void verify(float[] data, float[] gold) {
>         for (int i = 0; i < N; i++) {
>             if (data[i] != gold[i]) {
>                 throw new RuntimeException(" Invalid result: dataF[" + i + "]: " + data[i] + " != " + gold[i]);
>             }
>         }
>     }
> }

> do you really see any value in generating tests for synthetic vector sizes where MaxVectorSize is a non-power of two. Lets remove them to reduce noise ?

I see a value in having non-power of 2 offsets, yes. They should vectorize if the vector width is small enough. And then there are some values like `18, 20, 192` that are a there to check vectorization with `+AlignVector`.

Maybe you find the `MaxVectorSize <= 12` "noisy" somehow, because it is equivalent to `MaxVectorSize <= 8`? I find it rather helpful, because `12` reflects the `byte_offset`, and so makes the rule a bit more understandable.

Finally, I generate many tests, I don't want to do that by hand. So maybe the rules are not simplified perfectly. I tried to improve it a bit. If you have a concrete idea how to further improve, I'm open for suggestions. I could for example round down the values to the next power of 2, or something like that. But again: would that really make the rules more understandable?

-------------

PR: https://git.openjdk.org/jdk/pull/12350


More information about the hotspot-compiler-dev mailing list