RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v8]

Emanuel Peter epeter at openjdk.org
Mon Jan 29 14:05:48 UTC 2024


On Mon, 29 Jan 2024 12:14:50 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework.
>> 
>> The proposed translation, to the extent possible, preserves the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test.
>> 
>> Testing:
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671921417)
>> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.
>> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Change to avx2 CPU feature check

I looked into it, by writing this test:


public class Test {
    static int RANGE = 10_000;

    public static void main(String[] args) {
        int[] a = new int[RANGE];
        int[] b = new int[RANGE];
        for (int i = 0; i < 10_000; i++) {
            test1(a, b);
            test2(a, b, i % 200 - 100);
        }
    }

    static void test1(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] / 15;
        }
    }

    static void test2(int[] a, int[] b, int s) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] / 7;
        }
    }
}


And running this command:
`./java -XX:CompileCommand=compileonly,Test::test1 -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:+TraceNewVectors Test.java`

In the logs, I see that it attempts to vectorize, creating packs like this:

...
Pack: 7
 align: 0 	 678  RShiftI  === _ 679 153  [[ 671 ]]  !orig=561,154 !jvms: Test::test1 @ bci:15 (line 15)
 align: 4 	 667  RShiftI  === _ 668 153  [[ 660 ]]  !orig=154 !jvms: Test::test1 @ bci:15 (line 15)
 align: 8 	 561  RShiftI  === _ 562 153  [[ 554 ]]  !orig=154 !jvms: Test::test1 @ bci:15 (line 15)
 align: 12 	 154  RShiftI  === _ 251 153  [[ 155 ]]  !jvms: Test::test1 @ bci:15 (line 15)
Pack: 8
 align: 0 	 676  MulL  === _ 677 144  [[ 675 ]]  !orig=559,146 !jvms: Test::test1 @ bci:15 (line 15)
 align: 8 	 665  MulL  === _ 666 144  [[ 664 ]]  !orig=146 !jvms: Test::test1 @ bci:15 (line 15)
 ...


But then, I also see:

Unimplemented
 559  MulL  === _ 560 144  [[ 558 ]]  !orig=146 !jvms: Test::test1 @ bci:15 (line 15)


And in `src/hotspot/cpu/aarch64/aarch64_vector.ad`, I see this:

  bool Matcher::match_rule_supported_auto_vectorization(int opcode, int vlen, BasicType bt) {
    if (UseSVE == 0) {
      // These operations are not profitable to be vectorized on NEON, because no direct
      // NEON instructions support them. But the match rule support for them is profitable for
      // Vector API intrinsics.
      if ((opcode == Op_VectorCastD2X && bt == T_INT) ||
          (opcode == Op_VectorCastL2X && bt == T_FLOAT) ||
          (opcode == Op_CountLeadingZerosV && bt == T_LONG) ||
          (opcode == Op_CountTrailingZerosV && bt == T_LONG) ||
          // The vector implementation of Op_AddReductionVD/F is for the Vector API only.
          // It is not suitable for auto-vectorization because it does not add the elements
          // in the same order as sequential code, and FP addition is non-associative.
          opcode == Op_AddReductionVD || opcode == Op_AddReductionVF ||
          opcode == Op_MulReductionVD || opcode == Op_MulReductionVF ||
          opcode == Op_MulVL) {
        return false;
      }
    }
    return match_rule_supported_vector(opcode, vlen, bt);
  }


**Conclusion**
The int division by a constant is implemented using a long multiply (`MulL`). Its vector form, `MulVL`, is rejected for auto-vectorization on `asimd` (NEON, i.e. `UseSVE == 0`), and so the vectorization fails.
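
To illustrate why a `MulL` shows up for `b[i] / 15`, here is a minimal sketch of the usual "magic number" strength reduction for signed division by a constant. The class name, method name, and the constant `MAGIC` are my own for illustration. The exact constants and node sequence that C2 emits are not visible in the logs above, so treat this as an assumption about the general technique rather than a copy of what the compiler generates.

```java
// Sketch only: names and constants are illustrative, not taken from HotSpot.
class DivBy15Sketch {
    // ceil(2^35 / 15), chosen so the rounding error stays below 1/15 for every int input.
    static final long MAGIC = 2290649225L;

    static int divBy15(int x) {
        long product = (long) x * MAGIC;   // 64-bit multiply (corresponds to a MulL)
        int q = (int) (product >> 35);     // arithmetic shift gives floor(x * MAGIC / 2^35)
        return q - (x >> 31);              // add 1 for negative x to round toward zero
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 14, 15, -15, -16, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // Compare the multiply-and-shift result against the plain Java division.
            if (divBy15(x) != x / 15) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("all samples match");
    }
}
```

The point of the sketch is that the multiply has to be done in 64 bits, so vectorizing the loop requires a vector long multiply, which is exactly the operation excluded on NEON in the matcher code above.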

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1914758844

