RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v8]
Emanuel Peter
epeter at openjdk.org
Mon Jan 29 14:05:48 UTC 2024
On Mon, 29 Jan 2024 12:14:50 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:
>> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework.
>>
>> The proposed translation, to the extent possible, preserves the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test.
>>
>> Testing:
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671921417)
>> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.
>> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>
> Change to avx2 CPU feature check
I looked into it by writing this test:
```java
public class Test {
    static int RANGE = 10_000;

    public static void main(String[] args) {
        int[] a = new int[RANGE];
        int[] b = new int[RANGE];
        for (int i = 0; i < 10_000; i++) {
            test1(a, b);
            test2(a, b, i % 200 - 100);
        }
    }

    static void test1(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] / 15;
        }
    }

    static void test2(int[] a, int[] b, int s) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] / 7;
        }
    }
}
```
And running this command:
`./java -XX:CompileCommand=compileonly,Test::test1 -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:+TraceNewVectors Test.java`
In the logs, I see that it attempts to vectorize, creating packs like this:
```
...
Pack: 7
 align: 0  678 RShiftI === _ 679 153 [[ 671 ]] !orig=561,154 !jvms: Test::test1 @ bci:15 (line 15)
 align: 4  667 RShiftI === _ 668 153 [[ 660 ]] !orig=154 !jvms: Test::test1 @ bci:15 (line 15)
 align: 8  561 RShiftI === _ 562 153 [[ 554 ]] !orig=154 !jvms: Test::test1 @ bci:15 (line 15)
 align: 12 154 RShiftI === _ 251 153 [[ 155 ]] !jvms: Test::test1 @ bci:15 (line 15)
Pack: 8
 align: 0  676 MulL === _ 677 144 [[ 675 ]] !orig=559,146 !jvms: Test::test1 @ bci:15 (line 15)
 align: 8  665 MulL === _ 666 144 [[ 664 ]] !orig=146 !jvms: Test::test1 @ bci:15 (line 15)
...
```
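As a side note on why `MulL` and `RShiftI` show up here at all: C2 strength-reduces the division by the constant 15 into a 64-bit multiply by a "magic" constant followed by shifts, and the packs above are the multiply/shift halves of that idiom. A minimal sketch of the idea (the magic constant and shift amount here are illustrative and not taken from the C2 source):

```java
class DivBy15Sketch {
    // ceil(2^35 / 15); illustrative magic constant, not copied from C2
    static final long MAGIC = 0x88888889L;

    static int divBy15(int x) {
        // multiply-high + shift replaces the idiv (valid for non-negative x)
        return (int) ((x * MAGIC) >>> 35);
    }

    public static void main(String[] args) {
        for (int x = 0; x < 1_000_000; x++) {
            if (divBy15(x) != x / 15) throw new AssertionError("mismatch at " + x);
        }
        System.out.println("ok");
    }
}
```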
But then, I also see:
```
Unimplemented
559 MulL === _ 560 144 [[ 558 ]] !orig=146 !jvms: Test::test1 @ bci:15 (line 15)
```
And in `src/hotspot/cpu/aarch64/aarch64_vector.ad`, I see this:
```c++
bool Matcher::match_rule_supported_auto_vectorization(int opcode, int vlen, BasicType bt) {
  if (UseSVE == 0) {
    // These operations are not profitable to be vectorized on NEON, because no direct
    // NEON instructions support them. But the match rule support for them is profitable for
    // Vector API intrinsics.
    if ((opcode == Op_VectorCastD2X && bt == T_INT) ||
        (opcode == Op_VectorCastL2X && bt == T_FLOAT) ||
        (opcode == Op_CountLeadingZerosV && bt == T_LONG) ||
        (opcode == Op_CountTrailingZerosV && bt == T_LONG) ||
        // The vector implementation of Op_AddReductionVD/F is for the Vector API only.
        // It is not suitable for auto-vectorization because it does not add the elements
        // in the same order as sequential code, and FP addition is non-associative.
        opcode == Op_AddReductionVD || opcode == Op_AddReductionVF ||
        opcode == Op_MulReductionVD || opcode == Op_MulReductionVF ||
        opcode == Op_MulVL) {
      return false;
    }
  }
  return match_rule_supported_vector(opcode, vlen, bt);
}
```
**Conclusion**
The int division by a constant is implemented with a multiply-by-magic-constant idiom, whose vectorized form requires `MulVL`. `Op_MulVL` is not supported for auto-vectorization on `asimd` (NEON, i.e. `UseSVE == 0`), and so the vectorization fails.
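For the IR test, this suggests gating the rule on a CPU feature that implies `MulVL` support. A rough, hypothetical sketch of what such a gated rule could look like with the IR framework (the feature strings, the counted node, and all names here are illustrative, not the actual rule from this PR):

```java
import compiler.lib.ir_framework.*;

public class DivVectorizationSketch {
    static final int SIZE = 10_000;
    static int[] a = new int[SIZE];
    static int[] b = new int[SIZE];

    public static void main(String[] args) {
        TestFramework.run();
    }

    // Hypothetical rule: only expect vectorized stores where the MulVL produced by the
    // division-by-constant idiom can be matched, e.g. on SVE or on x86 with AVX2.
    @Test
    @IR(counts = {IRNode.STORE_VECTOR, "> 0"},
        applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"})
    static void testDivBy15() {
        for (int i = 0; i < SIZE; i++) {
            a[i] = b[i] / 15;
        }
    }
}
```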
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1914758844