RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE [v2]
Pengfei Li
pli at openjdk.org
Mon Aug 28 08:49:27 UTC 2023
> The HotSpot jtreg test `compiler/loopopts/superword/TestDependencyOffsets.java` fails on AArch64 CPUs with 512-bit SVE. Many test loops in the code cannot be vectorized because of data dependences, but the IR tests assume they can.
>
> On AArch64, these IR tests check only the `asimd` CPU feature and incorrectly assume that AArch64 vectors are at most 256 bits wide. In fact, `asimd` on AArch64 represents only NEON vectors, which are at most 128 bits wide. AArch64 CPUs may also have the `sve` feature, which represents scalable vectors of up to 2048 bits. Vectorization fails on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits.
>
> As this jtreg test is auto-generated by a Python script, we have updated the script and regenerated the test. The new version checks auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page.
>
>
> @@ -321,7 +321,8 @@ class Type:
> p.append(Platform("avx512", ["avx512", "true"], 64))
> else:
> assert False, "type not implemented" + self.name
> - p.append(Platform("asimd", ["asimd", "true"], 32))
> + p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16))
> + p.append(Platform("sve", ["sve", "true"], 256))
> return p
>
> class Test:
> @@ -457,7 +458,7 @@ class Generator:
> lines.append(" * and various MaxVectorSize values, and +- AlignVector.")
> lines.append(" *")
> lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
> - lines.append(" * https://bugs.openjdk.org/browse/JDK-8308606")
> + lines.append(" * https://bugs.openjdk.org/browse/JDK-8312570")
> lines.append(" *")
> lines.append(" * Types: " + ", ".join([t.name for t in self.types]))
> lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets]))
> @@ -598,7 +599,8 @@ class Generator:
> # IR rules
> for p in test.t.platforms():
> elements = p.vector_width // test.t.size
> - lines.append(f" // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}")
> + max_pre = "max " if p.name == "sve" else ""
> + lines.append(f" // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}")
> ############### -Align...
Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
- Merge branch 'master' into deptest
- 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE
The HotSpot jtreg test `compiler/loopopts/superword/TestDependencyOffsets.java`
fails on AArch64 CPUs with 512-bit SVE. Many test loops in the code
cannot be vectorized because of data dependences, but the IR tests
assume they can.
On AArch64, these IR tests check only the `asimd` CPU feature and
incorrectly assume that AArch64 vectors are at most 256 bits wide. In
fact, `asimd` on AArch64 represents only NEON vectors, which are at most
128 bits wide. AArch64 CPUs may also have the `sve` feature, which
represents scalable vectors of up to 2048 bits. Vectorization fails on
512-bit SVE CPUs if the memory offset between some read and write is
less than 512 bits.
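As a rough illustration of the constraint described above, the Python
sketch below models a loop that reads an element and writes another one
a fixed number of elements ahead. It is a simplified model only, not
C2's actual dependence analysis, and the function name `can_vectorize`
and its parameters are hypothetical.

```python
def can_vectorize(distance_elements: int, element_size: int, vector_bytes: int) -> bool:
    """Simplified model (an assumption, not C2's real analysis).

    A loop such as `a[i + d] = f(a[i])` carries a read-after-write
    dependence at a distance of `d` elements. In this model, one full
    vector can be processed at a time only if that distance, measured
    in bytes, is at least the vector width in bytes.
    """
    if distance_elements <= 0:
        raise ValueError("distance_elements must be positive")
    distance_bytes = distance_elements * element_size
    return distance_bytes >= vector_bytes


# int elements (4 bytes), dependence distance of 8 elements = 256 bits
fits_256_bit = can_vectorize(8, 4, 32)  # 256-bit vectors (32 bytes)
fits_512_bit = can_vectorize(8, 4, 64)  # 512-bit vectors (64 bytes)
```

Under this model, a 256-bit offset is compatible with 256-bit vectors
but not with 512-bit vectors, which is the situation the old IR rules
did not account for on 512-bit SVE hardware.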
As this jtreg test is auto-generated by a Python script, we have updated
the script and regenerated the test. The new version checks
auto-vectorization on both NEON-only and NEON+SVE platforms. Below is
the diff of the generator script. We have also attached the new script
to the JBS page.
```
@@ -321,7 +321,8 @@ class Type:
p.append(Platform("avx512", ["avx512", "true"], 64))
else:
assert False, "type not implemented" + self.name
- p.append(Platform("asimd", ["asimd", "true"], 32))
+ p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16))
+ p.append(Platform("sve", ["sve", "true"], 256))
return p
class Test:
@@ -457,7 +458,7 @@ class Generator:
lines.append(" * and various MaxVectorSize values, and +- AlignVector.")
lines.append(" *")
lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
- lines.append(" * https://bugs.openjdk.org/browse/JDK-8308606")
+ lines.append(" * https://bugs.openjdk.org/browse/JDK-8312570")
lines.append(" *")
lines.append(" * Types: " + ", ".join([t.name for t in self.types]))
lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets]))
@@ -598,7 +599,8 @@ class Generator:
# IR rules
for p in test.t.platforms():
elements = p.vector_width // test.t.size
- lines.append(f" // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}")
+ max_pre = "max " if p.name == "sve" else ""
+ lines.append(f" // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}")
############### -AlignVector
rule = PlatformIRRule(p)
rule.add_pre_constraint("AlignVector", IRBool.makeFalse())
@@ -694,8 +696,8 @@ class Generator:
def main():
g = Generator()
g.generate("TestDependencyOffsets",
- "/home/emanuel/Documents/fork7-jdk/open/test/hotspot/jtreg/compiler/loopopts/superword",
- "8298935 8308606", # Big ID
+ "test/hotspot/jtreg/compiler/loopopts/superword",
+ "8298935 8308606 8312570", # Bug ID
"compiler.loopopts.superword", # package
)
```
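To make the effect of the new platform entries concrete, here is a
minimal, self-contained sketch of how the two AArch64 entries translate
into the generated comment line. The `Platform` dataclass below is a
hypothetical stand-in for the generator's own class (only `name` and
`vector_width` appear in the diff; the field name `constraints` and the
helpers `aarch64_platforms` and `describe` are assumptions made for
illustration).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Platform:
    # Hypothetical stand-in for the generator's Platform class.
    name: str
    constraints: List[str]  # flat [feature, value, ...] list, as in the diff
    vector_width: int       # vector width in bytes


def aarch64_platforms() -> List[Platform]:
    # NEON-only CPUs: asimd present, sve absent, 16-byte (128-bit) vectors.
    # SVE CPUs: vector width is at most 256 bytes (2048 bits).
    return [
        Platform("asimd", ["asimd", "true", "sve", "false"], 16),
        Platform("sve", ["sve", "true"], 256),
    ]


def describe(p: Platform, type_size: int) -> str:
    # Mirrors the comment line from the diff: for sve the width is an upper bound.
    elements = p.vector_width // type_size
    max_pre = "max " if p.name == "sve" else ""
    return (f"CPU: {p.name} -> {max_pre}vector_width: {p.vector_width}"
            f" -> {max_pre}elements in vector: {elements}")


asimd, sve = aarch64_platforms()
int_lines = [describe(asimd, 4), describe(sve, 4)]  # 4-byte int elements
```

For 4-byte elements this gives 4 elements per NEON vector and at most 64
elements per SVE vector, so the SVE figure has to be read as an upper
bound rather than an exact count.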
We tested this on various AArch64 CPUs.
-------------
Changes: https://git.openjdk.org/jdk/pull/15010/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15010&range=01
Stats: 2062 lines in 1 file changed: 1422 ins; 0 del; 640 mod
Patch: https://git.openjdk.org/jdk/pull/15010.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15010/head:pull/15010
PR: https://git.openjdk.org/jdk/pull/15010