RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE
Emanuel Peter
epeter at openjdk.org
Thu Aug 10 10:51:01 UTC 2023
On Tue, 25 Jul 2023 07:42:59 GMT, Pengfei Li <pli at openjdk.org> wrote:
> HotSpot jtreg `compiler/loopopts/superword/TestDependencyOffsets.java` fails on AArch64 CPUs with 512-bit SVE. The reason is that many test loops in the code cannot be vectorized due to data dependence, but the IR tests assume they can.
>
> On AArch64, these IR tests just check the CPU feature `asimd` and incorrectly assume AArch64 vectors are at most 256 bits. In fact, `asimd` on AArch64 only represents NEON vectors, which are at most 128 bits. AArch64 CPUs may also have the `sve` feature, which represents scalable vectors of at most 2048 bits. Vectorization does not succeed on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits.
>
> As this jtreg is auto-generated by a Python script, we have updated the script and re-generated this jtreg. In this new version, we check the auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page.
>
>
> @@ -321,7 +321,8 @@ class Type:
> p.append(Platform("avx512", ["avx512", "true"], 64))
> else:
> assert False, "type not implemented" + self.name
> - p.append(Platform("asimd", ["asimd", "true"], 32))
> + p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16))
> + p.append(Platform("sve", ["sve", "true"], 256))
> return p
>
> class Test:
> @@ -457,7 +458,7 @@ class Generator:
> lines.append(" * and various MaxVectorSize values, and +- AlignVector.")
> lines.append(" *")
> lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
> - lines.append(" * https://bugs.openjdk.org/browse/JDK-8308606")
> + lines.append(" * https://bugs.openjdk.org/browse/JDK-8312570")
> lines.append(" *")
> lines.append(" * Types: " + ", ".join([t.name for t in self.types]))
> lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets]))
> @@ -598,7 +599,8 @@ class Generator:
> # IR rules
> for p in test.t.platforms():
> elements = p.vector_width // test.t.size
> - lines.append(f" // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}")
> + max_pre = "max " if p.name == "sve" else ""
> + lines.append(f" // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}")
> ############### -Align...
@pfustc Thanks for the changes and explanations, looks good to me! :)
Ah, just one more idea: since you now cover vector widths of up to 2048 bits, should we not add some cases with even larger dependency offsets? We should go further than `-196, 196`. We could consider adding `255, 256, 511, 512, 1024, 1536` (positive and negative). Of course, the question is whether that increases the runtime too much. What do you think?
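
To make the offset question concrete, here is a rough sketch of how the largest current offset and the proposed ones compare against the maximum element counts implied by the diff (`vector_width` in bytes: 16 for `asimd`, 256 for `sve`). The helper names and the dependence rule are assumptions for illustration only: the rule treats a positive offset smaller than the element count as blocking a full-width vector and ignores alignment. It is not the generator's actual IR-rule logic.

```python
# Simplified model, NOT the generator script's real logic.
# Vector widths in bytes, taken from the diff: NEON 16, SVE max 256.
MAX_VECTOR_BYTES = {"asimd": 16, "sve": 256}

# Element sizes in bytes for some Java primitive types.
TYPE_SIZES = {"byte": 1, "short": 2, "int": 4, "long": 8}

CURRENT_MAX_OFFSET = 196
PROPOSED_OFFSETS = [255, 256, 511, 512, 1024, 1536]


def max_elements(platform, type_name):
    """Same arithmetic as `p.vector_width // test.t.size` in the diff."""
    return MAX_VECTOR_BYTES[platform] // TYPE_SIZES[type_name]


def blocks_full_vector(offset, platform, type_name):
    """Assumed rule: in a loop like a[i + offset] = f(a[i]), a positive
    offset smaller than the element count prevents a full-width vector.
    Non-positive offsets are treated as harmless in this model."""
    return 0 < offset < max_elements(platform, type_name)


if __name__ == "__main__":
    for type_name in TYPE_SIZES:
        elements = max_elements("sve", type_name)
        blocking = [o for o in [CURRENT_MAX_OFFSET] + PROPOSED_OFFSETS
                    if blocks_full_vector(o, "sve", type_name)]
        print(f"{type_name}: max sve elements = {elements}, "
              f"offsets still blocking a full vector: {blocking}")
```

Under this model, only `byte` at the maximum SVE width has more elements per vector (256) than the current largest offset (196), so the `255` / `256` pair is the one that straddles a new boundary. The larger values would mainly add margin.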
-------------
Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15010#pullrequestreview-1571589212