RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE [v2]

Mon Aug 28 08:49:27 UTC 2023

> The HotSpot jtreg test `compiler/loopopts/superword/TestDependencyOffsets.java` fails on AArch64 CPUs with 512-bit SVE. The reason is that many test loops in the code cannot be vectorized due to data dependences, but the IR tests assume they can.
> 
> On AArch64, these IR tests check only the `asimd` CPU feature and incorrectly assume that AArch64 vectors are at most 256 bits. In fact, `asimd` on AArch64 represents only NEON vectors, which are at most 128 bits. AArch64 CPUs may also have the `sve` feature, which represents scalable vectors of at most 2048 bits. Vectorization won't succeed on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits.
> 
> As this jtreg test is auto-generated by a Python script, we have updated the script and re-generated the test. The new version checks auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page.
> 
> 
> @@ -321,7 +321,8 @@ class Type:
>             p.append(Platform("avx512", ["avx512", "true"], 64))
>          else:
>             assert False, "type not implemented" + self.name
> -        p.append(Platform("asimd", ["asimd", "true"], 32))
> +        p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16))
> +        p.append(Platform("sve", ["sve", "true"], 256))
>          return p
> 
>  class Test:
> @@ -457,7 +458,7 @@ class Generator:
>          lines.append(" *   and various MaxVectorSize values, and +- AlignVector.")
>          lines.append(" *")
>          lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
> -        lines.append(" *       https://bugs.openjdk.org/browse/JDK-8308606")
> +        lines.append(" *       https://bugs.openjdk.org/browse/JDK-8312570")
>          lines.append(" *")
>          lines.append(" * Types: " + ", ".join([t.name for t in self.types]))
>          lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets]))
> @@ -598,7 +599,8 @@ class Generator:
>              # IR rules
>              for p in test.t.platforms():
>                  elements = p.vector_width // test.t.size
> -                lines.append(f"    // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}")
> +                max_pre = "max " if p.name == "sve" else ""
> +                lines.append(f"    // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}")
>                  ###############  -Align...

Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:

 - Merge branch 'master' into deptest
 - 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE

   The HotSpot jtreg test
   `compiler/loopopts/superword/TestDependencyOffsets.java` fails on
   AArch64 CPUs with 512-bit SVE. The reason is that many test loops in
   the code cannot be vectorized due to data dependences, but the IR
   tests assume they can.

   On AArch64, these IR tests check only the `asimd` CPU feature and
   incorrectly assume that AArch64 vectors are at most 256 bits. In fact,
   `asimd` on AArch64 represents only NEON vectors, which are at most 128
   bits. AArch64 CPUs may also have the `sve` feature, which represents
   scalable vectors of at most 2048 bits. Vectorization won't succeed on
   512-bit SVE CPUs if the memory offset between some read and write is
   less than 512 bits.

   As this jtreg test is auto-generated by a Python script, we have
   updated the script and re-generated the test. The new version checks
   auto-vectorization on both NEON-only and NEON+SVE platforms. Below is
   the diff of the generator script. We have also attached the new script
   to the JBS page.

   ```
   @@ -321,7 +321,8 @@ class Type:
               p.append(Platform("avx512", ["avx512", "true"], 64))
            else:
               assert False, "type not implemented" + self.name
   -        p.append(Platform("asimd", ["asimd", "true"], 32))
   +        p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16))
   +        p.append(Platform("sve", ["sve", "true"], 256))
            return p

    class Test:
   @@ -457,7 +458,7 @@ class Generator:
            lines.append(" *   and various MaxVectorSize values, and +- AlignVector.")
            lines.append(" *")
            lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
   -        lines.append(" *       https://bugs.openjdk.org/browse/JDK-8308606")
   +        lines.append(" *       https://bugs.openjdk.org/browse/JDK-8312570")
            lines.append(" *")
            lines.append(" * Types: " + ", ".join([t.name for t in self.types]))
            lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets]))
   @@ -598,7 +599,8 @@ class Generator:
                # IR rules
                for p in test.t.platforms():
                    elements = p.vector_width // test.t.size
   -                lines.append(f"    // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}")
   +                max_pre = "max " if p.name == "sve" else ""
   +                lines.append(f"    // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}")
                    ###############  -AlignVector
                    rule = PlatformIRRule(p)
                    rule.add_pre_constraint("AlignVector", IRBool.makeFalse())
   @@ -694,8 +696,8 @@ class Generator:
    def main():
        g = Generator()
        g.generate("TestDependencyOffsets",
   -               "/home/emanuel/Documents/fork7-jdk/open/test/hotspot/jtreg/compiler/loopopts/superword",
   -               "8298935 8308606", # Big ID
   +               "test/hotspot/jtreg/compiler/loopopts/superword",
   +               "8298935 8308606 8312570", # Bug ID
                   "compiler.loopopts.superword", # package
        )
   ```

   We tested this on a variety of AArch64 CPUs.
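
To illustrate the dependence argument in the description, here is a minimal Python sketch. It is a simplified model, not the generator script and not C2's actual dependence analysis. It assumes a test loop of the shape `a[i + offset] = a[i] * c` with a positive `offset`, and treats a full-width vector as usable only when the byte distance between the read and the write is at least the vector width. The names `PlatformWidth`, `elements_in_vector` and `full_width_vector_is_safe` are hypothetical; only the widths (16 bytes for `asimd`, at most 256 bytes for `sve`) and the `vector_width // size` computation come from the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformWidth:
    """Hypothetical stand-in for the script's Platform: a name plus a width in bytes."""
    name: str
    vector_width: int  # bytes; for "sve" this is the architectural maximum


# Widths taken from the diff: NEON is 16 bytes, SVE is at most 256 bytes.
ASIMD = PlatformWidth("asimd", 16)
SVE_MAX = PlatformWidth("sve", 256)


def elements_in_vector(vector_width: int, element_size: int) -> int:
    """Same arithmetic as `p.vector_width // test.t.size` in the generator."""
    return vector_width // element_size


def full_width_vector_is_safe(offset: int, element_size: int, vector_width: int) -> bool:
    """Simplified model (an assumption, not C2's real analysis).

    For `a[i + offset] = a[i] * c` with offset > 0, iteration i + offset reads
    what iteration i wrote. A vector covering `vector_width` bytes is treated
    as safe only if that distance, in bytes, is at least the vector width.
    """
    if offset <= 0:
        raise ValueError("this sketch only models positive offsets")
    return offset * element_size >= vector_width


if __name__ == "__main__":
    offset, int_size = 8, 4  # 8 ints apart, i.e. a 256-bit distance
    for width_bits in (128, 256, 512):
        ok = full_width_vector_is_safe(offset, int_size, width_bits // 8)
        print(f"{width_bits}-bit vectors: {'ok' if ok else 'blocked by dependence'}")
```

Under this model, an `int` loop with an offset of 8 elements (256 bits) passes the check for 128-bit NEON and for 256-bit vectors but not for 512-bit SVE, which matches the failure described above: an IR rule keyed only on `asimd` with an assumed 256-bit maximum would expect vectorization where a 512-bit SVE machine cannot provide it.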

-------------

Changes: https://git.openjdk.org/jdk/pull/15010/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15010&range=01
  Stats: 2062 lines in 1 file changed: 1422 ins; 0 del; 640 mod
  Patch: https://git.openjdk.org/jdk/pull/15010.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15010/head:pull/15010

PR: https://git.openjdk.org/jdk/pull/15010