RFR: 8289186: Support predicated vector load/store operations over X86 AVX2 targets. [v3]

Jatin Bhateja jbhateja at openjdk.org
Wed Jul 6 13:17:57 UTC 2022


> Hi All,
> 
> [JDK-8283667](https://bugs.openjdk.org/browse/JDK-8283667) added support for masked loads on non-predicated targets by blending the loaded contents with a zero vector if and only if the unmasked portion of the load does not span beyond the array bounds.
> 
> X86 AVX2 offers direct predicated vector load/store instructions for non-sub-word types.
> 
> This patch adds an efficient backend implementation of predicated memory operations over int/long/float/double vectors.
> 
> Please find below the JMH microbenchmark results with and without the patch.
> 
> 
> 
> System : Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz [28C 2S Cascadelake Server]
> 
> Baseline:
> Benchmark                                          (inSize)  (outSize)   Mode  Cnt    Score   Error   Units
> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE        1026       1152  thrpt    2  712.218          ops/ms
> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE      1026       1152  thrpt    2  156.912          ops/ms
> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE       1026       1152  thrpt    2  255.814          ops/ms
> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE         1026       1152  thrpt    2  267.688          ops/ms
> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE        1026       1152  thrpt    2  140.957          ops/ms
> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE       1026       1152  thrpt    2  474.009          ops/ms
> 
> 
> With Opt:
> Benchmark                                          (inSize)  (outSize)   Mode  Cnt     Score   Error   Units
> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE        1026       1152  thrpt    2   742.781          ops/ms
> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE      1026       1152  thrpt    2  1241.021          ops/ms
> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE       1026       1152  thrpt    2  2333.311          ops/ms
> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE         1026       1152  thrpt    2  3258.754          ops/ms
> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE        1026       1152  thrpt    2  1757.192          ops/ms
> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE       1026       1152  thrpt    2   472.590          ops/ms
> 
> 
> Predicated memory operations over sub-word types will be handled in a subsequent patch.
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin
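
The two strategies contrasted in the quoted description can be illustrated with a small scalar model. This is only a sketch of the lane-level semantics, not the HotSpot backend code: the class name `MaskedLoadSketch`, the method names, and the fixed lane count of 8 (eight ints in a 256-bit register) are assumptions made for illustration. `loadWithBlend` models the blend-with-zero approach, which is only legal when the full-width load stays inside the array, while `loadPredicated` models a predicated load, where inactive lanes never touch memory and read as zero.

```java
import java.util.Arrays;

public class MaskedLoadSketch {
    // Assumed lane count for illustration: 8 ints fill one 256-bit register.
    static final int LANES = 8;

    // Blend-style masked load: perform a full unmasked load, then blend with
    // a zero vector. Only valid when the whole vector fits inside the array.
    static int[] loadWithBlend(int[] src, int offset, boolean[] mask) {
        if (offset < 0 || offset + LANES > src.length) {
            throw new IndexOutOfBoundsException(
                "full-width load would span beyond the array bounds");
        }
        int[] full = Arrays.copyOfRange(src, offset, offset + LANES); // unmasked load
        int[] out = new int[LANES];                                   // zero vector
        for (int i = 0; i < LANES; i++) {
            out[i] = mask[i] ? full[i] : 0;                           // blend per lane
        }
        return out;
    }

    // Predicated masked load: only lanes whose mask bit is set access memory,
    // so a partially out-of-bounds vector is fine if those lanes are inactive.
    static int[] loadPredicated(int[] src, int offset, boolean[] mask) {
        int[] out = new int[LANES]; // inactive lanes stay zero
        for (int i = 0; i < LANES; i++) {
            if (mask[i]) {
                out[i] = src[offset + i]; // bounds-checked only for active lanes
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        boolean[] tail = {true, true, true, true, true, true, false, false};
        // The vector starting at index 4 would end at index 12, past the array,
        // but the two out-of-range lanes are masked off.
        System.out.println(Arrays.toString(loadPredicated(src, 4, tail)));
    }
}
```

In this model, a load at offset 4 from a 10-element array succeeds with `loadPredicated` because the two trailing lanes are inactive, whereas `loadWithBlend` has to reject it. That tail case is what the `LoadMaskedIOOBEBenchmark` names in the tables above suggest is being measured.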

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8289186: Review comments resolved.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9324/files
  - new: https://git.openjdk.org/jdk/pull/9324/files/b3c193f4..60a777ca

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9324&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9324&range=01-02

  Stats: 167 lines in 12 files changed: 134 ins; 26 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/9324.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9324/head:pull/9324

PR: https://git.openjdk.org/jdk/pull/9324


More information about the hotspot-compiler-dev mailing list