RFR: 8289186: Support predicated vector load/store operations over X86 AVX2 targets.
Vladimir Kozlov
kvn at openjdk.org
Thu Jun 30 02:07:44 UTC 2022
On Wed, 29 Jun 2022 09:07:48 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> Hi All,
>
> [JDK-8283667](https://bugs.openjdk.org/browse/JDK-8283667) added support for masked loads on non-predicated targets by blending the loaded contents with a zero vector if and only if the unmasked portion of the load does not extend beyond the array bounds.
>
> X86 AVX2 offers direct predicated vector load/store instructions for non-subword types.
>
> This patch adds an efficient backend implementation of predicated memory operations over int/long/float/double vectors.
>
> Please find below the JMH microbenchmark results with and without the patch.
>
>
>
> System : Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz [28C 2S Cascadelake Server]
>
> Baseline:
> Benchmark                                          (inSize)  (outSize)   Mode  Cnt     Score  Error  Units
> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE        1026       1152  thrpt    2   712.218         ops/ms
> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE      1026       1152  thrpt    2   156.912         ops/ms
> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE       1026       1152  thrpt    2   255.814         ops/ms
> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE         1026       1152  thrpt    2   267.688         ops/ms
> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE        1026       1152  thrpt    2   140.957         ops/ms
> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE       1026       1152  thrpt    2   474.009         ops/ms
>
>
> With Opt:
> Benchmark                                          (inSize)  (outSize)   Mode  Cnt     Score  Error  Units
> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE        1026       1152  thrpt    2   742.781         ops/ms
> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE      1026       1152  thrpt    2  1241.021         ops/ms
> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE       1026       1152  thrpt    2  2333.311         ops/ms
> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE         1026       1152  thrpt    2  3258.754         ops/ms
> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE        1026       1152  thrpt    2  1757.192         ops/ms
> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE       1026       1152  thrpt    2   472.590         ops/ms
>
>
> Predicated memory operations over subword types will be handled in a subsequent patch.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin
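The difference between the blend-based fallback and a true predicated load, as described in the quoted message, can be illustrated with a scalar C++ model. This is only a sketch under stated assumptions: `predicated_load` and `blend_load` are hypothetical helpers, lanes are modelled as `std::vector` elements, and nothing here reflects the actual HotSpot or AVX2 code paths.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a predicated load: memory is read only for lanes whose
// mask bit is set; all other lanes are zero. Hypothetical helper, not HotSpot code.
std::vector<int> predicated_load(const std::vector<int>& a, std::size_t offset,
                                 const std::vector<bool>& mask) {
  std::vector<int> out(mask.size(), 0);
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i]) {
      out[i] = a.at(offset + i);  // at() throws if a set lane is out of bounds
    }
  }
  return out;
}

// Scalar model of the blend fallback: a full-width load followed by a blend
// with zero. Returns false when the full-width load would read past the array,
// which is the case the fallback cannot handle.
bool blend_load(const std::vector<int>& a, std::size_t offset,
                const std::vector<bool>& mask, std::vector<int>& out) {
  if (offset + mask.size() > a.size()) {
    return false;
  }
  out.assign(mask.size(), 0);
  for (std::size_t i = 0; i < mask.size(); ++i) {
    out[i] = mask[i] ? a[offset + i] : 0;
  }
  return true;
}
```

With a 6-element array and a 4-lane mask at offset 4, the blend model refuses the load because lanes 2 and 3 would fall outside the array, while the predicated model reads only the two masked lanes.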
src/hotspot/cpu/x86/x86.ad line 1762:
> 1760: break;
> 1761: case Op_LoadVectorMasked:
> 1762: if (!VM_Version::supports_avx512bw() && (is_subword_type(bt) || UseAVX < 1)) {
With `UseAVX=0` we clear `supports_avx512bw`, so the test should be:

    if ((!VM_Version::supports_avx512bw() && is_subword_type(bt)) || UseAVX < 1)
And maybe a naive question: is `VectorMaskGen` used for `mask` node creation? If so, why have separate support checks for `LoadVectorMasked`/`StoreVectorMasked`?
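The two forms of the `Op_LoadVectorMasked` condition discussed above can be compared with a small C++ sketch. It treats the inputs as plain values rather than real `VM_Version` state, and the function names are hypothetical. It relies on the assumption stated in the comment above, that `UseAVX=0` forces `supports_avx512bw()` to be false.

```cpp
#include <cassert>

// Condition as written in the patch. Hypothetical stand-in: the arguments
// replace VM_Version::supports_avx512bw(), is_subword_type(bt) and UseAVX.
bool unsupported_patch(bool avx512bw, bool subword, int use_avx) {
  return !avx512bw && (subword || use_avx < 1);
}

// Condition as suggested in the review (&& binds tighter than ||).
bool unsupported_suggested(bool avx512bw, bool subword, int use_avx) {
  return (!avx512bw && subword) || use_avx < 1;
}

// Returns true if both forms agree on every state that respects the
// assumption that use_avx == 0 implies avx512bw == false.
bool forms_agree_on_valid_states() {
  for (int use_avx = 0; use_avx <= 3; ++use_avx) {
    for (int bw = 0; bw <= 1; ++bw) {
      if (use_avx == 0 && bw == 1) {
        continue;  // excluded by the assumption
      }
      for (int sw = 0; sw <= 1; ++sw) {
        if (unsupported_patch(bw != 0, sw != 0, use_avx) !=
            unsupported_suggested(bw != 0, sw != 0, use_avx)) {
          return false;
        }
      }
    }
  }
  return true;
}
```

Under that assumption the two forms give the same answer for every state. They differ only in the excluded state where AVX512BW is reported while `UseAVX=0`, so the suggested form does not depend on the flag having been cleared.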
src/hotspot/share/opto/vectorIntrinsics.cpp line 313:
> 311: return true;
> 312: }
> 313:
Why is it placed here without an `is_supported` check? The comment does not explain it.
-------------
PR: https://git.openjdk.org/jdk/pull/9324