RFR: 8289186: Support predicated vector load/store operations over X86 AVX2 targets. [v5]
Vladimir Kozlov
kvn at openjdk.org
Fri Jul 8 20:14:42 UTC 2022
On Fri, 8 Jul 2022 08:38:36 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi All,
>>
>> [JDK-8283667](https://bugs.openjdk.org/browse/JDK-8283667) added support for handling masked loads on non-predicated targets by blending the loaded contents with a zero vector iff the unmasked portion of the load does not span beyond the array bounds.
>>
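The load-and-blend approach can be modelled in scalar Java. This is a hypothetical sketch, not HotSpot code: `BlendMaskedLoad`, `VLEN` and the 8-lane (256-bit int) width are assumptions, and the in-bounds check assumes the full vector width must fit in the array (`offset <= a.length - VLEN`).

```java
import java.util.Arrays;

/**
 * Hypothetical scalar model of a masked int load on a target without
 * predicated loads: do a full (unmasked) load, then blend the lanes that
 * are not selected by the mask with a zero vector.
 */
final class BlendMaskedLoad {
    static final int VLEN = 8; // assumption: 8 int lanes, i.e. a 256-bit vector

    /** True if a full, unmasked load of VLEN lanes starting at offset stays inside the array. */
    static boolean fullLoadInBounds(int[] a, int offset) {
        return offset >= 0 && offset <= a.length - VLEN;
    }

    /** Masked load emulated as "full load + blend with zero". */
    static int[] loadBlend(int[] a, int offset, boolean[] mask) {
        if (!fullLoadInBounds(a, offset)) {
            // The full load would read past the array end, so the blend
            // trick is not applicable; a compiler needs another path here.
            throw new IllegalStateException("full load would exceed array bounds");
        }
        int[] loaded = Arrays.copyOfRange(a, offset, offset + VLEN); // unmasked load
        int[] zero = new int[VLEN];
        int[] result = new int[VLEN];
        for (int lane = 0; lane < VLEN; lane++) {
            result[lane] = mask[lane] ? loaded[lane] : zero[lane]; // blend
        }
        return result;
    }
}
```

With a 10-element array, a load at offset 1 is accepted, while a load at offset 3 is rejected even if its mask selected only in-bounds lanes, which is the case the blend approach cannot cover.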
>> X86 AVX2 offers direct predicated vector load/store instructions for non-sub-word types (VPMASKMOVD/VPMASKMOVQ for int/long, VMASKMOVPS/VMASKMOVPD for float/double).
>>
>> This patch adds an efficient backend implementation of predicated memory operations over int/long/float/double vectors.
>>
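What the direct predicated instructions provide can be sketched as a scalar model: lanes whose mask bit is clear are neither read nor written (a masked load returns zero in those lanes, a masked store leaves those elements unchanged), so a tail access that straddles the array end needs no fallback. `PredicatedMemOps`, its methods and the 8-lane width are hypothetical; `indexInRange` is only named after the Vector API's `VectorSpecies.indexInRange`.

```java
/**
 * Hypothetical scalar model of AVX2-style predicated int load/store.
 * Only lanes selected by the mask touch memory, mirroring how the
 * hardware suppresses accesses for masked-off elements.
 */
final class PredicatedMemOps {
    static final int VLEN = 8; // assumption: 8 int lanes (256-bit vector)

    /** Masked load: selected lanes are read, masked-off lanes are zero. */
    static int[] maskedLoad(int[] a, int offset, boolean[] mask) {
        int[] result = new int[VLEN];
        for (int lane = 0; lane < VLEN; lane++) {
            if (mask[lane]) {
                result[lane] = a[offset + lane]; // only selected lanes access the array
            }
        }
        return result;
    }

    /** Masked store: only selected lanes are written; other elements keep their old values. */
    static void maskedStore(int[] a, int offset, int[] v, boolean[] mask) {
        for (int lane = 0; lane < VLEN; lane++) {
            if (mask[lane]) {
                a[offset + lane] = v[lane];
            }
        }
    }

    /** Tail mask: lane is selected iff offset + lane < length. */
    static boolean[] indexInRange(int offset, int length) {
        boolean[] m = new boolean[VLEN];
        for (int lane = 0; lane < VLEN; lane++) {
            m[lane] = offset + lane < length;
        }
        return m;
    }
}
```

For a 10-element array, `indexInRange(8, 10)` selects only lanes 0 and 1, so both the load and the store at offset 8 stay within bounds without any scalar tail loop.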
>> Please find below the JMH micro-benchmark results with and without the patch.
>>
>>
>>
>> System : Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz [28C 2S Cascadelake Server]
>>
>> Baseline:
>> Benchmark                                          (inSize)  (outSize)   Mode  Cnt     Score  Error   Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE        1026       1152  thrpt    2   712.218         ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE      1026       1152  thrpt    2   156.912         ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE       1026       1152  thrpt    2   255.814         ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE         1026       1152  thrpt    2   267.688         ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE        1026       1152  thrpt    2   140.957         ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE       1026       1152  thrpt    2   474.009         ops/ms
>>
>>
>> With Opt:
>> Benchmark                                          (inSize)  (outSize)   Mode  Cnt     Score  Error   Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE        1026       1152  thrpt    2   742.781         ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE      1026       1152  thrpt    2  1241.021         ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE       1026       1152  thrpt    2  2333.311         ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE         1026       1152  thrpt    2  3258.754         ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE        1026       1152  thrpt    2  1757.192         ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE       1026       1152  thrpt    2   472.590         ops/ms
>>
>>
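The relative gain per element type can be read off the two LoadMaskedIOOBEBenchmark tables by dividing the With Opt score by the Baseline score (both in ops/ms). A small sketch that does this for the numbers quoted above; `SpeedupTable` and `speedup` are illustrative names only.

```java
/** Speedup = With Opt score / Baseline score, using the throughput numbers quoted above (ops/ms). */
final class SpeedupTable {
    static final String[] TYPES = {"byte", "double", "float", "int", "long", "short"};
    static final double[] BASELINE = {712.218, 156.912, 255.814, 267.688, 140.957, 474.009};
    static final double[] WITH_OPT = {742.781, 1241.021, 2333.311, 3258.754, 1757.192, 472.590};

    static double speedup(int i) {
        return WITH_OPT[i] / BASELINE[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < TYPES.length; i++) {
            System.out.printf("%-6s %6.2fx%n", TYPES[i], speedup(i));
        }
    }
}
```

Worked by hand, this gives roughly 1.04x (byte), 7.91x (double), 9.12x (float), 12.17x (int), 12.47x (long) and 1.00x (short); byte and short stay flat, consistent with sub-word types not being covered by this patch.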
>> Predicated memory operations over sub-word types will be handled in a subsequent patch.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8289186
> - 8289186: jcheck failure
> - 8289186: Review comments resolved.
> - 8289186: Review comments resolved.
> - 8289186: Support predicated vector load/store operations over X86 AVX2 targets.
I started testing.
-------------
PR: https://git.openjdk.org/jdk/pull/9324