RFR: 8289186: Support predicated vector load/store operations over X86 AVX2 targets. [v2]
Vladimir Kozlov
kvn at openjdk.org
Tue Jul 5 18:50:51 UTC 2022
On Mon, 4 Jul 2022 16:39:29 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi All,
>>
>> [JDK-8283667](https://bugs.openjdk.org/browse/JDK-8283667) added the support to handle masked loads on non-predicated targets by blending the loaded contents with zero vector iff unmasked portion of load does not span beyond array bounds.
>>
>> X86 AVX2 offers direct predicated vector loads/store instruction for non-sub word type.
>>
>> This patch adds the efficient backend implementation for predicated memory operations over int/long/float/double vectors.
>>
>> Please find below the JMH micro stats with and without patch.
>>
>>
>>
>> System : Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz [28C 2S Cascadelake Server]
>>
>> Baseline:
>> Benchmark (inSize) (outSize) Mode Cnt Score Error Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE 1026 1152 thrpt 2 712.218 ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 1026 1152 thrpt 2 156.912 ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE 1026 1152 thrpt 2 255.814 ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE 1026 1152 thrpt 2 267.688 ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 1026 1152 thrpt 2 140.957 ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 1026 1152 thrpt 2 474.009 ops/ms
>>
>>
>> With Opt:
>> Benchmark (inSize) (outSize) Mode Cnt Score Error Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE 1026 1152 thrpt 2 742.781 ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 1026 1152 thrpt 2 1241.021 ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE 1026 1152 thrpt 2 2333.311 ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE 1026 1152 thrpt 2 3258.754 ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 1026 1152 thrpt 2 1757.192 ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 1026 1152 thrpt 2 472.590 ops/ms
>>
>>
>> Predicated memory operation over sub-word type will be handled in a subsequent patch.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> 8289186: Review comments resolved.
src/hotspot/cpu/x86/x86.ad line 1767:
> 1765: break;
> 1766: case Op_StoreVectorMasked:
> 1767: if (!VM_Version::supports_avx512bw() && (is_subword_type(bt) || UseAVX < 1)) {
Why duplicate the same check?
And you did not answer my suggestion about modifying the check.
src/hotspot/share/opto/vectorIntrinsics.cpp line 313:
> 311: if (!is_supported && (sopc == Op_StoreVectorMasked || sopc == Op_LoadVectorMasked)) {
> 312: return true;
> 313: }
Still unclear for me. As I understand `the upfront checks`, you mention, are checks at lines 270 and 286. So we know that `StoreVectorMasked` and `LoadVectorMasked` are supported.
The second part of your comment is talking about `non-predicated targets`. Does it mean that the real check should be next?:
if (Matcher::has_predicated_vectors()) {
....
} else if (sopc == Op_StoreVectorMasked || sopc == Op_LoadVectorMasked) {
// your comment
return true;
}
```
Or I am still missing what are you trying to do here?
src/hotspot/share/opto/vectornode.hpp line 907:
> 905: StoreVectorMaskedNode(Node* c, Node* mem, Node* dst, Node* src, const TypePtr* at, Node* mask)
> 906: : StoreVectorNode(c, mem, dst, at, src) {
> 907: assert(mask->bottom_type()->isa_vectmask(), "sanity");
Why the assert was added before? And why you can remove it now?
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3923:
> 3921: // End of low-level memory operations.
> 3922:
> 3923: @ForceInline
Why this change? Was it missing before or you found that based on testing?
What is criteria to add `@ForceInline`?
-------------
PR: https://git.openjdk.org/jdk/pull/9324
More information about the hotspot-compiler-dev
mailing list