RFR: 8289186: Support predicated vector load/store operations over X86 AVX2 targets. [v2]
Jatin Bhateja
jbhateja at openjdk.org
Wed Jul 6 13:18:05 UTC 2022
On Tue, 5 Jul 2022 18:29:42 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> 8289186: Review comments resolved.
>
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3923:
>
>> 3921: // End of low-level memory operations.
>> 3922:
>> 3923: @ForceInline
>
> Why this change? Was it missing before or you found that based on testing?
> What is criteria to add `@ForceInline`?
Thanks for highlighting this. checkMaskFromIndexSize is used to check for illegal memory accesses with out-of-range offsets, i.e. tail scenarios, so the profile-based invocation count will always be low for these calls. I saw improved performance on targeted micros, but I get your point that aggressively forcing inlining on infrequently taken paths can have adverse performance side effects. Without the forced inlining we give up some of the gain from masked load/store support on tail paths, but we still see a modest gain on the order of 2-3x over baseline for non-sub-word types, versus the original 10x.
Benchmark                                          (inSize)  (outSize)   Mode  Cnt    Score  Error   Units
LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE        1026       1152  thrpt    2  748.793         ops/ms
LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE      1026       1152  thrpt    2  381.655         ops/ms
LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE       1026       1152  thrpt    2  741.809         ops/ms
LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE         1026       1152  thrpt    2  757.433         ops/ms
LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE        1026       1152  thrpt    2  386.450         ops/ms
LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE       1026       1152  thrpt    2  471.260         ops/ms
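
For readers less familiar with the tail scenario discussed above, here is a rough scalar model of what a masked tail load with a mask-aware bounds check does. It is only a sketch: the class MaskedTailSketch, the boolean[] mask representation, the 32-lane width and the simplified checkMaskFromIndexSize signature are all assumptions made for illustration, not the actual jdk.incubator.vector code. The array length 1026 is chosen to match the inSize column, assuming the benchmark exercises a short tail of this kind.

```java
import java.util.Arrays;

public class MaskedTailSketch {
    // Hypothetical lane count: 32 byte lanes would correspond to a 256-bit vector.
    static final int LANES = 32;

    // Builds a mask whose lane i is set only if offset + i lies in [0, limit).
    static boolean[] indexInRange(int offset, int limit, int lanes) {
        boolean[] mask = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            int idx = offset + i;
            mask[i] = idx >= 0 && idx < limit;
        }
        return mask;
    }

    // Scalar model of the check: throw if any SET lane maps outside [0, limit).
    // Unset lanes are allowed to fall out of range because they are never accessed.
    static void checkMaskFromIndexSize(int offset, boolean[] mask, int limit) {
        for (int i = 0; i < mask.length; i++) {
            int idx = offset + i;
            if (mask[i] && (idx < 0 || idx >= limit)) {
                throw new IndexOutOfBoundsException(
                    "lane " + i + " maps to index " + idx + ", limit " + limit);
            }
        }
    }

    // Masked load: set lanes are read from src, unset lanes stay zero.
    static byte[] maskedLoad(byte[] src, int offset, boolean[] mask) {
        checkMaskFromIndexSize(offset, mask, src.length);
        byte[] lanes = new byte[mask.length];
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                lanes[i] = src[offset + i];
            }
        }
        return lanes;
    }

    public static void main(String[] args) {
        byte[] src = new byte[1026];          // 1024 full-vector elements plus a 2-element tail
        Arrays.fill(src, (byte) 7);
        int offset = 1024;                    // start of the tail

        // Tail mask: only the first two lanes are in range, so the load succeeds.
        byte[] v = maskedLoad(src, offset, indexInRange(offset, src.length, LANES));
        System.out.println(v[0] + " " + v[1] + " " + v[2]);

        // An all-true mask at the same offset touches out-of-range lanes and is rejected.
        boolean[] all = new boolean[LANES];
        Arrays.fill(all, true);
        try {
            maskedLoad(src, offset, all);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IOOBE: " + e.getMessage());
        }
    }
}
```

In this model the check only runs on the one tail iteration per array, which is why its profile-based invocation count stays low compared with the main loop body.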
-------------
PR: https://git.openjdk.org/jdk/pull/9324
More information about the hotspot-compiler-dev mailing list