RFR: 8289186: Support predicated vector load/store operations over X86 AVX2 targets. [v2]
Vladimir Kozlov
kvn at openjdk.org
Thu Jul 7 19:01:47 UTC 2022
On Wed, 6 Jul 2022 13:13:08 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3923:
>>
>>> 3921: // End of low-level memory operations.
>>> 3922:
>>> 3923: @ForceInline
>>
>> Why this change? Was it missing before, or did you find it based on testing?
>> What are the criteria for adding `@ForceInline`?
>
> Thanks for highlighting this. checkMaskFromIndexSize is used to test illegal memory access cases with out-of-range offsets, i.e. tail scenarios, so the profile-based invocation count will always be low for these calls. I saw improved performance on targeted micros with `@ForceInline`, but I take your point that aggressively forcing inlining on infrequently taken paths can have adverse performance side effects. Dropping it may give up some of the performance gain from masked load/store support on tail paths, but we still see a modest gain on the order of 2-3x over baseline, versus the original 10x gain for non-sub-word types.
>
>
> Benchmark                                          (inSize)  (outSize)   Mode  Cnt    Score  Error   Units
> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE        1026       1152  thrpt    2  748.793         ops/ms
> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE      1026       1152  thrpt    2  381.655         ops/ms
> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE       1026       1152  thrpt    2  741.809         ops/ms
> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE         1026       1152  thrpt    2  757.433         ops/ms
> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE        1026       1152  thrpt    2  386.450         ops/ms
> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE       1026       1152  thrpt    2  471.260         ops/ms
okay.
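
For readers following the thread, the tail scenario discussed above can be illustrated with a minimal sketch in plain Java. It does not use the jdk.incubator.vector API and is not the JDK's implementation of checkMaskFromIndexSize; the class `TailMaskSketch` and the helpers `inRangeLanes` and `checkMask` are hypothetical names, and the lane count of 8 is an assumption (for example, int lanes in a 256-bit vector). It only shows the idea: at the last offset of a 1026-element array, just two lanes are in range, and a mask that sets any other lane should be rejected.

```java
import java.util.Arrays;

// Illustrative only: models a per-lane bounds check with boolean arrays.
final class TailMaskSketch {

    // Hypothetical helper: lane i is in range when 0 <= offset + i < length.
    static boolean[] inRangeLanes(int offset, int lanes, int length) {
        boolean[] inRange = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            long index = (long) offset + i; // long avoids int overflow
            inRange[i] = index >= 0 && index < length;
        }
        return inRange;
    }

    // Hypothetical check: reject a mask that sets any out-of-range lane.
    static void checkMask(int offset, boolean[] mask, int length) {
        boolean[] inRange = inRangeLanes(offset, mask.length, length);
        for (int i = 0; i < mask.length; i++) {
            if (mask[i] && !inRange[i]) {
                throw new IndexOutOfBoundsException(
                    "lane " + i + " maps to index " + ((long) offset + i)
                    + " for length " + length);
            }
        }
    }

    public static void main(String[] args) {
        // 1026 = 128 * 8 + 2, so the final 8-lane step starts at offset 1024.
        boolean[] tail = inRangeLanes(1024, 8, 1026);
        System.out.println(Arrays.toString(tail));

        checkMask(1024, tail, 1026); // passes: only in-range lanes are set

        boolean[] full = new boolean[8];
        Arrays.fill(full, true);
        try {
            checkMask(1024, full, 1026); // lanes 2..7 are out of range
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check only matters on the final, partial iteration of a loop, it is invoked rarely, which is consistent with the low profile-based invocation count mentioned in the quoted reply.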
-------------
PR: https://git.openjdk.org/jdk/pull/9324