RFR: 8289186: Support predicated vector load/store operations over X86 AVX2 targets. [v2]

Jatin Bhateja jbhateja at openjdk.org
Wed Jul 6 13:18:05 UTC 2022


On Tue, 5 Jul 2022 18:29:42 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   8289186: Review comments resolved.
>
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3923:
> 
>> 3921:     // End of low-level memory operations.
>> 3922: 
>> 3923:     @ForceInline
> 
> Why this change? Was it missing before or you found that based on testing?
> What is criteria to add `@ForceInline`?

Thanks for highlighting this. checkMaskFromIndexSize is used to check for illegal memory accesses with out-of-range offsets, i.e. tail scenarios, so the profile-based invocation count for these calls will always be low. I saw improved performance on targeted micros, but I take your point that aggressively forcing inlining on infrequently taken paths can have adverse performance side effects. Not forcing the inlining may, however, offset some of the performance gains from masked load/store support on tail paths, but we still see a modest gain on the order of 2-3x, versus the original 10x gain, for non-sub-word types over baseline.


Benchmark                                          (inSize)  (outSize)   Mode  Cnt    Score   Error   Units
LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE        1026       1152  thrpt    2  748.793          ops/ms
LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE      1026       1152  thrpt    2  381.655          ops/ms
LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE       1026       1152  thrpt    2  741.809          ops/ms
LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE         1026       1152  thrpt    2  757.433          ops/ms
LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE        1026       1152  thrpt    2  386.450          ops/ms
LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE       1026       1152  thrpt    2  471.260          ops/ms
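
For readers unfamiliar with the tail path discussed above, here is a minimal plain-Java sketch of the idea. It is not the JDK implementation: the class MaskedTailSketch, the lane count of 8, and the helpers indexInRange, checkMaskFromIndexSizeSketch and maskedLoad are hypothetical stand-ins that model a mask as a boolean[] instead of a VectorMask. It only illustrates why the range check runs on the rarely taken tail path and not on the in-bounds fast path.

```java
import java.util.Arrays;

// Hypothetical model of a masked vector load; not the jdk.incubator.vector code.
class MaskedTailSketch {
    static final int LANES = 8; // assumed vector width in lanes

    // Builds a mask whose lane i is set only if offset + i is a valid index.
    static boolean[] indexInRange(int offset, int limit) {
        boolean[] mask = new boolean[LANES];
        for (int lane = 0; lane < LANES; lane++) {
            int index = offset + lane;
            mask[lane] = index >= 0 && index < limit;
        }
        return mask;
    }

    // Stand-in for the tail-path check: a set lane must not point out of range.
    static void checkMaskFromIndexSizeSketch(int offset, boolean[] mask, int limit) {
        for (int lane = 0; lane < LANES; lane++) {
            int index = offset + lane;
            if (mask[lane] && (index < 0 || index >= limit)) {
                throw new IndexOutOfBoundsException(
                        "lane " + lane + " maps to index " + index + ", limit " + limit);
            }
        }
    }

    // Masked load: unset lanes are left as zero and never touch memory.
    static int[] maskedLoad(int[] src, int offset, boolean[] mask) {
        boolean wholeVectorInRange = offset >= 0 && offset <= src.length - LANES;
        if (!wholeVectorInRange) {
            // Only tail (or otherwise out-of-range) accesses reach this check,
            // so its invocation count stays low in a typical loop.
            checkMaskFromIndexSizeSketch(offset, mask, src.length);
        }
        int[] lanes = new int[LANES];
        for (int lane = 0; lane < LANES; lane++) {
            if (mask[lane]) {
                lanes[lane] = src[offset + lane];
            }
        }
        return lanes;
    }

    public static void main(String[] args) {
        int[] src = new int[10];
        for (int i = 0; i < src.length; i++) {
            src[i] = i + 1;
        }
        // Tail iteration: only indexes 8 and 9 are valid, so two lanes are set.
        int[] tail = maskedLoad(src, 8, indexInRange(8, src.length));
        System.out.println(Arrays.toString(tail)); // prints [9, 10, 0, 0, 0, 0, 0, 0]
    }
}
```

In this model, a full-vector access never reaches the check, while the last partial iteration of a loop does, which mirrors the low invocation count described above.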

-------------

PR: https://git.openjdk.org/jdk/pull/9324


More information about the hotspot-compiler-dev mailing list