RFR: 8289186: Support predicated vector load/store operations over X86 AVX2 targets. [v4]

Wed Jul 6 13:24:27 UTC 2022

On Wed, 6 Jul 2022 13:14:53 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> test/micro/org/openjdk/bench/jdk/incubator/vector/LoadMaskedIOOBEBenchmark.java line 98:
>> 
>>> 96:         for (int i = 0; i < inSize; i += bspecies.length()) {
>>> 97:             VectorMask<Byte> mask = VectorMask.fromArray(bspecies, m, i);
>>> 98:             ByteVector.fromArray(bspecies, byteIn, i, mask).intoArray(byteOut, i, mask);
>> 
>> Could you please add new benchmarks for masked `store`?
>
> Done.
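
For readers unfamiliar with the masked form of `intoArray` quoted above, here is a scalar sketch of the bounds behaviour the benchmark name (IOOBE) refers to. It is a plain-Java model, not the benchmark source and not the JDK implementation; the class name `MaskedStoreModel`, the method `maskedStore`, and the small array sizes are hypothetical and chosen only for illustration.

```java
public class MaskedStoreModel {
    // Scalar model of storing one vector (given as an int[] of lane values)
    // into dst at offset. Only lanes whose mask bit is set are bounds-checked
    // and written; unset lanes are ignored even if they fall outside dst.
    static void maskedStore(int[] vector, int[] dst, int offset, boolean[] mask) {
        // Check every set lane first so a failing store writes nothing.
        for (int lane = 0; lane < vector.length; lane++) {
            int idx = offset + lane;
            if (mask[lane] && (idx < 0 || idx >= dst.length)) {
                throw new IndexOutOfBoundsException("lane " + lane + " -> index " + idx);
            }
        }
        for (int lane = 0; lane < vector.length; lane++) {
            if (mask[lane]) {
                dst[offset + lane] = vector[lane];
            }
        }
    }

    public static void main(String[] args) {
        int[] vector = {1, 2, 3, 4};
        int[] dst = new int[6];            // hypothetical: shorter than offset + lane count

        // The tail lanes (indices 6 and 7) are masked off, so the store succeeds.
        maskedStore(vector, dst, 4, new boolean[] {true, true, false, false});
        System.out.println(java.util.Arrays.toString(dst));

        // With every lane set, index 6 is out of range and the store throws.
        try {
            maskedStore(vector, dst, 4, new boolean[] {true, true, true, true});
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IOOBE: " + e.getMessage());
        }
    }
}
```

With `inSize` larger than `outSize`, as in the runs below, the last vector of a loop like the quoted one presumably extends past the end of the output array, so whether the store succeeds depends on the mask for those tail lanes.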

Here are the results of the new benchmark.

Baseline:
Benchmark                                            (inSize)  (outSize)   Mode  Cnt    Score   Error   Units
StoreMaskedIOOBEBenchmark.byteStoreArrayMaskIOOBE        1024       1022  thrpt    2  772.555          ops/ms
StoreMaskedIOOBEBenchmark.doubleStoreArrayMaskIOOBE      1024       1022  thrpt    2  180.548          ops/ms
StoreMaskedIOOBEBenchmark.floatStoreArrayMaskIOOBE       1024       1022  thrpt    2  311.500          ops/ms
StoreMaskedIOOBEBenchmark.intStoreArrayMaskIOOBE         1024       1022  thrpt    2  312.457          ops/ms
StoreMaskedIOOBEBenchmark.longStoreArrayMaskIOOBE        1024       1022  thrpt    2  181.013          ops/ms
StoreMaskedIOOBEBenchmark.shortStoreArrayMaskIOOBE       1024       1022  thrpt    2  538.537          ops/ms

With optimization:

Benchmark                                            (inSize)  (outSize)   Mode  Cnt     Score   Error   Units
StoreMaskedIOOBEBenchmark.byteStoreArrayMaskIOOBE        1024       1022  thrpt    2   757.079          ops/ms
StoreMaskedIOOBEBenchmark.doubleStoreArrayMaskIOOBE      1024       1022  thrpt    2  1553.923          ops/ms
StoreMaskedIOOBEBenchmark.floatStoreArrayMaskIOOBE       1024       1022  thrpt    2  3060.020          ops/ms
StoreMaskedIOOBEBenchmark.intStoreArrayMaskIOOBE         1024       1022  thrpt    2  3025.225          ops/ms
StoreMaskedIOOBEBenchmark.longStoreArrayMaskIOOBE        1024       1022  thrpt    2  1562.263          ops/ms
StoreMaskedIOOBEBenchmark.shortStoreArrayMaskIOOBE       1024       1022  thrpt    2   538.931          ops/ms
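
To make the two tables easier to compare, the following sketch computes the throughput ratio per element type. The scores are copied from the tables above; the class name `StoreMaskedSpeedup` and the helper `speedup` are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StoreMaskedSpeedup {
    // Throughput ratio: a value above 1.0 means the optimized build is faster.
    static double speedup(double baselineOpsPerMs, double optimizedOpsPerMs) {
        return optimizedOpsPerMs / baselineOpsPerMs;
    }

    public static void main(String[] args) {
        // {baseline, with optimization} scores in ops/ms, taken from the tables above.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("byte",   new double[] {772.555,  757.079});
        scores.put("double", new double[] {180.548, 1553.923});
        scores.put("float",  new double[] {311.500, 3060.020});
        scores.put("int",    new double[] {312.457, 3025.225});
        scores.put("long",   new double[] {181.013, 1562.263});
        scores.put("short",  new double[] {538.537,  538.931});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-6s %.2fx%n", e.getKey(), speedup(s[0], s[1]));
        }
    }
}
```

By this arithmetic the int and float stores improve by roughly 9.7x to 9.8x and the long and double stores by roughly 8.6x, while byte and short stay within about 2% of the baseline. That split would be consistent with AVX2 offering masked move instructions only for 32-bit and 64-bit elements. Note that each score comes from only two iterations (Cnt 2) with no error estimate reported, so the ratios are indicative rather than precise.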

-------------

PR: https://git.openjdk.org/jdk/pull/9324