RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

Quan Anh Mai duke at openjdk.java.net
Fri May 13 02:37:45 UTC 2022


On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Checking whether the indexes of masked lanes are inside the valid memory boundary is necessary for masked vector memory access. However, this check can be skipped if the given offset is inside the vector range, which guarantees that no IOOBE (IndexOutOfBoundsException) can happen. The masked load APIs already skip this check in the common case, and this patch applies the same optimization to masked vector stores.
>> 
>> The newly added masked store benchmarks improve by about `1.83x ~ 2.62x` on an x86 system:
>> 
>> Benchmark                                   Before    After     Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask     4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask    1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> A similar performance gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Wrap the offset check into a static method

However, we seem to lack the ability to do an unsigned comparison reliably. C2 can transform `x + MIN_VALUE <=> y + MIN_VALUE` into `x u<=> y`, but this fails if `x` or `y` is itself an addition with a constant, because in such cases the constants are merged together. As a result, I think we need an intrinsic for this. `Integer.compareUnsigned` may fit, but it manifests the result into an integer register, which may lead to suboptimal materialisation of flags. Another approach would be to have a separate method `Integer.lessThanUnsigned` which only returns `boolean`; C2 would have a better time splitting the boolean comparison through `IfNode`, which prevents the materialisation of `boolean` values. What do you two think?
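To make the two source-level forms concrete, here is a minimal sketch. `lessThanUnsigned` is the proposed method, which does not exist in `java.lang.Integer`; it is written here as a plain helper on top of `Integer.compareUnsigned`. The class and method names are only for illustration, and the sketch says nothing about what C2 actually emits for either form.

```java
final class UnsignedCompareSketch {
    // Hypothetical helper standing in for the proposed Integer.lessThanUnsigned;
    // it is NOT an existing JDK method.
    static boolean lessThanUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    // The bias idiom: adding MIN_VALUE flips the sign bit of both operands,
    // so a signed comparison of the results orders them as unsigned values.
    static boolean lessThanUnsignedBiased(int a, int b) {
        return a + Integer.MIN_VALUE < b + Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (lessThanUnsigned(a, b) != lessThanUnsignedBiased(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("both forms agree");
    }
}
```

Both forms return the same answer for every pair of inputs; the open question in this thread is only which one C2 can reliably turn into a single unsigned compare.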

I.e., after splitting the if through the merge point, the shape of `if (Integer.lessThanUnsigned(a, b))` would be transformed from

         a        b
          \      /
            CmpU
             |
            Bool
             |
            If
          /     \
    IfTrue        IfFalse
          \     /
            Region        1        0
                \         |       /
                         Phi         0
                          \         /
                              CmpI

into

         a        b
          \      /
            CmpU

Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8620

