RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Thu Apr 28 00:50:40 UTC 2022


On Fri, 22 Apr 2022 07:08:24 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Currently, a masked vector load whose index range falls outside the array bounds is implemented with pure Java scalar code to avoid an IOOBE (IndexOutOfBoundsException). This is necessary for architectures that do not support the predicate feature, because there the masked load is implemented as a full vector load followed by a vector blend, and the full vector load would raise an IOOBE, which is not valid. However, on architectures that support the predicate feature, such as SVE, AVX-512, and RVV, the load can be vectorized with a predicated load instruction as long as the indexes of the lanes selected by the mask are within the array bounds. On these architectures, lanes that are masked off are not loaded and do not raise an exception.
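
The difference between the two lowerings can be modeled with scalar Java. This is a minimal sketch, not HotSpot code: the class `MaskedLoadModel` and its methods are hypothetical, an `int[]` stands in for a vector register, and a `boolean[]` stands in for a `VectorMask`. The blend version reads every lane before discarding the inactive ones, so it fails as soon as any lane runs past the array; the predicated version touches only the active lanes.

```java
// Hypothetical scalar model of two ways to lower a masked vector load.
// Not HotSpot code: int[] stands in for a vector register, boolean[] for a mask.
final class MaskedLoadModel {

    // Lowering 1: full vector load followed by a blend with zero.
    // Every lane is read, so a lane past the end of the array throws
    // ArrayIndexOutOfBoundsException even when its mask bit is false.
    static int[] loadFullThenBlend(int[] a, int offset, boolean[] mask) {
        int lanes = mask.length;
        int[] full = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            full[i] = a[offset + i];            // unconditional read of every lane
        }
        int[] result = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            result[i] = mask[i] ? full[i] : 0;  // blend: inactive lanes become 0
        }
        return result;
    }

    // Lowering 2: predicated load. Only active lanes touch memory,
    // so inactive lanes that lie past the end of the array are harmless.
    static int[] loadPredicated(int[] a, int offset, boolean[] mask) {
        int lanes = mask.length;
        int[] result = new int[lanes];          // inactive lanes stay 0
        for (int i = 0; i < lanes; i++) {
            if (mask[i]) {
                result[i] = a[offset + i];
            }
        }
        return result;
    }
}
```

For example, with `a = {1, 2, 3, 4, 5, 6}`, `offset = 4`, and the mask `{true, true, false, false}`, `loadPredicated` returns `{5, 6, 0, 0}`, while `loadFullThenBlend` throws `ArrayIndexOutOfBoundsException` because it reads `a[6]`.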
>> 
>> This patch adds vectorization support for the IOOBE part of the masked load. The original Java implementation (note the `FIXME: optimize`) is:
>> 
>> 
>>   @ForceInline
>>   public static
>>   ByteVector fromArray(VectorSpecies<Byte> species,
>>                        byte[] a, int offset,
>>                        VectorMask<Byte> m) {
>>       ByteSpecies vsp = (ByteSpecies) species;
>>       if (offset >= 0 && offset <= (a.length - species.length())) {
>>           return vsp.dummyVector().fromArray0(a, offset, m);
>>       }
>> 
>>       // FIXME: optimize
>>       checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>>       return vsp.vOp(m, i -> a[offset + i]);
>>   }
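
The two paths in `fromArray` can be sketched with plain arrays to show which indexes the slow path actually has to validate. The class and method names below are hypothetical, and the bounds loop stands in for `checkMaskFromIndexSize` under the assumption that the call rejects any active lane whose index falls outside the array; it is not the JDK implementation.

```java
// Hypothetical model of the two paths in the masked fromArray shown above.
// Plain arrays replace ByteVector/VectorMask; all names are illustrative.
final class MaskedFromArrayModel {

    // Same test as the fast-path guard: lanes offset .. offset + lanes - 1
    // all lie inside the array.
    static boolean offsetInRange(int offset, int lanes, int length) {
        return offset >= 0 && offset <= length - lanes;
    }

    static byte[] fromArrayMasked(byte[] a, int offset, boolean[] mask) {
        int lanes = mask.length;
        byte[] result = new byte[lanes];        // inactive lanes stay 0
        if (!offsetInRange(offset, lanes, a.length)) {
            // Slow path, step 1: every *active* lane must be in bounds
            // (assumed behavior of checkMaskFromIndexSize).
            for (int i = 0; i < lanes; i++) {
                int index = offset + i;
                if (mask[i] && (index < 0 || index >= a.length)) {
                    throw new IndexOutOfBoundsException(
                        "active lane " + i + " maps to index " + index);
                }
            }
        }
        // Both paths then load the active lanes; in the JDK the fast path is
        // a single vector load and the slow path is vsp.vOp(...) lane by lane.
        for (int i = 0; i < lanes; i++) {
            if (mask[i]) {
                result[i] = a[offset + i];
            }
        }
        return result;
    }
}
```

With a 5-element array, `offset = 3`, and four lanes, the fast-path test fails, yet a mask that enables only lanes 0 and 1 still loads successfully. This partially out-of-range case is the one that can be vectorized on predicate-capable hardware.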
>> 
>> Since this case can only be vectorized with a predicated load, HotSpot must check whether the current backend supports it and fall back to the Java scalar version if not. This differs from the normal masked vector load, for which the compiler generates a full vector load and a vector blend when the predicated load is not supported. To let the compiler take the right action, an additional flag (`usePred`) is added to the existing "loadMasked" intrinsic, set to "true" for the IOOBE part and "false" for the normal load. If the flag is "true" and the backend does not support the predicated load, intrinsification fails and the normal Java path is executed.
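
The compile-time decision described above can be summarized as a small decision function. This is a hypothetical Java model, not HotSpot code: `LoadMaskedLowering`, `Lowering`, and `choose` are illustrative names, and `usePred` keeps the meaning given in the text (`true` for the IOOBE path), even though the follow-up commit renames the flag to `offsetInRange`.

```java
// Hypothetical model of how the loadMasked intrinsic picks a lowering.
// "usePred" follows the description above (true for the out-of-range path);
// the enum and method names are illustrative, not HotSpot identifiers.
final class LoadMaskedLowering {

    enum Lowering { PREDICATED_LOAD, FULL_LOAD_AND_BLEND, JAVA_FALLBACK }

    static Lowering choose(boolean usePred, boolean backendSupportsPredicate) {
        if (backendSupportsPredicate) {
            // Predicate-capable targets: inactive lanes never fault.
            return Lowering.PREDICATED_LOAD;
        }
        if (usePred) {
            // Offset may be out of range and a full load could fault:
            // refuse to intrinsify so the scalar Java code runs instead.
            return Lowering.JAVA_FALLBACK;
        }
        // Offset known to be in range: full load plus blend is safe.
        return Lowering.FULL_LOAD_AND_BLEND;
    }
}
```

Keeping the fallback decision in the compiler means the same Java source runs vectorized where predicated loads exist and as scalar code elsewhere.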
>> 
>> The patch also adds the same vectorization support for the masked variants of:
>>  - fromByteArray/fromByteBuffer
>>  - fromBooleanArray
>>  - fromCharArray
>> 
>> Performance of the newly added benchmarks improves by about `1.88x` to `30.26x` on an x86 AVX-512 system:
>> 
>> Benchmark                                           Before     After  Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE    737.542  1387.069  ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE  118.366   330.776  ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE   233.832  6125.026  ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE     233.816  7075.923  ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE    119.771   330.587  ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE   431.961   939.301  ops/ms
>> 
>> Similar performance gains can also be observed on a 512-bit SVE system.
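
As a quick arithmetic check of the quoted range, the After/Before ratio for each row of the AVX-512 table can be recomputed. The class name is hypothetical and the inputs are copied from the table; no new measurements are involved.

```java
import java.util.Locale;

// Recomputes After / Before for each row of the AVX-512 benchmark table.
// Values are copied from the table; nothing here is newly measured.
final class SpeedupTable {

    static final String[] NAMES  = {"byte", "double", "float", "int", "long", "short"};
    static final double[] BEFORE = {737.542, 118.366, 233.832, 233.816, 119.771, 431.961};
    static final double[] AFTER  = {1387.069, 330.776, 6125.026, 7075.923, 330.587, 939.301};

    static double speedup(int row) {
        return AFTER[row] / BEFORE[row];
    }

    // One line per element type, e.g. "int     30.26x".
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < NAMES.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator everywhere.
            sb.append(String.format(Locale.ROOT, "%-6s %6.2fx%n", NAMES[i], speedup(i)));
        }
        return sb.toString();
    }
}
```

The ratios run from about 1.88x (byte) to about 30.26x (int), with short, long, and double between roughly 2.2x and 2.8x and float at about 26.2x.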
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename the "usePred" to "offsetInRange"

The rest of the patch looks good to me.

src/hotspot/share/opto/vectorIntrinsics.cpp line 1232:

> 1230:   // out when current case uses the predicate feature.
> 1231:   if (!supports_predicate) {
> 1232:     bool use_predicate = false;

Renaming this to `needs_predicate` would make it easier to understand.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8035

