RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]
Paul Sandoz
psandoz at openjdk.java.net
Fri Apr 29 21:37:44 UTC 2022
On Fri, 22 Apr 2022 07:08:24 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> Currently, a masked vector load whose index range extends past the array boundary is implemented with pure Java scalar code to avoid the IOOBE (IndexOutOfBoundsException). This is necessary for architectures that do not support the predicate feature, because there the masked load is implemented as a full vector load followed by a vector blend, and a full vector load would raise the IOOBE, which is not valid. However, for architectures that support the predicate feature, such as SVE/AVX-512/RVV, the load can be vectorized with the predicated load instruction as long as the indexes of the masked lanes are within the bounds of the array. On these architectures, the unmasked lanes do not raise an exception.
>>
>> This patch adds the vectorization support for the masked load with IOOBE part. Please see the original java implementation (FIXME: optimize):
>>
>>
>>     @ForceInline
>>     public static
>>     ByteVector fromArray(VectorSpecies<Byte> species,
>>                          byte[] a, int offset,
>>                          VectorMask<Byte> m) {
>>         ByteSpecies vsp = (ByteSpecies) species;
>>         if (offset >= 0 && offset <= (a.length - species.length())) {
>>             return vsp.dummyVector().fromArray0(a, offset, m);
>>         }
>>
>>         // FIXME: optimize
>>         checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>>         return vsp.vOp(m, i -> a[offset + i]);
>>     }
>>
>> Since this case can only be vectorized with the predicated load, HotSpot must check whether the current backend supports it and fall back to the Java scalar version if not. This differs from the normal masked vector load, for which the compiler generates a full vector load and a vector blend if the predicated load is not supported. So that the compiler takes the expected action, an additional flag (i.e. `usePred`) is added to the existing "loadMasked" intrinsic, with the value "true" for the IOOBE part and "false" for the normal load. The compiler fails to intrinsify if the flag is "true" and the predicated load is not supported by the backend, in which case the normal Java path is executed.
>>
>> Also adds the same vectorization support for masked:
>> - fromByteArray/fromByteBuffer
>> - fromBooleanArray
>> - fromCharArray
>>
>> The performance of the newly added benchmarks improves by about `1.88x ~ 30.26x` on the x86 AVX-512 system:
>>
>> Benchmark                                          Before    After     Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE    737.542   1387.069  ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE  118.366   330.776   ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE   233.832   6125.026  ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE     233.816   7075.923  ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE    119.771   330.587   ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE   431.961   939.301   ops/ms
>>
>> A similar performance gain can also be observed on a 512-bit SVE system.
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Rename the "usePred" to "offsetInRange"
IIUC, when the hardware does not support predicated loads, any false `offsetInRange` value causes the load intrinsic to fail, resulting in the fallback, so it would not be materially any different to the current behavior, just more uniformly implemented.
Why can't the intrinsic support passing a boolean directly? Is it something to do with constants? If that is not possible, I recommend creating named constant values and passing those all the way through rather than converting a boolean to an integer value. Then there is no need for a branch checking `offsetInRange`.
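To illustrate the named-constant suggestion, here is a rough sketch in plain Java. It is not the actual JDK code: the class `OffsetCheckSketch`, the constants `OFFSET_IN_RANGE` and `OFFSET_OUT_OF_RANGE`, the helper `checkMaskedLanes`, and the `loadMasked` stand-in are all hypothetical names, and a `boolean[]` stands in for `VectorMask`. The point is only that each call site passes a constant directly, so no boolean-to-int conversion or branch on `offsetInRange` is needed.

```java
// Hypothetical sketch only: none of these names are the real JDK API.
final class OffsetCheckSketch {
    // Named int constants passed all the way through to the stand-in intrinsic.
    static final int OFFSET_IN_RANGE = 1;
    static final int OFFSET_OUT_OF_RANGE = 0;

    // Stand-in for the "loadMasked" intrinsic entry point. A real intrinsic
    // would use the constant to choose between a full load plus blend and a
    // predicated load; this scalar model reads only the set lanes either way.
    static int[] loadMasked(int[] a, int offset, boolean[] mask, int offsetInRange) {
        int[] r = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                r[i] = a[offset + i];
            }
        }
        return r;
    }

    // Throws if any set lane would read outside the array (assumed helper,
    // playing the role that checkMaskFromIndexSize plays in the quoted code).
    static void checkMaskedLanes(int offset, boolean[] mask, int length) {
        for (int i = 0; i < mask.length; i++) {
            int idx = offset + i;
            if (mask[i] && (idx < 0 || idx >= length)) {
                throw new IndexOutOfBoundsException("lane " + i + " index " + idx);
            }
        }
    }

    static int[] fromArray(int[] a, int offset, boolean[] mask) {
        if (offset >= 0 && offset <= a.length - mask.length) {
            // Whole vector is in bounds: pass the constant, no conversion.
            return loadMasked(a, offset, mask, OFFSET_IN_RANGE);
        }
        // Some lanes are out of bounds: only the set lanes must be valid.
        checkMaskedLanes(offset, mask, a.length);
        return loadMasked(a, offset, mask, OFFSET_OUT_OF_RANGE);
    }
}
```

With a five-element array and a four-lane mask, an offset of 3 takes the second path: the load succeeds if only lanes 0 and 1 are set, and throws if lane 2 is set.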
Might be better to hold off until the JEP is integrated and then update, since this will conflict (`byte[]` and `ByteBuffer` load methods are removed and `MemorySegment` load methods are added). You could prepare for that now by branching off `vectorIntrinsics`.
-------------
PR: https://git.openjdk.java.net/jdk/pull/8035