Integrated: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature
Xiaohong Gong
xgong at openjdk.java.net
Tue Jun 7 07:45:23 UTC 2022
On Wed, 30 Mar 2022 10:31:59 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
> Currently, a masked vector load whose index range extends past the array boundary is implemented in pure Java scalar code to avoid an IOOBE (IndexOutOfBoundsException). This is necessary for architectures that do not support the predicate feature. On those architectures, the masked load is implemented as a full vector load followed by a vector blend, and the full vector load reads past the end of the array and throws an IOOBE, even for lanes that are masked off. However, architectures that support the predicate feature, such as SVE, AVX-512, and RVV, can vectorize the load with a predicated load instruction, provided the indexes of all set (active) lanes are within the array bounds. On these architectures, lanes whose mask bit is unset are not loaded and do not raise an exception.
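The difference between the two strategies can be modeled in plain scalar Java. This is a minimal sketch under stated assumptions, not HotSpot code. The class and method names (`MaskedLoadModel`, `blendLoad`, `predicatedLoad`) and the fixed lane count `VLEN = 4` are hypothetical, and `int[]` arrays stand in for vectors. `blendLoad` reads every lane before blending, so near the end of the array it throws even when the out-of-range lanes are masked off. `predicatedLoad` reads only set lanes and leaves the others zero.

```java
import java.util.Arrays;

// Scalar model of a masked vector load; all names here are hypothetical.
final class MaskedLoadModel {
    static final int VLEN = 4; // assumed number of lanes

    // Strategy without predicate support: full load of all lanes, then blend.
    // Every lane reads a[offset + i], so an out-of-range lane throws even if its mask bit is false.
    static int[] blendLoad(int[] a, int offset, boolean[] m) {
        int[] full = new int[VLEN];
        for (int i = 0; i < VLEN; i++) {
            full[i] = a[offset + i]; // full vector load
        }
        int[] r = new int[VLEN];
        for (int i = 0; i < VLEN; i++) {
            r[i] = m[i] ? full[i] : 0; // blend with a zero vector
        }
        return r;
    }

    // Strategy with predicate support: only set lanes read memory; unset lanes stay zero.
    static int[] predicatedLoad(int[] a, int offset, boolean[] m) {
        int[] r = new int[VLEN];
        for (int i = 0; i < VLEN; i++) {
            if (m[i]) {
                r[i] = a[offset + i];
            }
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6};
        boolean[] m = {true, true, false, false};
        // At offset 4, lanes 2 and 3 would index a[6] and a[7], past the end of the array.
        System.out.println(Arrays.toString(predicatedLoad(a, 4, m))); // [5, 6, 0, 0]
        try {
            blendLoad(a, 4, m);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("blendLoad threw " + e.getClass().getSimpleName());
        }
    }
}
```

ArrayIndexOutOfBoundsException is a subclass of IndexOutOfBoundsException, so the failure of `blendLoad` is exactly the IOOBE the scalar fallback exists to avoid.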
>
> This patch adds vectorization support for the IOOBE path of the masked load. The original Java implementation follows (note the "FIXME: optimize" comment):
>
>
> @ForceInline
> public static
> ByteVector fromArray(VectorSpecies<Byte> species,
>                      byte[] a, int offset,
>                      VectorMask<Byte> m) {
>     ByteSpecies vsp = (ByteSpecies) species;
>     if (offset >= 0 && offset <= (a.length - species.length())) {
>         return vsp.dummyVector().fromArray0(a, offset, m);
>     }
>
>     // FIXME: optimize
>     checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>     return vsp.vOp(m, i -> a[offset + i]);
> }
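The control flow of this method can be sketched in scalar form. The following assumptions apply. `FromArrayModel`, `VLEN`, and `checkSetLanesInBounds` are hypothetical names, and a `boolean[]` stands in for `VectorMask<Byte>`. The helper only imitates what the real `checkMaskFromIndexSize` is expected to enforce, namely that every set lane indexes inside the array; its actual signature and implementation differ. The first branch corresponds to the full-vector fast path. The second branch is the scalar path that this patch replaces with a predicated load when the backend allows it.

```java
// Scalar model of the quoted fromArray; FromArrayModel and VLEN are hypothetical.
final class FromArrayModel {
    static final int VLEN = 4; // assumed species length

    static byte[] fromArray(byte[] a, int offset, boolean[] m) {
        byte[] r = new byte[VLEN];
        if (offset >= 0 && offset <= a.length - VLEN) {
            // Whole vector is in bounds: stands in for the masked fromArray0 fast path.
            for (int i = 0; i < VLEN; i++) {
                r[i] = m[i] ? a[offset + i] : 0;
            }
            return r;
        }
        // Some lanes fall outside the array: only the set lanes must be in bounds.
        checkSetLanesInBounds(offset, m, a.length);
        for (int i = 0; i < VLEN; i++) {
            if (m[i]) {
                r[i] = a[offset + i]; // per-lane scalar load, like vsp.vOp(m, i -> a[offset + i])
            }
        }
        return r;
    }

    // Hypothetical stand-in for checkMaskFromIndexSize: every set lane must index inside [0, length).
    static void checkSetLanesInBounds(int offset, boolean[] m, int length) {
        for (int i = 0; i < m.length; i++) {
            int index = offset + i;
            if (m[i] && (index < 0 || index >= length)) {
                throw new IndexOutOfBoundsException(
                        "lane " + i + ": index " + index + " out of bounds for length " + length);
            }
        }
    }
}
```

For a 10-element array and `VLEN = 4`, offsets 0 through 6 take the fast path. Offset 7 succeeds only if lane 3 (index 10) is unset, and offset -1 succeeds only if lane 0 is unset.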
>
> Since this case can only be vectorized with a predicated load, HotSpot must check whether the current backend supports one and fall back to the Java scalar version if not. This differs from the normal masked vector load, for which the compiler generates a full vector load plus a vector blend when the predicated load is not supported. To make the compiler take the expected action, an additional flag (`usePred`) is added to the existing "loadMasked" intrinsic. The flag is "true" for the IOOBE path and "false" for the normal load. If the flag is "true" and the backend does not support the predicated load, intrinsification fails and the normal Java path is executed.
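The resulting choice can be summarized as a small decision table. This is a hypothetical model of the behavior described above, not the C2 implementation. `LoadMaskedIntrinsicModel`, `compile`, and `Outcome` are invented names, and the model ignores every other reason an intrinsic might fail, such as an unsupported vector size or element type.

```java
// Hypothetical model of the "loadMasked" intrinsic decision described above (not C2 code).
final class LoadMaskedIntrinsicModel {
    enum Outcome { PREDICATED_LOAD, FULL_LOAD_AND_BLEND, JAVA_FALLBACK }

    static Outcome compile(boolean usePred, boolean backendSupportsPredicatedLoad) {
        if (backendSupportsPredicatedLoad) {
            return Outcome.PREDICATED_LOAD; // masked-off lanes are never accessed
        }
        if (usePred) {
            return Outcome.JAVA_FALLBACK; // IOOBE path: a full load could fault, so intrinsification fails
        }
        return Outcome.FULL_LOAD_AND_BLEND; // normal path: the whole vector is already known to be in bounds
    }

    public static void main(String[] args) {
        for (boolean usePred : new boolean[] {false, true}) {
            for (boolean pred : new boolean[] {false, true}) {
                System.out.println("usePred=" + usePred + ", predicated load=" + pred
                        + " -> " + compile(usePred, pred));
            }
        }
    }
}
```

Only the combination of `usePred` set to true and no predicated-load support gives up on intrinsification. That outcome sends execution back to the scalar loop in `fromArray`.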
>
> The patch also adds the same vectorization support for the masked variants of:
> - fromByteArray/fromByteBuffer
> - fromBooleanArray
> - fromCharArray
>
> Performance of the newly added benchmarks improves by about `1.88x ~ 30.26x` on an x86 AVX-512 system:
>
> Benchmark                                            Before     After  Units
> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE     737.542  1387.069  ops/ms
> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE   118.366   330.776  ops/ms
> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE    233.832  6125.026  ops/ms
> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE      233.816  7075.923  ops/ms
> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE     119.771   330.587  ops/ms
> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE    431.961   939.301  ops/ms
>
> Similar performance gains can also be observed on a 512-bit SVE system.
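As a quick arithmetic check of the quoted range, the speedup for each benchmark is After / Before. The snippet below simply recomputes those ratios from the AVX-512 table in the quoted message. The class name is arbitrary.

```java
// Recomputes per-benchmark speedups (After / Before) from the quoted AVX-512 numbers.
final class SpeedupCheck {
    static final String[] NAMES = {"byte", "double", "float", "int", "long", "short"};
    static final double[] BEFORE = {737.542, 118.366, 233.832, 233.816, 119.771, 431.961}; // ops/ms
    static final double[] AFTER = {1387.069, 330.776, 6125.026, 7075.923, 330.587, 939.301}; // ops/ms

    static double speedup(int i) {
        return AFTER[i] / BEFORE[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-6s %6.2fx%n", NAMES[i], speedup(i));
        }
    }
}
```

The smallest ratio (byte, about 1.88x) and the largest (int, about 30.26x) match the stated range. Double, long, and short land between roughly 2.2x and 2.8x, while float is about 26x.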
This pull request has now been integrated.
Changeset: 39fa52b5
Author: Xiaohong Gong <xgong at openjdk.org>
URL: https://git.openjdk.java.net/jdk/commit/39fa52b5f7504eca7399b863b0fb934bdce37f7e
Stats: 453 lines in 44 files changed: 174 ins; 21 del; 258 mod
8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature
Reviewed-by: sviswanathan, psandoz
-------------
PR: https://git.openjdk.java.net/jdk/pull/8035