RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature

Wed Apr 20 02:49:27 UTC 2022

On Mon, 11 Apr 2022 09:04:36 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> The optimization for masked store is tracked in: https://bugs.openjdk.java.net/browse/JDK-8284050
>
>> The blend should be with the vector to be stored, so that the masked lanes contain the elements to store and the unmasked lanes contain the loaded elements; storing those back leaves the values in memory unchanged.
> 
> It may not work if the memory lies beyond the legally accessible address space of the process; a corner case could be a page boundary. Thus composing an intermediate vector that only partially contains actual updates, but effectively performing a full vector write to the destination address, may not work in all scenarios.

Thanks for the comment! How about adding a check for the valid array range, as is done for the masked vector load?
Something like:

public final
void intoArray(byte[] a, int offset,
               VectorMask<Byte> m) {
    if (m.allTrue()) {
        intoArray(a, offset);
    } else {
        ByteSpecies vsp = vspecies();
        if (offset >= 0 && offset <= (a.length - vsp.length())) {  // full range check: the whole vector fits in the array
            intoArray0(a, offset, m, /* usePred */ false);          // can be vectorized as load + blend + store
        } else {
            checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
            intoArray0(a, offset, m, /* usePred */ true);           // can only be vectorized with a predicated store
        }
    }
}
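
To make the two paths concrete, here is a minimal scalar sketch of the idea in plain Java. It does not use the Vector API: the class name MaskedStoreSketch, the constant VLEN, and the helpers storeByBlend and storePredicated are hypothetical names for illustration only, and the "vector" and "mask" are emulated with a byte[] and a boolean[]. It only shows why the load + blend + store form is safe once the full range check has passed, and why the out-of-range case has to store lane by lane.

```java
import java.util.Arrays;

public class MaskedStoreSketch {
    // Hypothetical vector length in lanes; a real species would provide this.
    static final int VLEN = 8;

    // Emulates load + blend + full-width store. Only valid when the whole
    // range [offset, offset + VLEN) lies inside the array.
    static void storeByBlend(byte[] a, int offset, byte[] v, boolean[] m) {
        byte[] blended = Arrays.copyOfRange(a, offset, offset + VLEN); // full-width load
        for (int i = 0; i < VLEN; i++) {
            if (m[i]) {
                blended[i] = v[i];  // set lanes take the new value
            }                       // unset lanes keep the loaded value
        }
        System.arraycopy(blended, 0, a, offset, VLEN);                 // full-width store
    }

    // Emulates a predicated store: only set lanes touch memory. All set
    // lanes are bounds-checked before anything is written.
    static void storePredicated(byte[] a, int offset, byte[] v, boolean[] m) {
        for (int i = 0; i < VLEN; i++) {
            int idx = offset + i;
            if (m[i] && (idx < 0 || idx >= a.length)) {
                throw new IndexOutOfBoundsException("set lane " + i + " maps to index " + idx);
            }
        }
        for (int i = 0; i < VLEN; i++) {
            if (m[i]) {
                a[offset + i] = v[i];
            }
        }
    }

    // Dispatch mirroring the proposed shape: full range check first,
    // predicated fallback otherwise.
    static void intoArray(byte[] a, int offset, byte[] v, boolean[] m) {
        if (offset >= 0 && offset <= a.length - VLEN) {
            storeByBlend(a, offset, v, m);
        } else {
            storePredicated(a, offset, v, m);
        }
    }

    public static void main(String[] args) {
        byte[] a = new byte[10];
        byte[] v = {1, 2, 3, 4, 5, 6, 7, 8};
        boolean[] tail = {true, true, true, true, false, false, false, false};
        intoArray(a, 6, v, tail); // vector overhangs the array; set lanes are in range
        System.out.println(Arrays.toString(a));
    }
}
```

In this emulation the fast path never reads or writes outside the array because the range check guarantees the full vector fits, which is the condition the concern about page boundaries is asking for.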

-------------

PR: https://git.openjdk.java.net/jdk/pull/8035