[vector] RFR: Support non power-of-two and 2048-bit vector length for gather load/scatter store
Yang Zhang
Yang.Zhang at arm.com
Thu Mar 5 07:07:37 UTC 2020
Hi Paul
Thanks for your review. I have updated the patch. Please check it.
http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.2048npow/webrev.01/
Regards
Yang
-----Original Message-----
From: Paul Sandoz <paul.sandoz at oracle.com>
Sent: Wednesday, March 4, 2020 10:14 AM
To: Yang Zhang <Yang.Zhang at arm.com>
Cc: panama-dev at openjdk.java.net
Subject: Re: [vector] RFR: Support non power-of-two and 2048-bit vector length for gather load/scatter store
Hi,
Than you for applying some consistency to these operations.
VectorShape
—
Can you add a @throws to the JavaDoc of forIndexBitSize and also forBitSize?
ByteVector & ShortVector
—
Also change the mask accepting gather implementation to use the internal mask accepting stOp. That implementation currently fails if any of the mask bits are not set.
X-Vector.java.template
—
4389 #if[longOrDouble]
4390
4391 // Mask for gather load
4392 @ForceInline
4393 public final VectorMask<Integer> gatherMask() {
4394 if ((Class<?>) vectorType() == $Type$MaxVector.class) {
4395 return IntMaxVector.IntMaxMask.GATHER_MASK;
4396 }
4397 throw new AssertionError();
4398 }
4399 #end[longOrDouble]
Hmm.. I am unsure about this since it’s a general method (should be package private) but only works in the case of when the vector type is of the max vector.
Is there a better way to surface this up since presumably this is called only when the vector type is of max vector? Maybe be more explicit in the case of "isp.laneCount() != vsp.laneCount()”?
X-VectorBits.java
—
699 #if[intAndMax]
700 static final IntMaxMask GATHER_MASK = new IntMaxMask(maskLowerHalf());
701 #end[intAndMax]
LOWER_HALF_TRUE_MASK is a wordier but more accurate description.
Paul.
> On Feb 13, 2020, at 10:32 PM, Yang Zhang <Yang.Zhang at arm.com> wrote:
>
> Hi,
>
> I'm adding support non power-of-two and 2048-bit vector length for gather load/scatter store.
> Could you please help to review it?
>
> Webrev:
> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.2048npow/webrev
> .00/index.html
> No new failures with a full jtreg.
>
> In this patch, I made the following changes.
> 1. For gather load/scatter store, int array is used for index map. New index shape calculation function is added.
> For AArch64 SVE, the maximum of index bit size is (2048/elementSize) * 32.
> Index increments is (128/elementSize) * 32. So that new index shape calculation function is added.
>
> 2. Use a gather mask to control index vector loading for long/double gather load/scatter store.
> When vector length is 2048 or non-power-of-two, e.g. SVE, there are
> index out of bounds failures in long/double gather load test cases.
> Take 2048 as an example, in long gather load (fromArray), indexShape
> of long species is S_MAX_BIT, and the lane count of long vector is 32.
> When converting long species to int species, indexShape of int species
> is still S_MAX_BIT, but the lane count of int vector is 64. So when
> loading index vector (IntVector), unnecessary index data is loaded.
> If current vector is the tail, out of bounds failure happens.
>
> This failure is only for SVE. For X86, the reason why there isn't such
> failure is that:
> i) Byte/Short gather loads aren't intrinsified.
> ii) For X86 AVX512, indexShape(int index map, 8 elements) of
> long512/double512
> (8 elements) is initialized as S_256_BIT. For SVE with 512-bit vector
> length, indexShape is initialized as S_256_BIT too. But for SVE
> 2048-bit and non-power-of-two, there will be failures above.
>
> 3. Gather load and scatter store is a pair of similar operations. One solution should be applied to them.
> The original java implementations of gather load and scatter store are different.
>
> Vector gather load scatter store
> Int or float With intrinsification With intrinsification
> Long or Double With intrinsification With intrinsification
> Get indexShape directly Get indexShape indirectly
> Normal index loading Special controlled index loading
> Byte or short Without intrinsification With intrinsification, no instruction support on x86/arm
>
> I think gather load and scatter store is a pair of similar operations. One solution should be applied to them.
> Based on above, I use a simple implementation for them.
> Vector gather load/scatter store
> Int or float With intrinsification
> Long or Double With intrinsification
> Get indexShape directly
> Special controlled index loading
> Byte or short Without intrinsification
> If any problem, please let me know.
>
> 4. Some assertions that vector length is power of two are removed.
> 5. Add comments for gather load intrinsification.
>
> Regards,
> Yang
>
>
More information about the panama-dev
mailing list