[vector] RFR: Support non power-of-two and 2048-bit vector length for gather load/scatter store
Paul Sandoz
paul.sandoz at oracle.com
Fri Mar 6 16:58:54 UTC 2020
+1
> On Mar 6, 2020, at 2:50 AM, Yang Zhang <Yang.Zhang at arm.com> wrote:
>
>> When "isp.laneCount() != vsp.laneCount()” is true then I assume its always the case that the species vector type is DoubleMaxVector.
>> Is there anyway we could make this more explicit?
>> Can we refer directly to IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK as in:
>> assert vsp.vectorType() == DoubleMaxVector.class
>> .......
>> .fromArray(isp, indexMap, mapOffset, IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK)
>
> When "isp.laneCount() != vsp.laneCount()” is true, it means that DoubleMaxVector is non-power-of-two or 2048 etc. 128, 256, 512 are excluded. Because when vsp.vectorShap is 128/256/512, isp. vectorShape can be initialized as 64/128/256.
>
> "assert vsp.vectorType() == DoubleMaxVector.class" doesn't exclude 128/256/512. There are test failures in Double/LongMaxVectorTests.
>
> test DoubleMaxVectorTests.gatherDoubleMaxVectorTests(double[-i * 5], index[random]): failure
> java.lang.ClassCastException: class jdk.incubator.vector.IntMaxVector$IntMaxMask cannot be cast to class jdk.incubator.vector.Int128Vector$Int128Mask
>
I presume that test also failed with your prior patch? Since the assert effectively inlines the check you had in the method:
3745 // Generate lower half true mask for gather load/scatter store
3746 // for DoubleMaxVector.
3747 @ForceInline
3748 public final VectorMask<Integer> lowerHalfTrueMask() {
3749 if ((Class<?>) vectorType() == DoubleMaxVector.class) {
3750 return IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK;
3751 }
3752 throw new AssertionError();
3753 }
> IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK can be used here, so method lowerHalfTrueMask can be removed.
>
> Please check this update.
> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.2048npow/webrev.02/
>
> Regards
> Yang
>
>
> -----Original Message-----
> From: Paul Sandoz <paul.sandoz at oracle.com>
> Sent: Friday, March 6, 2020 12:35 AM
> To: Yang Zhang <Yang.Zhang at arm.com>
> Cc: panama-dev at openjdk.java.net
> Subject: Re: [vector] RFR: Support non power-of-two and 2048-bit vector length for gather load/scatter store
>
> Thank you this looks better.
>
> The fragility of the renamed method lowerHalfTrueMask still makes me uncomfortable.
>
>
> 3038 IntVector vix;
> 3039 if (isp.laneCount() != vsp.laneCount()) {
> 3040 // For DoubleMaxVector, if vector length is 2048 bits, indexShape
> 3041 // of Double species is S_MAX_BIT. and the lane count of Double
> 3042 // vector is 32. When converting Double species to int species,
> 3043 // indexShape is still S_MAX_BIT, but the lane count of int vector
> 3044 // is 64. So when loading index vector (IntVector), only lower half
> 3045 // of index data is needed.
> 3046 vix = IntVector
> 3047 .fromArray(isp, indexMap, mapOffset, vsp.lowerHalfTrueMask())
> 3048 .add(offset);
> 3049 } else {
>
> When "isp.laneCount() != vsp.laneCount()” is true then I assume its always the case that the species vector type is DoubleMaxVector.
>
> Is there anyway we could make this more explicit?
>
> Can we refer directly to IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK as in:
>
> assert vsp.vectorType() == DoubleMaxVector.class
> ...
> .fromArray(isp, indexMap, mapOffset, IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK)
>
> ?
>
> Paul.
>
>
>> On Mar 4, 2020, at 11:07 PM, Yang Zhang <Yang.Zhang at arm.com> wrote:
>>
>> Hi Paul
>>
>> Thanks for your review. I have updated the patch. Please check it.
>> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.2048npow/webrev
>> .01/
>>
>> Regards
>> Yang
>>
>> -----Original Message-----
>> From: Paul Sandoz <paul.sandoz at oracle.com>
>> Sent: Wednesday, March 4, 2020 10:14 AM
>> To: Yang Zhang <Yang.Zhang at arm.com>
>> Cc: panama-dev at openjdk.java.net
>> Subject: Re: [vector] RFR: Support non power-of-two and 2048-bit
>> vector length for gather load/scatter store
>>
>> Hi,
>>
>> Than you for applying some consistency to these operations.
>>
>>
>> VectorShape
>> —
>>
>> Can you add a @throws to the JavaDoc of forIndexBitSize and also forBitSize?
>>
>>
>> ByteVector & ShortVector
>> —
>>
>> Also change the mask accepting gather implementation to use the internal mask accepting stOp. That implementation currently fails if any of the mask bits are not set.
>>
>>
>> X-Vector.java.template
>> —
>>
>> 4389 #if[longOrDouble]
>> 4390
>> 4391 // Mask for gather load
>> 4392 @ForceInline
>> 4393 public final VectorMask<Integer> gatherMask() {
>> 4394 if ((Class<?>) vectorType() == $Type$MaxVector.class) {
>> 4395 return IntMaxVector.IntMaxMask.GATHER_MASK;
>> 4396 }
>> 4397 throw new AssertionError();
>> 4398 }
>> 4399 #end[longOrDouble]
>>
>> Hmm.. I am unsure about this since it’s a general method (should be package private) but only works in the case of when the vector type is of the max vector.
>>
>> Is there a better way to surface this up since presumably this is called only when the vector type is of max vector? Maybe be more explicit in the case of "isp.laneCount() != vsp.laneCount()”?
>>
>>
>> X-VectorBits.java
>> —
>>
>> 699 #if[intAndMax]
>> 700 static final IntMaxMask GATHER_MASK = new IntMaxMask(maskLowerHalf());
>> 701 #end[intAndMax]
>>
>> LOWER_HALF_TRUE_MASK is a wordier but more accurate description.
>>
>> Paul.
>>
>>> On Feb 13, 2020, at 10:32 PM, Yang Zhang <Yang.Zhang at arm.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm adding support non power-of-two and 2048-bit vector length for gather load/scatter store.
>>> Could you please help to review it?
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.2048npow/webre
>>> v
>>> .00/index.html
>>> No new failures with a full jtreg.
>>>
>>> In this patch, I made the following changes.
>>> 1. For gather load/scatter store, int array is used for index map. New index shape calculation function is added.
>>> For AArch64 SVE, the maximum of index bit size is (2048/elementSize) * 32.
>>> Index increments is (128/elementSize) * 32. So that new index shape calculation function is added.
>>>
>>> 2. Use a gather mask to control index vector loading for long/double gather load/scatter store.
>>> When vector length is 2048 or non-power-of-two, e.g. SVE, there are
>>> index out of bounds failures in long/double gather load test cases.
>>> Take 2048 as an example, in long gather load (fromArray), indexShape
>>> of long species is S_MAX_BIT, and the lane count of long vector is 32.
>>> When converting long species to int species, indexShape of int
>>> species is still S_MAX_BIT, but the lane count of int vector is 64.
>>> So when loading index vector (IntVector), unnecessary index data is loaded.
>>> If current vector is the tail, out of bounds failure happens.
>>>
>>> This failure is only for SVE. For X86, the reason why there isn't
>>> such failure is that:
>>> i) Byte/Short gather loads aren't intrinsified.
>>> ii) For X86 AVX512, indexShape(int index map, 8 elements) of
>>> long512/double512
>>> (8 elements) is initialized as S_256_BIT. For SVE with 512-bit vector
>>> length, indexShape is initialized as S_256_BIT too. But for SVE
>>> 2048-bit and non-power-of-two, there will be failures above.
>>>
>>> 3. Gather load and scatter store is a pair of similar operations. One solution should be applied to them.
>>> The original java implementations of gather load and scatter store are different.
>>>
>>> Vector gather load scatter store
>>> Int or float With intrinsification With intrinsification
>>> Long or Double With intrinsification With intrinsification
>>> Get indexShape directly Get indexShape indirectly
>>> Normal index loading Special controlled index loading
>>> Byte or short Without intrinsification With intrinsification, no instruction support on x86/arm
>>>
>>> I think gather load and scatter store is a pair of similar operations. One solution should be applied to them.
>>> Based on above, I use a simple implementation for them.
>>> Vector gather load/scatter store
>>> Int or float With intrinsification
>>> Long or Double With intrinsification
>>> Get indexShape directly
>>> Special controlled index loading
>>> Byte or short Without intrinsification
>>> If any problem, please let me know.
>>>
>>> 4. Some assertions that vector length is power of two are removed.
>>> 5. Add comments for gather load intrinsification.
>>>
>>> Regards,
>>> Yang
>>>
>>>
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
>
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
More information about the panama-dev
mailing list