[vector] RFR: Support non power-of-two and 2048-bit vector length for gather load/scatter store

Yang Zhang Yang.Zhang at arm.com
Mon Mar 9 06:44:57 UTC 2020


Thanks for your review.

> I presume that test also failed with your prior patch? Since the assert effectively inlines the check you had in the method:

No, these tests(Long/DoubleMaxVectorTests) succeed with my prior patch. Because method lowerHalfTrueMask() is called only with Max=no-power-of-two or 2048. So there isn't assert failure.
In such situation, vsp.vectorType() is DoubleMaxVector, and isp.vectorType() is also IntMaxVector. There isn't cast failure either.


-----Original Message-----
From: Paul Sandoz <paul.sandoz at oracle.com> 
Sent: Saturday, March 7, 2020 12:59 AM
To: Yang Zhang <Yang.Zhang at arm.com>
Cc: panama-dev at openjdk.java.net
Subject: Re: [vector] RFR: Support non power-of-two and 2048-bit vector length for gather load/scatter store

+1

> On Mar 6, 2020, at 2:50 AM, Yang Zhang <Yang.Zhang at arm.com> wrote:
> 
>> When "isp.laneCount() != vsp.laneCount()” is true then I assume its always the case that the species vector type is DoubleMaxVector.
>> Is there anyway we could make this more explicit?
>> Can we refer directly to IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK as in:
>> assert vsp.vectorType() == DoubleMaxVector.class  .......
>> .fromArray(isp, indexMap, mapOffset, 
>> IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK)
> 
> When "isp.laneCount() != vsp.laneCount()” is true, it means that DoubleMaxVector is non-power-of-two or 2048 etc. 128, 256, 512 are excluded. Because when vsp.vectorShap is 128/256/512, isp. vectorShape can be initialized as 64/128/256.
> 
> "assert vsp.vectorType() == DoubleMaxVector.class" doesn't exclude 128/256/512. There are test failures in Double/LongMaxVectorTests.
> 
> test DoubleMaxVectorTests.gatherDoubleMaxVectorTests(double[-i * 5], 
> index[random]): failure
> java.lang.ClassCastException: class 
> jdk.incubator.vector.IntMaxVector$IntMaxMask cannot be cast to class 
> jdk.incubator.vector.Int128Vector$Int128Mask
> 

I presume that test also failed with your prior patch? Since the assert effectively inlines the check you had in the method: 

3745         // Generate lower half true mask for gather load/scatter store
3746         // for DoubleMaxVector.
3747         @ForceInline
3748         public final VectorMask<Integer> lowerHalfTrueMask() {
3749             if ((Class<?>) vectorType() == DoubleMaxVector.class) {
3750                 return IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK;
3751             }
3752             throw new AssertionError();
3753         }


> IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK can be used here, so method lowerHalfTrueMask can be removed.
> 
> Please check this update.
> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.2048npow/webrev
> .02/
> 
> Regards
> Yang
> 
> 
> -----Original Message-----
> From: Paul Sandoz <paul.sandoz at oracle.com>
> Sent: Friday, March 6, 2020 12:35 AM
> To: Yang Zhang <Yang.Zhang at arm.com>
> Cc: panama-dev at openjdk.java.net
> Subject: Re: [vector] RFR: Support non power-of-two and 2048-bit 
> vector length for gather load/scatter store
> 
> Thank you this looks better.
> 
> The fragility of the renamed method lowerHalfTrueMask still makes me uncomfortable.
> 
> 
> 3038         IntVector vix;
> 3039         if (isp.laneCount() != vsp.laneCount()) {
> 3040             // For DoubleMaxVector,  if vector length is 2048 bits, indexShape
> 3041             // of Double species is S_MAX_BIT. and the lane count of Double
> 3042             // vector is 32. When converting Double species to int species,
> 3043             // indexShape is still S_MAX_BIT, but the lane count of int vector
> 3044             // is 64. So when loading index vector (IntVector), only lower half
> 3045             // of index data is needed.
> 3046             vix = IntVector
> 3047                 .fromArray(isp, indexMap, mapOffset, vsp.lowerHalfTrueMask())
> 3048                 .add(offset);
> 3049         } else {
> 
> When "isp.laneCount() != vsp.laneCount()” is true then I assume its always the case that the species vector type is DoubleMaxVector.
> 
> Is there anyway we could make this more explicit?
> 
> Can we refer directly to IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK as in:
> 
>  assert vsp.vectorType() == DoubleMaxVector.class  ...
>  .fromArray(isp, indexMap, mapOffset, 
> IntMaxVector.IntMaxMask.LOWER_HALF_TRUE_MASK)
> 
> ?
> 
> Paul.
> 
> 
>> On Mar 4, 2020, at 11:07 PM, Yang Zhang <Yang.Zhang at arm.com> wrote:
>> 
>> Hi Paul
>> 
>> Thanks for your review. I have updated the patch. Please check it.
>> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.2048npow/webre
>> v
>> .01/
>> 
>> Regards
>> Yang
>> 
>> -----Original Message-----
>> From: Paul Sandoz <paul.sandoz at oracle.com>
>> Sent: Wednesday, March 4, 2020 10:14 AM
>> To: Yang Zhang <Yang.Zhang at arm.com>
>> Cc: panama-dev at openjdk.java.net
>> Subject: Re: [vector] RFR: Support non power-of-two and 2048-bit 
>> vector length for gather load/scatter store
>> 
>> Hi,
>> 
>> Than you for applying some consistency to these operations.
>> 
>> 
>> VectorShape
>>>> 
>> Can you add a @throws to the JavaDoc of forIndexBitSize and also forBitSize?
>> 
>> 
>> ByteVector & ShortVector
>>>> 
>> Also change the mask accepting gather implementation to use the internal mask accepting stOp.  That implementation currently fails if any of the mask bits are not set.
>> 
>> 
>> X-Vector.java.template
>>>> 
>> 4389 #if[longOrDouble]
>> 4390
>> 4391         // Mask for gather load
>> 4392         @ForceInline
>> 4393         public final VectorMask<Integer> gatherMask() {
>> 4394             if ((Class<?>) vectorType() == $Type$MaxVector.class) {
>> 4395                 return IntMaxVector.IntMaxMask.GATHER_MASK;
>> 4396             }
>> 4397             throw new AssertionError();
>> 4398         }
>> 4399 #end[longOrDouble]
>> 
>> Hmm.. I am unsure about this since it’s a general method (should be package private) but only works in the case of when the vector type is of the max vector.
>> 
>> Is there a better way to surface this up since presumably this is called only when the vector type is of  max vector? Maybe be more explicit in the case of "isp.laneCount() != vsp.laneCount()”?
>> 
>> 
>> X-VectorBits.java
>>>> 
>> 699 #if[intAndMax]
>> 700         static final IntMaxMask GATHER_MASK = new IntMaxMask(maskLowerHalf());
>> 701 #end[intAndMax]
>> 
>> LOWER_HALF_TRUE_MASK is a wordier but more accurate description.
>> 
>> Paul.
>> 
>>> On Feb 13, 2020, at 10:32 PM, Yang Zhang <Yang.Zhang at arm.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm adding support non power-of-two and 2048-bit vector length for gather load/scatter store.
>>> Could you please help to review it?
>>> 
>>> Webrev:
>>> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.2048npow/webr
>>> e
>>> v
>>> .00/index.html
>>> No new failures with a full jtreg.
>>> 
>>> In this patch, I made the following changes.
>>> 1. For gather load/scatter store, int array is used for index map. New index shape calculation function is added.
>>> For AArch64 SVE, the maximum of index bit size is  (2048/elementSize) * 32.
>>> Index increments is (128/elementSize) * 32. So that new index shape calculation function is added.
>>> 
>>> 2. Use a gather mask to control index vector loading for long/double gather load/scatter store.
>>> When vector length is 2048 or non-power-of-two, e.g. SVE, there are 
>>> index out of bounds failures in long/double gather load test cases.
>>> Take 2048 as an example, in long gather load (fromArray), indexShape 
>>> of long species is S_MAX_BIT, and the lane count of long vector is 32.
>>> When converting long species to int species, indexShape of int 
>>> species is still S_MAX_BIT, but the lane count of int vector is 64.
>>> So when loading index vector (IntVector), unnecessary index data is loaded.
>>> If current vector is the tail, out of bounds failure happens.
>>> 
>>> This failure is only for SVE. For X86, the reason why there isn't 
>>> such failure is that:
>>> i)  Byte/Short gather loads aren't intrinsified.
>>> ii) For X86 AVX512, indexShape(int index map, 8 elements) of
>>> long512/double512
>>> (8 elements) is initialized as S_256_BIT. For SVE with 512-bit 
>>> vector length, indexShape is initialized as S_256_BIT too. But for 
>>> SVE 2048-bit and non-power-of-two, there will be failures above.
>>> 
>>> 3. Gather load and scatter store is a pair of similar operations. One solution should be applied to them.
>>> The original java implementations of gather load and scatter store are different.
>>> 
>>> Vector                            gather load                             scatter store
>>> Int or float                    With intrinsification               With intrinsification
>>> Long or Double            With intrinsification               With intrinsification
>>>                                     Get indexShape directly        Get indexShape indirectly
>>>                                     Normal index loading             Special controlled index loading
>>> Byte or short                Without intrinsification         With intrinsification, no instruction support on x86/arm
>>> 
>>> I think gather load and scatter store is a pair of similar operations. One solution should be applied to them.
>>> Based on above, I use a simple implementation for them.
>>> Vector                            gather load/scatter store
>>> Int or float                    With intrinsification
>>> Long or Double            With intrinsification
>>>                                     Get indexShape directly
>>>                                     Special controlled index loading
>>> Byte or short                Without intrinsification
>>> If any problem, please let me know.
>>> 
>>> 4. Some assertions that vector length is power of two are removed.
>>> 5. Add comments for gather load intrinsification.
>>> 
>>> Regards,
>>> Yang
>>> 
>>> 
>> 
>> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> 
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.



More information about the panama-dev mailing list