[vector] RFR: Support non power-of-two and 2048-bit vector length for gather load/scatter store

Paul Sandoz paul.sandoz at oracle.com
Mon Feb 24 17:33:09 UTC 2020


(I was on holiday last week.)

I’ll take a look this week.

Paul.

> On Feb 23, 2020, at 6:40 PM, Yang Zhang <Yang.Zhang at arm.com> wrote:
> 
> Pinging again. Could anyone help review this?
> 
> Regards
> Yang
> 
> -----Original Message-----
> From: panama-dev <panama-dev-bounces at openjdk.java.net> On Behalf Of Yang Zhang
> Sent: Friday, February 14, 2020 2:32 PM
> To: panama-dev at openjdk.java.net
> Subject: [vector] RFR: Support non power-of-two and 2048-bit vector length for gather load/scatter store
> 
> Hi,
> 
> I'm adding support for non-power-of-two and 2048-bit vector lengths for gather load/scatter store.
> Could you please help review it?
> 
> Webrev: http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.2048npow/webrev.00/index.html
> A full jtreg run shows no new failures.
> 
> In this patch, I made the following changes.
> 1. For gather load/scatter store, an int array is used as the index map.
> For AArch64 SVE, the maximum index bit size is (2048/elementSize) * 32.
> The index increment is (128/elementSize) * 32, so a new index shape calculation function is added.
> 
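A minimal sketch of the index size arithmetic in item 1, assuming the int index vector needs one 32-bit index per lane of the data vector. The class and method names are hypothetical and are not taken from the webrev. Reading the 128-bit step as the granularity of SVE vector lengths is also an assumption.

```java
// Sketch only: names are hypothetical, not the functions added by the patch.
final class IndexSizeSketch {
    // Each entry of the int index map occupies 32 bits.
    static final int INT_INDEX_BITS = 32;

    /** Lane count of a vector: vector bits divided by element bits. */
    static int laneCount(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    /** Bits needed by the int index vector: one 32-bit index per data lane. */
    static int indexBits(int vectorBits, int elementBits) {
        return laneCount(vectorBits, elementBits) * INT_INDEX_BITS;
    }

    public static void main(String[] args) {
        // Assumed range: vector lengths from 128 to 2048 bits in 128-bit steps.
        for (int vectorBits = 128; vectorBits <= 2048; vectorBits += 128) {
            System.out.println(vectorBits + "-bit long vector needs a "
                    + indexBits(vectorBits, 64) + "-bit index vector");
        }
    }
}
```

For a 512-bit long vector this gives a 256-bit index vector, which is consistent with the S_256_BIT index shape that item 2 of the quoted message mentions for long512/double512.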
> 2. Use a gather mask to control index vector loading for long/double gather load/scatter store.
> When the vector length is 2048 bits or a non-power-of-two, e.g. on SVE, there are index-out-of-bounds failures in the long/double gather load test cases.
> Take 2048 as an example: in a long gather load (fromArray), the indexShape of the long species is S_MAX_BIT, and the lane count of the long vector is 32.
> When converting the long species to an int species, the indexShape of the int species is still S_MAX_BIT, but the lane count of the int vector is 64. So when loading the index vector (IntVector), unnecessary index data is loaded.
> If the current vector is the tail, an out-of-bounds failure happens.
> 
> This failure only occurs on SVE. On X86, there is no such failure because:
> i)  Byte/Short gather loads aren't intrinsified.
> ii) For X86 AVX512, the indexShape (int index map, 8 elements) of long512/double512
> (8 elements) is initialized as S_256_BIT. For SVE with a 512-bit vector length, indexShape is initialized as S_256_BIT too. But for SVE with 2048-bit and non-power-of-two vector lengths, the failures above occur.
> 
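The tail failure described in item 2 can be illustrated with plain arrays. This is a simulation under stated assumptions, not the Vector API code from the patch: the class and method names are hypothetical, and a lane count stands in for the gather mask. It uses the 2048-bit case from the quoted message, where the long vector has 32 lanes and the int index vector has 64.

```java
// Sketch only: simulates index vector loading with arrays; names are hypothetical.
final class MaskedIndexLoadSketch {
    /** Unmasked load: reads every lane of the index vector from the index map. */
    static int[] loadIndexes(int[] indexMap, int offset, int indexLanes) {
        int[] lanes = new int[indexLanes];
        for (int i = 0; i < indexLanes; i++) {
            lanes[i] = indexMap[offset + i]; // can run past the end at the tail
        }
        return lanes;
    }

    /** Masked load: reads only the first activeLanes lanes, the rest stay zero. */
    static int[] loadIndexesMasked(int[] indexMap, int offset, int indexLanes, int activeLanes) {
        int[] lanes = new int[indexLanes];
        for (int i = 0; i < activeLanes; i++) {
            lanes[i] = indexMap[offset + i];
        }
        return lanes;
    }

    /** Returns true if the unmasked load goes out of bounds at this offset. */
    static boolean unmaskedLoadFails(int[] indexMap, int offset, int indexLanes) {
        try {
            loadIndexes(indexMap, offset, indexLanes);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int longLanes = 2048 / 64; // 32 lanes in the long vector
        int intLanes = 2048 / 32;  // 64 lanes in the int index vector
        int[] indexMap = new int[2 * longLanes]; // indexes for two long vectors
        int tail = longLanes;      // offset of the last vector's indexes

        System.out.println("unmasked tail load fails: "
                + unmaskedLoadFails(indexMap, tail, intLanes));
        int[] masked = loadIndexesMasked(indexMap, tail, intLanes, longLanes);
        System.out.println("masked tail load filled " + longLanes
                + " of " + masked.length + " lanes");
    }
}
```

Only the first 32 index lanes are ever needed for a long gather, so limiting the load to those lanes avoids reading past the end of the index map at the tail.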
> 3. Gather load and scatter store are a pair of similar operations, so one solution should be applied to both.
> The original Java implementations of gather load and scatter store are different.
> 
> Vector           Gather load                 Scatter store
> Int or float     With intrinsification       With intrinsification
> Long or double   With intrinsification       With intrinsification
>                  Get indexShape directly     Get indexShape indirectly
>                  Normal index loading        Special controlled index loading
> Byte or short    Without intrinsification    With intrinsification, no instruction support on x86/arm
> 
> Based on the above, I use a single, simpler implementation for both.
> Vector           Gather load/scatter store
> Int or float     With intrinsification
> Long or double   With intrinsification
>                  Get indexShape directly
>                  Special controlled index loading
> Byte or short    Without intrinsification
> If there is any problem, please let me know.
> 
> 4. Some assertions that the vector length is a power of two are removed.
> 5. Add comments for gather load intrinsification.
> 
> Regards,
> Yang
> 
> 


