[vector] RFR: Support non power-of-two and 2048-bit vector length for gather load/scatter store
Yang Zhang
Yang.Zhang at arm.com
Mon Feb 24 02:40:29 UTC 2020
Ping it again. Could anyone help to review it?
Regards
Yang
-----Original Message-----
From: panama-dev <panama-dev-bounces at openjdk.java.net> On Behalf Of Yang Zhang
Sent: Friday, February 14, 2020 2:32 PM
To: panama-dev at openjdk.java.net
Subject: [vector] RFR: Support non power-of-two and 2048-bit vector length for gather load/scatter store
Hi,
I'm adding support for non-power-of-two and 2048-bit vector lengths to gather load/scatter store.
Could you please help review it?
Webrev: http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.2048npow/webrev.00/index.html
A full jtreg run shows no new failures.
In this patch, I made the following changes.
1. For gather load/scatter store, an int array is used as the index map, so a new index shape calculation function is added.
For AArch64 SVE, the maximum index bit size is (2048/elementSize) * 32.
The index bit size increments in steps of (128/elementSize) * 32.
2. Use a gather mask to control index vector loading for long/double gather load/scatter store.
When the vector length is 2048 bits or a non-power-of-two, e.g. on SVE, the long/double gather load test cases fail with index-out-of-bounds errors.
Take 2048 bits as an example. In a long gather load (fromArray), the indexShape of the long species is S_MAX_BIT, and the long vector has 32 lanes.
When the long species is converted to an int species, the indexShape of the int species is still S_MAX_BIT, but the int vector has 64 lanes. So when the index vector (IntVector) is loaded, more index data is read than needed.
If the current vector is the tail, an out-of-bounds failure happens.
This failure only occurs on SVE. On X86 it does not occur because:
i) Byte/Short gather loads aren't intrinsified.
ii) For X86 AVX512, the indexShape (int index map, 8 elements) of long512/double512
(8 elements) is initialized as S_256_BIT. For SVE with a 512-bit vector length, indexShape is initialized as S_256_BIT too. But for SVE with 2048-bit or non-power-of-two vector lengths, the failures above occur.
3. Gather load and scatter store are a pair of similar operations, so one solution should be applied to both.
The original Java implementations of gather load and scatter store are different:

Vector          | gather load               | scatter store
----------------+---------------------------+----------------------------------
Int or float    | With intrinsification     | With intrinsification
Long or Double  | With intrinsification     | With intrinsification
                | Get indexShape directly   | Get indexShape indirectly
                | Normal index loading      | Special controlled index loading
Byte or short   | Without intrinsification  | With intrinsification, no
                |                           | instruction support on x86/arm

Based on the above, I use one simple implementation for both:

Vector          | gather load/scatter store
----------------+----------------------------------
Int or float    | With intrinsification
Long or Double  | With intrinsification
                | Get indexShape directly
                | Special controlled index loading
Byte or short   | Without intrinsification

If there is any problem, please let me know.
4. Remove some assertions that the vector length is a power of two.
5. Add comments for gather load intrinsification.
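To make the arithmetic in items 1 and 2 concrete, here is a rough sketch in plain Java. It is not the code in the webrev and does not use the Vector API. The class GatherIndexSketch and its methods are hypothetical names chosen for illustration, and plain arrays stand in for vectors. It shows the (lanes * 32) index size formula, and how limiting the index load to the long lane count avoids reading past the end of the index map at the tail.

```java
public class GatherIndexSketch {
    // The index map is an int array, so each index occupies 32 bits.
    static final int INDEX_ELEMENT_BITS = Integer.SIZE;

    // Index bit size for one vector: (vectorBits / elementBits) * 32.
    static int indexBits(int vectorBits, int elementBits) {
        int lanes = vectorBits / elementBits;
        return lanes * INDEX_ELEMENT_BITS;
    }

    // Models loading an index vector with intLanes lanes from indexMap.
    // Only the first activeLanes lanes are read (the "gather mask");
    // the remaining lanes stay zero and never touch the array.
    static int[] loadIndices(int[] indexMap, int mapOffset,
                             int intLanes, int activeLanes) {
        int[] lanes = new int[intLanes];
        for (int i = 0; i < intLanes; i++) {
            if (i < activeLanes) {
                lanes[i] = indexMap[mapOffset + i];
            }
        }
        return lanes;
    }

    // Models a long gather load: out[i] = src[base + indexMap[mapOffset + i]].
    static long[] gather(long[] src, int base, int[] indexMap,
                         int mapOffset, int vectorBits) {
        int longLanes = vectorBits / Long.SIZE;    // 32 lanes for 2048 bits
        int intLanes = vectorBits / Integer.SIZE;  // 64 lanes for 2048 bits
        // Load only as many indices as there are long lanes.
        int[] idx = loadIndices(indexMap, mapOffset, intLanes, longLanes);
        long[] out = new long[longLanes];
        for (int i = 0; i < longLanes; i++) {
            out[i] = src[base + idx[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(indexBits(2048, Long.SIZE)); // 1024

        long[] src = new long[100];
        for (int i = 0; i < src.length; i++) {
            src[i] = i * 10L;
        }
        // Index map holds exactly 32 entries, i.e. this vector is the tail.
        int[] indexMap = new int[32];
        for (int i = 0; i < indexMap.length; i++) {
            indexMap[i] = 31 - i;
        }
        long[] out = gather(src, 0, indexMap, 0, 2048);
        System.out.println(out[0]); // 310
    }
}
```

Calling loadIndices(indexMap, 0, 64, 64) on the same 32-entry index map, i.e. without the mask, throws ArrayIndexOutOfBoundsException, which corresponds to the tail failure described in item 2.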
Regards,
Yang