Observations on vectorbenchmarks on ARM

Wed Nov 18 21:26:55 UTC 2020

> On Nov 18, 2020, at 12:27 PM, Ludovic Henry <luhenry at microsoft.com> wrote:
> 
> Hi Paul,
> 
>> We have not done much work trying to optimize the fallback cases e.g. composing using smaller vector sizes, or letting the auto-vectorizer have at it (harder for the compiler to see given the use of lambdas). Priority right now is to focus on getting the code gen right when the architecture supports the vector shapes.
> 
> Yes, that makes perfect sense.
> 
>> Regarding bounds checks, I am wondering if the Objects.checkIndex method [1] is fully intrinsic on ARM. Can you try on x86 and compare?
> 
> Great question. I'll check on that tomorrow morning (it's evening for me right now).

Ok, thanks.

> 
>> (We also have some work in progress to improve bounds checks when the upper loop bound is calculated from VectorSpecies.loopBound.)
> 
> Where could I follow along this work? Or even where could I find some documentation / discussions on the topic?
> 

There is not much documentation, but here are some links to Roland’s recent and ongoing work:

8255150: Add utility methods to check long indexes and ranges #1003
https://github.com/openjdk/jdk/pull/1003

Experimental work to elide bounds checks when using VectorSpecies.loopBound
https://github.com/rwestrel/jdk/tree/range_checks_paul

Use git-blame (dreadful name!) to look at the changes for intrinsification of Preconditions.checkIndex.
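
To make the loop shape concrete, here is a minimal plain-Java sketch of what is being checked. It deliberately avoids the incubating Vector API so that it runs anywhere. The class name `LoopBoundSketch`, the `vlen` parameter, and the `loopBound` helper are all illustrative assumptions: the helper only mimics the documented contract of VectorSpecies.loopBound (the largest multiple of the vector length that is less than or equal to the given length), and the inner lane loop stands in for a single vector load.

```java
import java.util.Objects;

public class LoopBoundSketch {
    // Hypothetical stand-in for VectorSpecies.loopBound(length): the largest
    // multiple of vlen that is <= length. Assumes vlen is a power of two.
    public static int loopBound(int length, int vlen) {
        return length & -vlen;
    }

    public static int sum(int[] a, int vlen) {
        int bound = loopBound(a.length, vlen);
        int total = 0;
        int i = 0;
        // Main loop: each iteration covers the range [i, i + vlen).
        for (; i < bound; i += vlen) {
            // The kind of range check that guards each vector-sized access.
            // Because i < bound and bound <= a.length, it can never fail,
            // which is the fact the compiler would need to prove to elide it.
            Objects.checkFromIndexSize(i, vlen, a.length);
            for (int lane = 0; lane < vlen; lane++) {
                total += a[i + lane];
            }
        }
        // Scalar tail for the remaining a.length - bound elements.
        for (; i < a.length; i++) {
            total += a[Objects.checkIndex(i, a.length)];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(loopBound(data.length, 4)); // prints 8
        System.out.println(sum(data, 4));              // prints 55
    }
}
```

The check inside the main loop is redundant by construction, so the interesting question is only whether the JIT can see that when the bound comes from loopBound.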

>> The oscillation might be due to alignment, perhaps if the vector loads/stores are misaligned the instruction cost is higher?
> 
> That is exactly right. After testing with `-XX:ObjectAlignmentInBytes=16`, the oscillation is gone and the performance is, as expected, stable at ~2300 ops/ms.
> 

When we support access to MemorySegments, it will be possible to allocate (native, or off-heap) segments with specific alignment characteristics.
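
As a rough illustration of what "specific alignment characteristics" means for off-heap memory, here is a sketch that uses a direct ByteBuffer as a stand-in. This is not the MemorySegment API referred to above; it only shows the idea of obtaining a 16-byte-aligned off-heap region with calls that exist in java.nio since Java 9. The class name `AlignedBufferSketch` and the helper `allocateAligned` are hypothetical.

```java
import java.nio.ByteBuffer;

public class AlignedBufferSketch {
    // Returns an off-heap buffer whose first byte sits on an `alignment`-byte
    // boundary. Assumes alignment is a power of two and size is a multiple
    // of alignment.
    public static ByteBuffer allocateAligned(int size, int alignment) {
        // Over-allocate so that an aligned region of `size` bytes always fits,
        // whatever the base address of the raw allocation turns out to be.
        ByteBuffer raw = ByteBuffer.allocateDirect(size + alignment - 1);
        // alignedSlice rounds the start up and the end down to the boundary.
        return raw.alignedSlice(alignment);
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocateAligned(1024, 16);
        // alignmentOffset reports (address + index) modulo the unit size.
        System.out.println(buf.alignmentOffset(0, 16)); // prints 0
        System.out.println(buf.capacity());             // prints 1024
    }
}
```

With a region like this, vector-sized accesses at offsets that are multiples of 16 bytes stay aligned, which the heap array layout does not guarantee without a flag such as `-XX:ObjectAlignmentInBytes=16`.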

Paul.