[aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays

Mon Oct 30 18:03:54 UTC 2017

On 30.10.2017 20:30, Andrew Haley wrote:
> On 30/10/17 16:43, Dmitrij Pochepko wrote:
>> I've tried SIMD loads (even aligned ones, to be sure that alignment is
>> not an issue). The SIMD versions were attached to JDK-8187472 as
>>    - v5.0(simd loads, 16-byte address alignment, 64 bytes per 1 loop
>> iteration)
>>    - v7.0(simd loads, 16-byte alignment, 64 bytes per 1 loop iteration)
>>    - v9.0(simd loads, 64 byte alignment, 128 bytes per 1 loop iteration).
>>
>> I've measured it on ThunderX and found that while the best non-SIMD
>> version handles 1000000-byte arrays in ~295 microseconds, the SIMD
>> versions took about 355 microseconds.
> I'm rather reluctant to accept non-SIMD intrinsics because I expect
> SIMD performance to improve, and I expect SIMD to be the future.  The
> same is true of implementations which avoid the use of ldp.
>
I also expected NEON to be faster on very new designs. I have a SIMD 
version of this intrinsic, and if you want I can merge it into the stub 
under a new option (something like UseSIMDForArrayEquals, default false, 
much like the existing UseSIMDForMemoryOps that is used in the array 
copy intrinsic). It is slower on the CPUs we have access to, though, so 
it is unlikely to become the default. This way we would have both a fast 
version and a SIMD version.
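
To make the shape of that proposal concrete, here is a rough sketch in plain 
C++ rather than HotSpot stub-generator code. The flag name 
UseSIMDForArrayEquals is the one suggested above; everything else 
(array_equals_scalar, array_equals_wide, select_array_equals) is a 
hypothetical stand-in, and the "wide" path only imitates the 64-bytes-per- 
iteration structure without using real NEON loads.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the proposed VM option. In HotSpot a flag like
// this would be declared through the platform globals, not as a plain bool.
static bool UseSIMDForArrayEquals = false;

// Stand-in for the non-SIMD stub: compares 8 bytes per step, then the tail.
static bool array_equals_scalar(const uint8_t* a, const uint8_t* b, size_t len) {
  size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    uint64_t x, y;
    std::memcpy(&x, a + i, 8);  // memcpy avoids unaligned-access issues
    std::memcpy(&y, b + i, 8);
    if (x != y) return false;
  }
  for (; i < len; i++) {
    if (a[i] != b[i]) return false;
  }
  return true;
}

// Stand-in for the SIMD stub: 64 bytes per loop iteration. A real version
// would use vector loads here; memcmp is only a placeholder for them.
static bool array_equals_wide(const uint8_t* a, const uint8_t* b, size_t len) {
  size_t i = 0;
  for (; i + 64 <= len; i += 64) {
    if (std::memcmp(a + i, b + i, 64) != 0) return false;
  }
  // Fewer than 64 bytes remain; reuse the scalar path for the tail.
  return array_equals_scalar(a + i, b + i, len - i);
}

using ArrayEqualsFn = bool (*)(const uint8_t*, const uint8_t*, size_t);

// Picked once, the way a stub generator would decide which code to emit.
static ArrayEqualsFn select_array_equals() {
  return UseSIMDForArrayEquals ? array_equals_wide : array_equals_scalar;
}
```

With the flag left at its default, select_array_equals() returns the scalar 
version, so behaviour is unchanged unless someone opts in to the SIMD path.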

I am not sure whether it is best to do this, or to keep a single, simple, 
and fastest version of this intrinsic for now and get back to it when SVE 
becomes widely available.

What do you think?

Note that other intrinsics that are in the works will use SIMD.

Thanks,
Dmitrij