[aarch64-port-dev ] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays

Andrew Haley aph at redhat.com
Thu Feb 8 10:11:00 UTC 2018


On 07/02/18 19:39, Dmitrij Pochepko wrote:
> In general, this patch changes the handling of very short arrays
> (performing one 8-byte read instead of a few smaller reads, relying on
> 8-byte alignment) and jumps into a stub with a large 64-byte read loop
> for larger arrays.
> 
> Measurements (array lengths 7, 64, 128, 256, 512, 1024, 100000;
> improvement in %. An 80% improvement means that the new version takes
> 80% less time, i.e. it is 5 times faster):
> 
> 
> ThunderX: 2%, -4%, 0%, 2%, 32%, 55%, 80%
> 
> ThunderX2: 0%, -3%, 17%, 19%, 29%, 31%, 47%
> 
> Cortex A53 at 533MHz: 8%, -1%, -2%, 4%, 6%, 5%, 3%
> 
> Cortex A73 at 903MHz: 8%, -3%, 0%, 7%, 8%, 9%, 8%
> 
> Note: medium sizes are a bit slower because of the additional branch
> (which checks the size and jumps to the stub).
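
For reference, the quoted percentages only match the "5 times" remark if they are read as reductions in run time. A minimal sketch of that conversion follows; the function name is hypothetical and the reading of "improvement" as time saved is an assumption, not something the patch states.

```cpp
#include <cassert>

// Hypothetical helper: converts a percentage reduction in run time into a
// speedup factor. Assumes "improvement" means time saved, which is the
// reading under which 80% corresponds to 5 times.
double speedup_from_time_reduction(double percent) {
  assert(percent < 100.0);  // 100% would mean zero run time
  return 1.0 / (1.0 - percent / 100.0);
}
```

Under this reading the 47% figure for ThunderX2 is a bit under 2 times, and a negative figure such as -4% is a factor below 1, i.e. a slowdown.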

This indentation is messed up:

@@ -5201,40 +5217,23 @@
   // length == 4.
   if (log_elem_size > 0)
     lsl(cnt1, cnt1, log_elem_size);
-  ldr(tmp1, Address(a1, cnt1));
-  ldr(tmp2, Address(a2, cnt1));
+    ldr(tmp1, Address(a1, cnt1));
+    ldr(tmp2, Address(a2, cnt1));

I'm not convinced that this works correctly if passed the address of a pair
of arrays at the end of a page.  Maybe it isn't used on sub-arrays today
in HotSpot, but one day it might be.
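
One way to read the concern: an 8-byte load from an 8-byte-aligned address can never straddle a page, but the same load from an unaligned address near the end of a page can, and the next page might not be mapped. A minimal sketch of that check, where the helper name and the 4096-byte default are assumptions for illustration and not HotSpot code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if a load of `width` bytes starting at `addr`
// touches more than one page. The 4096-byte default is an assumption;
// AArch64 systems also run with 16 KiB and 64 KiB pages.
bool load_crosses_page(std::uintptr_t addr, std::size_t width,
                       std::size_t page_size = 4096) {
  assert(width >= 1);
  std::uintptr_t first_page = addr / page_size;
  std::uintptr_t last_page = (addr + width - 1) / page_size;
  return first_page != last_page;
}
```

For example, an 8-byte load at 0x1ff8 stays within one 4 KiB page, while the same load at 0x1ffa, as could happen for a sub-array that does not start on an 8-byte boundary, spills into the next page.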

The patch pessimizes a very common case: strings of about 32 characters.
Please think again.  Please also think about strings that are long enough
for the SIMD loop but differ in their early substrings.
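
To illustrate the last point, here is a rough sketch in plain C++ (not the HotSpot stub, and all names are hypothetical) of a comparison that tests the first word before committing to the wide loop, so that inputs which differ early return without paying for the large-array path. The 64-byte memcmp merely stands in for the SIMD/prefetch loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Loads 8 bytes without assuming alignment.
static std::uint64_t load8(const unsigned char* p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Hypothetical sketch: compares n bytes of a and b for equality.
bool bytes_equal(const unsigned char* a, const unsigned char* b,
                 std::size_t n) {
  assert(n == 0 || (a != nullptr && b != nullptr));
  std::size_t i = 0;
  // Cheap early exit: check the first word before entering the wide loop.
  if (n >= 8) {
    if (load8(a) != load8(b)) return false;
    i = 8;
  }
  // Wide loop: 64 bytes per iteration, standing in for the SIMD loop.
  for (; n - i >= 64; i += 64) {
    if (std::memcmp(a + i, b + i, 64) != 0) return false;
  }
  // Remaining whole 8-byte words.
  for (; n - i >= 8; i += 8) {
    if (load8(a + i) != load8(b + i)) return false;
  }
  // Tail, byte by byte, so nothing is read past the end of either array.
  for (; i < n; ++i) {
    if (a[i] != b[i]) return false;
  }
  return true;
}
```

Whether the extra compare-and-branch pays off depends on how often real inputs differ in their first word, so it would need measuring on the same set of array lengths.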

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

