[aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics

Dmitry Chuyko dmitry.chuyko at bell-sw.com
Mon Jan 21 14:11:12 UTC 2019


Adler32 may be chosen as the HDFS checksum. Hadoop uses 512-byte blocks by 
default.

I see some speedups on Cavium ThunderX (1st gen; TX2 data to follow) with 
the provided patch:

64 B:  8%
512 B: 10%
1 MB:  10%
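The patch body is not reproduced in this message. As a hedged illustration of why Adler32 can be vectorized at all, here is a C++ sketch of a common block-wise formulation (an assumption, not necessarily what the patch does). For a block of n bytes b[0..n-1], the update can be rewritten as s1' = s1 + sum(b[i]) and s2' = s2 + n*s1 + sum((n - i)*b[i]). Neither sum depends on s1/s2, so they can be computed lane-wise. The names `adler32_scalar`, `adler32_blocked` and the block size `kBlock = 16` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Adler-32 modulus: the largest prime below 2^16.
constexpr uint32_t kAdlerBase = 65521;

// Hypothetical block size: 16 bytes, i.e. one 128-bit NEON register of bytes.
constexpr uint32_t kBlock = 16;

// Reference byte-at-a-time Adler-32: every byte depends on the previous s1/s2.
uint32_t adler32_scalar(uint32_t adler, const uint8_t* buf, size_t len) {
  assert(buf != nullptr || len == 0);
  uint32_t s1 = adler & 0xffff;
  uint32_t s2 = adler >> 16;
  for (size_t i = 0; i < len; i++) {
    s1 = (s1 + buf[i]) % kAdlerBase;
    s2 = (s2 + s1) % kAdlerBase;
  }
  return (s2 << 16) | s1;
}

// Block form of the same update for n = kBlock bytes b[0..n-1]:
//   s1' = s1 + sum(b[i])
//   s2' = s2 + n * s1 + sum((n - i) * b[i])
// The two sums do not depend on s1/s2, so a SIMD version could compute them
// lane-wise (widening adds, multiply-accumulates) and fold them in once.
uint32_t adler32_blocked(uint32_t adler, const uint8_t* buf, size_t len) {
  assert(buf != nullptr || len == 0);
  uint32_t s1 = adler & 0xffff;
  uint32_t s2 = adler >> 16;
  while (len >= kBlock) {
    uint32_t sum = 0;
    uint32_t weighted = 0;
    for (uint32_t i = 0; i < kBlock; i++) {  // independent per-lane work
      sum += buf[i];
      weighted += (kBlock - i) * buf[i];
    }
    s2 = (s2 + kBlock * s1 + weighted) % kAdlerBase;  // uses the old s1
    s1 = (s1 + sum) % kAdlerBase;
    buf += kBlock;
    len -= kBlock;
  }
  // Fewer than kBlock bytes left: finish with the scalar loop.
  return adler32_scalar((s2 << 16) | s1, buf, len);
}
```

In this formulation, only the final fold of each block into s1/s2 remains serial. The per-byte work inside a block has no loop-carried dependency.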


We considered the following improvements without vector instructions: 
just splitting loads and breaking some data dependencies, like this:

     __ ldr(temp0, Address(__ post(buff, 8)));
     __ ldr(temp1, Address(__ post(buff, 8)));

     __ add(s1, s1, temp0, ext::uxtb);
     __ ubfx(temp2, temp0, 8, 8);
     __ add(s2, s2, s1);
     __ add(s1, s1, temp2);
     __ ubfx(temp3, temp0, 16, 8);
     __ add(s2, s2, s1);
     __ add(s1, s1, temp3);
     __ ubfx(temp2, temp0, 24, 8);
     __ add(s2, s2, s1);
     __ add(s1, s1, temp2);
     __ ubfx(temp3, temp0, 32, 8);
     __ add(s2, s2, s1);
     __ add(s1, s1, temp3);
     __ ubfx(temp2, temp0, 40, 8);
     __ add(s2, s2, s1);
     __ add(s1, s1, temp2);
     __ ubfx(temp3, temp0, 48, 8);
     __ add(s2, s2, s1);
     __ add(s1, s1, temp3);
     // ... same pattern for the remaining byte of temp0 and the bytes of temp1
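Here is a plain C++ sketch of what the snippet above computes. `ubfx(dst, src, lsb, 8)` extracts the zero-extended 8-bit field starting at bit `lsb`. `add(s1, s1, temp0, ext::uxtb)` adds the low byte. Each byte then goes through `s1 += b; s2 += s1`. The sketch relies on several assumptions for illustration that are not taken from the patch:

- the function and helper names are hypothetical;
- the modulo is deferred using zlib's NMAX (5552);
- the host is little-endian, so that the first byte in memory lands in bits 0..7, as with `ldr` on AArch64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Adler-32 modulus: the largest prime below 2^16.
constexpr uint32_t kBase = 65521;
// zlib's NMAX: the most bytes that can be summed before s2 might overflow
// 32 bits, so the modulo can be deferred to once per chunk.
constexpr size_t kNmax = 5552;

// Hypothetical helper mirroring `ubfx(dst, src, lsb, 8)`: the zero-extended
// byte field of `word` starting at bit `lsb`.
inline uint32_t ubfx8(uint64_t word, unsigned lsb) {
  return static_cast<uint32_t>((word >> lsb) & 0xff);
}

// Adler-32 consuming one 8-byte load at a time, then feeding each extracted
// byte through s1 += b; s2 += s1, as in the MacroAssembler snippet.
uint32_t adler32_words(uint32_t adler, const uint8_t* buf, size_t len) {
  assert(buf != nullptr || len == 0);
  uint32_t s1 = adler & 0xffff;
  uint32_t s2 = adler >> 16;
  while (len > 0) {
    size_t chunk = len < kNmax ? len : kNmax;
    len -= chunk;
    for (; chunk >= 8; chunk -= 8, buf += 8) {
      uint64_t w;
      std::memcpy(&w, buf, 8);  // assumes a little-endian host (byte 0 in bits 0..7)
      for (unsigned lsb = 0; lsb < 64; lsb += 8) {
        s1 += ubfx8(w, lsb);
        s2 += s1;
      }
    }
    for (; chunk > 0; chunk--) {  // tail bytes
      s1 += *buf++;
      s2 += s1;
    }
    s1 %= kBase;
    s2 %= kBase;
  }
  return (s2 << 16) | s1;
}
```

Splitting the loads only changes where the bytes come from. Every byte still passes through the same s1 to s2 dependency chain.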

It shows a 23% improvement on TX1 for size=512, but roughly the same 
performance as the baseline on TX2.

-Dmitry

On 1/21/19 4:12 PM, Andrew Haley wrote:
> On 1/21/19 12:21 PM, Andrew Haley wrote:
>
>> Also, how much is the Adler32 checksum actually used? Is it
>> something we care about?
> ... the ZIP file format uses Adler32, but as far as I remember we're
> using zlib, an external library, for our zipfile handling (i.e. our
> jar files.) If we are using an external library then the performance
> of our intrinsic might not matter at all. Please check.
>

