[aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics
Dmitry Chuyko
dmitry.chuyko at bell-sw.com
Mon Jan 21 14:11:12 UTC 2019
Adler32 may be chosen as the HDFS checksum. Hadoop uses 512-byte blocks by
default.
I see some speedups on Cavium Thunder X (1st gen; TX2 data to follow) with
the provided patch:
64 B: 8%
512 B: 10%
1 MB: 10%
We also considered the following improvement, which does not use vector
instructions: just split the loads and break some data dependencies, like this:
__ ldr(temp0, Address(__ post(buff, 8)));
__ ldr(temp1, Address(__ post(buff, 8)));
__ add(s1, s1, temp0, ext::uxtb);
__ ubfx(temp2, temp0, 8, 8);
__ add(s2, s2, s1);
__ add(s1, s1, temp2);
__ ubfx(temp3, temp0, 16, 8);
__ add(s2, s2, s1);
__ add(s1, s1, temp3);
__ ubfx(temp2, temp0, 24, 8);
__ add(s2, s2, s1);
__ add(s1, s1, temp2);
__ ubfx(temp3, temp0, 32, 8);
__ add(s2, s2, s1);
__ add(s1, s1, temp3);
__ ubfx(temp2, temp0, 40, 8);
__ add(s2, s2, s1);
__ add(s1, s1, temp2);
__ ubfx(temp3, temp0, 48, 8);
__ add(s2, s2, s1);
__ add(s1, s1, temp3);
It shows a 23% improvement on TX1 for size=512, but roughly the same
performance as the baseline on TX2.
-Dmitry
On 1/21/19 4:12 PM, Andrew Haley wrote:
> On 1/21/19 12:21 PM, Andrew Haley wrote:
>
>> Also, how much is the Adler32 checksum actually used? Is it
>> something we care about?
> ... the ZIP file format uses Adler32, but as far as I remember we're
> using zlib, an external library, for our zipfile handling (i.e. our
> jar files.) If we are using an external library then the performance
> of our intrinsic might not matter at all. Please check.
>