RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics

Mon Jan 21 10:53:47 UTC 2019

Hi Reviewers,

Webrev: http://cr.openjdk.java.net/~pli/rfr/8216259/webrev.00/
JBS: https://bugs.openjdk.java.net/browse/JDK-8216259

This patch vectorizes the AArch64 intrinsic code for the Adler-32 checksum. An Adler-32 checksum is obtained by calculating two 16-bit checksums, s1 and s2, and then concatenating their bits into a 32-bit integer. Details of the algorithm can be found on Wikipedia at https://en.wikipedia.org/wiki/Adler-32 .
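
For readers who want the definition in code form, here is a minimal scalar sketch of Adler-32 in Java. It is only an illustration of the algorithm described above, not the intrinsic code under review; the class name Adler32Sketch is made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

public class Adler32Sketch {
    // Largest prime below 2^16; both s1 and s2 are kept modulo this value.
    static final int BASE = 65521;

    // Scalar reference: one byte at a time, reducing after every step.
    static long adler32(byte[] data) {
        int s1 = 1; // s1 starts at 1, s2 at 0
        int s2 = 0;
        for (byte b : data) {
            s1 = (s1 + (b & 0xff)) % BASE; // running sum of the bytes
            s2 = (s2 + s1) % BASE;         // running sum of the s1 values
        }
        // s2 forms the upper 16 bits, s1 the lower 16 bits.
        return ((long) s2 << 16) | s1;
    }

    public static void main(String[] args) {
        byte[] input = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        Adler32 jdk = new Adler32();
        jdk.update(input, 0, input.length);
        // Print both values so they can be compared by eye.
        System.out.printf("sketch=0x%08X jdk=0x%08X%n", adler32(input), jdk.getValue());
    }
}
```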

In the previous Adler-32 intrinsic code, written by Edward Nevill, the lower and upper halves of the checksum value, s1 and s2, are accumulated 16 bytes at a time in nmax_loop and by16_loop. This patch vectorizes these accumulation operations in both loops with NEON instructions.
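
To illustrate why a 16-byte step lends itself to vectorization, here is a rough Java sketch of block-wise accumulation. It is not the stub code in the webrev. It only shows the arithmetic identity: for a block b[0..15], s1 grows by the plain sum of the bytes, and s2 grows by 16*s1 plus a weighted sum with weights 16, 15, ..., 1. Both sums are independent per byte, so a SIMD unit can compute them lane-wise. The class name and the use of 5552 as the deferred-modulo chunk size (the value zlib calls NMAX) are assumptions of this sketch.

```java
import java.util.Random;
import java.util.zip.Adler32;

public class Adler32BlockSketch {
    static final int BASE = 65521;
    // Assumed chunk size: zlib's NMAX, the number of bytes that can be
    // accumulated before a reduction modulo BASE becomes necessary.
    static final int NMAX = 5552;

    static long adler32(byte[] data) {
        long s1 = 1, s2 = 0;
        int pos = 0;
        while (pos < data.length) {
            int chunkEnd = Math.min(pos + NMAX, data.length);
            // Full 16-byte blocks: the part a vector unit could handle.
            while (chunkEnd - pos >= 16) {
                long sum = 0, weighted = 0;
                for (int i = 0; i < 16; i++) {
                    int b = data[pos + i] & 0xff;
                    sum += b;                 // contributes to s1
                    weighted += (16 - i) * b; // byte i is counted 16 - i times in s2
                }
                s2 += 16 * s1 + weighted;     // uses s1 from before this block
                s1 += sum;
                pos += 16;
            }
            // Remaining bytes of the chunk, one at a time.
            while (pos < chunkEnd) {
                s1 += data[pos++] & 0xff;
                s2 += s1;
            }
            // Deferred reduction, once per chunk.
            s1 %= BASE;
            s2 %= BASE;
        }
        return (s2 << 16) | s1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_003]; // deliberately not a multiple of 16
        new Random(1).nextBytes(data);
        Adler32 jdk = new Adler32();
        jdk.update(data, 0, data.length);
        System.out.println(adler32(data) == jdk.getValue() ? "match" : "MISMATCH");
    }
}
```

The weighted sum follows from unrolling the scalar recurrence: after 16 steps, s2 has absorbed the old s1 sixteen times, b[0] sixteen times, b[1] fifteen times, and so on down to b[15] once.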

I tested the correctness of my patch by comparing the checksum results for 5000 byte arrays of 1MB each. The test code and script can be found at [1].
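
The test at [1] is not reproduced here. As a hedged illustration only, one way such a comparison could be written is to checksum random byte arrays with java.util.zip.Adler32 (which may use the intrinsic) and with a plain Java reference, then count mismatches. The class name, the seed, and the default of 50 arrays are assumptions of this sketch; pass 5000 as the first argument to use the array count mentioned above.

```java
import java.util.Random;
import java.util.zip.Adler32;

public class Adler32CompareSketch {
    static final int BASE = 65521;

    // Plain Java reference, reduced modulo BASE after every byte.
    static long reference(byte[] data) {
        int s1 = 1, s2 = 0;
        for (byte b : data) {
            s1 = (s1 + (b & 0xff)) % BASE;
            s2 = (s2 + s1) % BASE;
        }
        return ((long) s2 << 16) | s1;
    }

    // Checksums `arrays` random buffers of `size` bytes both ways and
    // returns how many results differ.
    static int countMismatches(int arrays, int size, long seed) {
        Random rnd = new Random(seed);
        byte[] buf = new byte[size];
        Adler32 jdk = new Adler32();
        int mismatches = 0;
        for (int n = 0; n < arrays; n++) {
            rnd.nextBytes(buf);
            jdk.reset();
            jdk.update(buf, 0, buf.length);
            if (jdk.getValue() != reference(buf)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int arrays = args.length > 0 ? Integer.parseInt(args[0]) : 50;
        // 1 << 20 bytes = 1MB per array.
        System.out.println("mismatches: " + countMismatches(arrays, 1 << 20, 42L));
    }
}
```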

I also measured the performance with and without my patch using a JMH benchmark [2]. The JMH results show a speedup of about 2.5x with the patch.

[1] http://cr.openjdk.java.net/~pli/rfr/8216259/Adler32Test.java
[2] http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java

--
Thanks,
Pengfei