RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics
Pengfei Li (Arm Technology China)
Pengfei.Li at arm.com
Mon Jan 21 10:53:47 UTC 2019
Hi Reviewers,
Webrev: http://cr.openjdk.java.net/~pli/rfr/8216259/webrev.00/
JBS: https://bugs.openjdk.java.net/browse/JDK-8216259
This patch vectorizes the AArch64 intrinsic code for the Adler-32 checksum. An Adler-32 checksum is obtained by calculating two 16-bit checksums, s1 and s2, and then concatenating their bits into a 32-bit integer. Details of the algorithm can be found on Wikipedia at https://en.wikipedia.org/wiki/Adler-32 .
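For readers unfamiliar with the algorithm, here is a minimal scalar sketch in plain Java. It is only an illustration of the definition above, not the intrinsic code. The class name Adler32Sketch and the method name adler32 are made up for this example.

```java
class Adler32Sketch {
    // Largest prime below 2^16; both running sums are kept modulo this value.
    static final int BASE = 65521;

    // Straightforward byte-at-a-time Adler-32 (hypothetical helper, not JDK code).
    static int adler32(byte[] data) {
        int s1 = 1; // lower half: 1 + sum of all bytes
        int s2 = 0; // upper half: sum of every intermediate s1
        for (byte b : data) {
            s1 = (s1 + (b & 0xff)) % BASE;
            s2 = (s2 + s1) % BASE;
        }
        // Concatenate the two 16-bit halves into one 32-bit value.
        return (s2 << 16) | s1;
    }

    public static void main(String[] args) {
        byte[] input = "Wikipedia".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.printf("0x%08X%n", adler32(input)); // prints 0x11E60398
    }
}
```

Reducing modulo BASE after every byte keeps the sketch simple; real implementations usually defer the reduction to cut the number of divisions.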
In the previous Adler-32 intrinsic code, written by Edward Nevill, the lower and upper halves of the checksum value, s1 and s2, are accumulated 16 bytes at a time in the nmax_loop and by16_loop. This patch vectorizes those accumulation operations in both loops with NEON instructions.
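The arithmetic that makes a 16-byte step suitable for vector instructions can be sketched in plain Java. For a block b0..b15, s1 grows by the plain sum of the bytes, and s2 grows by 16*s1 plus a weighted sum with weights 16, 15, ..., 1. Both sums are independent per byte, which is what allows them to be computed in parallel. This is only an illustration of that identity under my own naming (Adler32BlockSketch, adler32By16); it is not the NEON instruction sequence in the webrev, and it reduces after every block rather than deferring the modulo as nmax_loop does.

```java
class Adler32BlockSketch {
    static final int BASE = 65521;

    // Adler-32 computed 16 bytes at a time (hypothetical helper, not JDK code).
    static int adler32By16(byte[] data) {
        long s1 = 1, s2 = 0;
        int i = 0;
        for (; i + 16 <= data.length; i += 16) {
            long sum = 0;      // b0 + b1 + ... + b15
            long weighted = 0; // 16*b0 + 15*b1 + ... + 1*b15
            for (int j = 0; j < 16; j++) {
                int b = data[i + j] & 0xff;
                sum += b;
                weighted += (long) (16 - j) * b;
            }
            // s2 must use the value of s1 from before this block.
            s2 = (s2 + 16 * s1 + weighted) % BASE;
            s1 = (s1 + sum) % BASE;
        }
        // Scalar tail for the remaining 0..15 bytes.
        for (; i < data.length; i++) {
            s1 = (s1 + (data[i] & 0xff)) % BASE;
            s2 = (s2 + s1) % BASE;
        }
        return (int) ((s2 << 16) | s1);
    }
}
```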
I tested the correctness of my patch by comparing the checksum results for 5000 byte arrays of 1 MB each. The test code and script can be found at [1].
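As a rough illustration of this kind of check (this is not the Adler32Test.java linked at [1]), one way to compare results is to run java.util.zip.Adler32 against a byte-at-a-time reference on random arrays. The class name, method names, array count, and seed below are assumptions made for the sketch, and whether the intrinsic path is actually taken depends on the JVM and its flags.

```java
import java.util.Random;
import java.util.zip.Adler32;

class Adler32CompareSketch {
    static final int BASE = 65521;

    // Byte-at-a-time reference checksum (hypothetical helper).
    static long reference(byte[] data) {
        int s1 = 1, s2 = 0;
        for (byte b : data) {
            s1 = (s1 + (b & 0xff)) % BASE;
            s2 = (s2 + s1) % BASE;
        }
        return ((long) s2 << 16) | s1;
    }

    // Returns how many random arrays produced differing checksums.
    static int countMismatches(int arrays, int size, long seed) {
        Random rnd = new Random(seed);
        byte[] buf = new byte[size];
        int mismatches = 0;
        for (int n = 0; n < arrays; n++) {
            rnd.nextBytes(buf);
            Adler32 adler = new Adler32();
            adler.update(buf, 0, buf.length);
            if (adler.getValue() != reference(buf)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Smaller than the 5000 x 1 MB run described above, to keep the sketch quick.
        System.out.println("mismatches: " + countMismatches(50, 1 << 20, 42L));
    }
}
```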
I also measured the performance with and without my patch using a JMH benchmark [2]. The JMH results show that the patched code is about 2.5x faster.
[1] http://cr.openjdk.java.net/~pli/rfr/8216259/Adler32Test.java
[2] http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
--
Thanks,
Pengfei