RFR: JDK-8155618 aarch32: implement CRC32 intrinsics
Andrey Petushkov
andrey.petushkov at gmail.com
Thu Apr 28 10:49:30 UTC 2016
Dear Ed, All,
Please consider the patch below, which implements intrinsics for the CRC32 functionality.
All three possible implementations are provided: generic CPU, NEON, and one based on the ARMv8 CRC32 instructions.
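For readers unfamiliar with the algorithm, here is a minimal Java sketch of the byte-at-a-time, table-driven CRC-32 that a generic CPU implementation typically follows. It is not the code from the patch (which is HotSpot assembly). The class name Crc32Sketch and its methods are hypothetical and used only for illustration. The result is compared against java.util.zip.CRC32, the API the intrinsic accelerates.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32Sketch {
    // Lookup table for the reflected CRC-32 polynomial 0xEDB88320.
    private static final int[] TABLE = new int[256];

    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    // Computes the CRC-32 of the whole buffer, one byte per table lookup.
    static long crc32(byte[] buf) {
        int crc = 0xFFFFFFFF;                 // initial value
        for (byte b : buf) {
            crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >>> 8);
        }
        return (~crc) & 0xFFFFFFFFL;          // final inversion, as unsigned
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 reference = new CRC32();
        reference.update(data, 0, data.length);
        // Both values should match if the sketch is correct.
        System.out.printf("sketch=%08x jdk=%08x%n", crc32(data), reference.getValue());
    }
}
```

The NEON and CRC32-instruction variants compute the same function; they differ only in how many bytes are folded into the running value per step.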
The evaluated performance gain of the calculation itself is the following:
cortex      a7               a8               a53
c        123.487          132.013          309.565
asm      131.755   7%     141.297   7%     307.401   -1%
neon     118.91   -4%     159.718  21%     479.442   55%
crc32                                     1361.446  343%
As you can see, there is no benefit in using NEON on Cortex A7 (and presumably A5), so NEON is turned on by default only on Cortex A8 and above.
The above numbers are for the CRC calculation itself, so eliminating the JNI overhead provides an additional benefit. E.g. on Cortex A7 the generic asm implementation gives the following improvement:
mode            buffer size   ops/s gain
Xint                    128       20.56%
                        512       18.06%
                       1024       14.66%
                    1048576        5.52%
Xcomp Xbatch            128       77.85%
                        512       32.63%
                       1024       20.53%
                    1048576        6.59%
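A rough way to observe this kind of ops/s figure is to time java.util.zip.CRC32 over the same buffer sizes. The sketch below is an assumption about how such a measurement could be set up, not the benchmark that produced the numbers above. The class name Crc32Throughput and the method opsPerSecond are hypothetical, and a harness such as JMH would give more trustworthy results than a hand-rolled timing loop.

```java
import java.util.Random;
import java.util.zip.CRC32;

public class Crc32Throughput {
    // Buffer sizes taken from the table above.
    static final int[] SIZES = {128, 512, 1024, 1048576};

    // Accumulates checksums so the JIT cannot discard the measured loop.
    static long sink;

    // Repeats CRC32.update over one buffer for at least minNanos (> 0)
    // and returns the observed number of updates per second.
    static double opsPerSecond(int size, long minNanos) {
        byte[] buf = new byte[size];
        new Random(42).nextBytes(buf);
        CRC32 crc = new CRC32();
        long ops = 0;
        long start = System.nanoTime();
        long elapsed;
        do {
            crc.reset();
            crc.update(buf, 0, buf.length);
            sink += crc.getValue();
            ops++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < minNanos);
        return ops * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        for (int size : SIZES) {
            opsPerSecond(size, 200_000_000L);   // warm-up pass, result ignored
            System.out.printf("%8d bytes: %.1f ops/s%n",
                    size, opsPerSecond(size, 1_000_000_000L));
        }
    }
}
```

Running it once with -Xint and once with -Xcomp -Xbatch, on builds with and without the patch, would give numbers comparable in spirit to the two groups of rows in the table.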
The patch is here:
http://cr.openjdk.java.net/~snazarki/8155618/
Thanks in advance,
Andrey