RFR: JDK-8155618 aarch32: implement CRC32 intrinsics

Andrey Petushkov andrey.petushkov at gmail.com
Thu Apr 28 10:49:30 UTC 2016


Dear Ed, All,

Please consider the patch below, which implements intrinsics for the CRC32 functionality.
All three possible implementations are provided: generic CPU, NEON, and one based on the ARMv8 crc32 instruction.
The measured performance of the calculation itself, and the resulting gain, is as follows:
cortex    a7                a8                a53
c         123.487           132.013           309.565
asm       131.755    7%     141.297    7%     307.401     -1%
neon      118.91    -4%     159.718   21%     479.442     55%
crc32                                         1361.446   343%
As you can see, there is no benefit in using NEON on Cortex-A7 (and presumably A5), so NEON is turned on by default only on Cortex-A8 and above.
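
For reference, the computation being accelerated is the standard CRC-32 exposed by java.util.zip.CRC32. Below is a minimal table-driven sketch in Java. It is not the code from the patch, and the class and method names (Crc32Sketch, tableCrc32) are hypothetical, chosen only for illustration. It assumes the reflected polynomial 0xEDB88320, which is what java.util.zip.CRC32 uses, and checks itself against the library class.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32Sketch {
    // Lookup table for the reflected CRC-32 polynomial 0xEDB88320.
    private static final int[] TABLE = new int[256];
    static {
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[i] = c;
        }
    }

    // Byte-at-a-time CRC-32; returns the value as an unsigned 32-bit number in a long.
    static long tableCrc32(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >>> 8);
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 ref = new CRC32();
        ref.update(data, 0, data.length);
        // The two printed values should be identical.
        System.out.printf("%08x %08x%n", tableCrc32(data), ref.getValue());
    }
}
```

The intrinsics replace this per-byte loop (reached through JNI in the non-intrinsic case) with generated code, so the Java-visible result stays the same.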

The above numbers are for the CRC calculation itself, so eliminating the JNI overhead provides an additional benefit. For example, on Cortex-A7 the generic asm implementation gives the following improvement:

mode            buffer size    + ops/s
Xint            128            20.56%
                512            18.06%
                1024           14.66%
                1048576         5.52%

Xcomp Xbatch    128            77.85%
                512            32.63%
                1024           20.53%
                1048576         6.59%
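
A rough way to reproduce this kind of ops/s comparison is sketched below. This is not the harness used for the numbers above. The class name, method name, and iteration counts are assumptions made for illustration. It times java.util.zip.CRC32.update over the same four buffer sizes. Running it with -Xint, or with -Xcomp -Xbatch, on builds with and without the patch gives the two sides of the comparison.

```java
import java.util.zip.CRC32;

public class Crc32Throughput {
    // Returns the number of CRC32.update calls per second for one buffer size.
    static double opsPerSecond(int bufferSize, int iterations) {
        byte[] buf = new byte[bufferSize];
        for (int i = 0; i < buf.length; i++) {
            buf[i] = (byte) i;
        }
        CRC32 crc = new CRC32();
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            crc.reset();
            crc.update(buf, 0, buf.length);
            sink += crc.getValue();
        }
        long elapsed = Math.max(1L, System.nanoTime() - start);
        if (sink == 42) {
            System.out.println(); // keeps the checksum results live
        }
        return iterations * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        int[] sizes = {128, 512, 1024, 1048576};
        for (int size : sizes) {
            // Fewer iterations for the large buffer (arbitrary choice).
            int iterations = size > 4096 ? 200 : 200000;
            System.out.printf("%8d bytes: %.1f ops/s%n", size, opsPerSecond(size, iterations));
        }
    }
}
```

The sketch has no warm-up phase and takes a single measurement, so its results are only indicative. A proper benchmark harness such as JMH would give more trustworthy numbers.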

The patch is here:
http://cr.openjdk.java.net/~snazarki/8155618/

Thanks in advance,
Andrey

