RFR :7088419 : (L) Use x86 Hardware CRC32 Instruction with java.util.zip.CRC32 and java.util.zip.Adler32

David Chase david.r.chase at oracle.com
Mon May 27 13:27:49 UTC 2013


On 2013-05-27, at 12:38 AM, David Holmes <david.holmes at oracle.com> wrote:

> Hi David,

> That said I still have an issue with this code:
> 
> 147     if (buf) {
> 148           /* Don't know for sure how big an unsigned long is, therefore
> 149              copy one at a time. */
> 150           int i;
> 151           for (i = 0; i < 256; i++) buf[i] = (jint) (crc_table[i]);
> 152           (*env)->ReleasePrimitiveArrayCritical(env, b, buf, 0);
> 153     }
> 
> buf is an array of 32-bit values; crc_table is either 32-bit or 64-bit depending on platform. So on 64-bit we are truncating all the values in crc_table. So presumably these values all fit in 32-bits anyway?

They are guaranteed to.  They are residues modulo a 33-bit polynomial -- i.e., bits beyond the 32nd are zero.  I'm a little surprised that the CRC32 implementation does not use a uint32_t for them, because not doing this makes their tables (they have 4 for a different optimization) take 8k instead of 4k, at least on LP64 platforms.

> Minor nit:
> 
> src/share/classes/java/util/zip/CRC32.java
> 
> + import java.lang.reflect.Field;
> 
> I don't think this import is needed now.

Agreed, this is leftover from the old way of getting Unsafe, but the Unsafe code is gone anyway.

> ---
> 
> test/java/util/zip/TimeChecksum.java
> 
> You added eg:
> 
> time(adler32, data, 2*iters, 16); // warmup short case, too
> 
> which is presumably to ensure both of the primary paths are hit during the warm up. But it isn't obvious to me that the actual test will hit the short case ??

It certainly did hit the short case.  I ran that test, and noticed a significant difference before/after my initial patch, and then back again after I fixed the test.  "The short case" is fewer than (if I recall) 80 bytes of CRC or Adler; there, it is quicker to do the calculation in Java rather than paying the JNI call overhead to get to the slightly faster C implementation.

Here, seeing is believing (oh, but look, I trashed the output format on the warmup, I should fix that):

FIRST, WITH SHORT WARMUP:
dr2chase:zip $ $BB/java TimeChecksum
---------- Adler32 ----------
Warmup...           8       1,312           3           4

Length    byte[](ns/len)  ByteBuffer(direct)   ByteBuffer
1	       2,000	       4,000 (2.00)       8,000 (4.00)     checksum=cd00cd
2	       2,000	       1,500 (0.75)       2,000 (1.00)     checksum=1b500e8
4	         500	         750 (1.50)         750 (1.50)     checksum=50e0223
8	         250	         250 (1.00)         375 (1.50)     checksum=148d0475
16	         562	         312 (0.56)         312 (0.56)     checksum=4999083b
...
---------- CRC32 ----------
Warmup...           6       1,250           3           4

Length    byte[](ns/len)  ByteBuffer(direct)   ByteBuffer
1	       3,000	       3,000 (1.00)       6,000 (2.00)     checksum=40d06116
2	       1,500	       1,500 (1.00)       3,000 (2.00)     checksum=acf34351
4	         500	         750 (1.50)       1,250 (2.50)     checksum=bded3e7d
8	         375	         375 (1.00)         625 (1.67)     checksum=699af2a3
16	         187	         250 (1.33)         312 (1.67)     checksum=44ff2ea
...


NEXT, WITH SHORT WARMUP REMOVED:
dr2chase:zip $ $BB/java TimeChecksum
---------- Adler32 ----------
Warmup...           5           4           3

Length    byte[](ns/len)  ByteBuffer(direct)   ByteBuffer
1	      22,000	       5,000 (0.23)       7,000 (0.32)     checksum=5f005f
2	       1,500	       1,500 (1.00)       3,000 (2.00)     checksum=cd006e
4	         500	         500 (1.00)       1,250 (2.50)     checksum=2860149
8	         375	         375 (1.00)         500 (1.33)     checksum=fa90433
16	         312	         250 (0.80)         312 (1.00)     checksum=45e80933
...
---------- CRC32 ----------
Warmup...           6           3           4

Length    byte[](ns/len)  ByteBuffer(direct)   ByteBuffer
1	      24,000	       3,000 (0.13)       4,000 (0.17)     checksum=5ed1937e
2	       1,500	       1,000 (0.67)       2,000 (1.33)     checksum=f55e7fb4
4	         750	         750 (1.00)       1,000 (1.33)     checksum=ccf3e842
8	         250	         375 (1.50)         625 (2.50)     checksum=802801ba
16	         250	         187 (0.75)         250 (1.00)     checksum=836ef44b
...



David





More information about the core-libs-dev mailing list