RFR JDK-6321472: Add CRC-32C API

Thu Oct 23 00:06:17 UTC 2014

Hi,

I was thinking about this earlier when I started writing the patch and 
then I forgot about it again. I haven't been able to figure out when the 
code will be executed. ByteBuffer is implemented in such a way  that 
only the JDK can extend it and as far as I can tell you can only create 
3 types of ByteBuffers (Direct, Mapped and Heap), all of which will be 
handled by the more efficient calls above.

That said just to make the code a bit safer from OOM it is probably best 
to update the default method and all current implementations which all 
use the same pattern.

A reasonable solution should be the following code

             byte[] b = new byte[(buffer.remaining() < 4096)
                     ? buffer.remaining() : 4096];
             while (buffer.hasRemaining()) {
                 int length = (buffer.remaining() < b.length)
                         ? buffer.remaining() : b.length;
                 buffer.get(b, 0, length);
                 update(b, 0, length);
             }

Xueming, do you have any further comment?

Regards,
Staffan

On 10/22/2014 03:04 PM, Stanimir Simeonoff wrote:
>
>
> On Thu, Oct 23, 2014 at 12:10 AM, Bernd Eckenfels 
> <ecki at zusammenkunft.net <mailto:ecki at zusammenkunft.net>> wrote:
>
>     Hello,
>
>     just a question in the default impl:
>
>     +        } else {
>     +            byte[] b = new byte[rem];
>     +            buffer.get(b);
>     +            update(b, 0, b.length);
>     +        }
>
>     would it be a good idea to actually put a ceiling on the size of the
>     array which is processed at once? 
>
> This is an excellent catch.
> Should not be too large, probably 4k or so.
>
> Stanimir
>
>
>      Am Tue, 21 Oct 2014 10:28:50 -0700
>     schrieb Staffan Friberg <staffan.friberg at oracle.com
>     <mailto:staffan.friberg at oracle.com>>:
>
>     > Hi Peter,
>     >
>     > Thanks for the comments..
>     > >
>     > >   217                 if (Unsafe.ADDRESS_SIZE == 4) {
>     > >   218                     // On 32 bit platforms read two ints
>     > > instead of a single 64bit long
>     > >
>     > > When you're reading from byte[] using Unsafe (updateBytes), you
>     > > have the option of reading 64bit values on 64bit platforms. When
>     > > you're reading from DirectByteBuffer memory
>     > > (updateDirectByteBuffer), you're only using 32bit reads.
>     > I will add a comment in the code for this decision. The reason is
>     > that read a long results in slightly worse performance in this case,
>     > in updateBytes it is faster. I was able to get it to run slightly
>     > faster by working directly with the address instead of always adding
>     > address + off, but this makes things worse in the 32bit case since
>     > all calculation will now be using long variables. So using the
>     getInt
>     > as in the current code feels like the best solution as it
>     strikes the
>     > best balance between 32 and 64bit. Below is how updateByteBuffer
>     > looked with the rewrite I mentioned.
>     >
>     >
>     >   ong address = ((DirectBuffer) buffer).address();
>     >   crc = updateDirectByteBuffer(crc, address + pos, address + limit);
>     >
>     >
>     >       private static int updateDirectByteBuffer(int crc, long adr,
>     > long end) {
>     >
>     >          // Do only byte reads for arrays so short they can't be
>     > aligned if (end - adr >= 8) {
>     >
>     >              // align on 8 bytes
>     >              int alignLength = (8 - (int) (adr & 0x7)) & 0x7;
>     >              for (long alignEnd = adr + alignLength; adr < alignEnd;
>     > adr++) { crc = (crc >>> 8)
>     >                          ^ byteTable[(crc ^ UNSAFE.getByte(adr)) &
>     > 0xFF]; }
>     >
>     >              if (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN) {
>     >                  crc = Integer.reverseBytes(crc);
>     >              }
>     >
>     >              // slicing-by-8
>     >              for (; adr < (end - Long.BYTES); adr += Long.BYTES) {
>     >                  int firstHalf;
>     >                  int secondHalf;
>     >                  if (Unsafe.ADDRESS_SIZE == 4) {
>     >                      // On 32 bit platforms read two ints instead of
>     > a single 64bit long firstHalf = UNSAFE.getInt(adr);
>     >                      secondHalf = UNSAFE.getInt(adr +
>     Integer.BYTES);
>     >                  } else {
>     >                      long value = UNSAFE.getLong(adr);
>     >                      if (ByteOrder.nativeOrder() ==
>     > ByteOrder.LITTLE_ENDIAN) { firstHalf = (int) value;
>     >                          secondHalf = (int) (value >>> 32);
>     >                      } else { // ByteOrder.BIG_ENDIAN
>     >                          firstHalf = (int) (value >>> 32);
>     >                          secondHalf = (int) value;
>     >                      }
>     >                  }
>     >                  crc ^= firstHalf;
>     >                  if (ByteOrder.nativeOrder() ==
>     > ByteOrder.LITTLE_ENDIAN) { crc = byteTable7[crc & 0xFF]
>     >                              ^ byteTable6[(crc >>> 8) & 0xFF]
>     >                              ^ byteTable5[(crc >>> 16) & 0xFF]
>     >                              ^ byteTable4[crc >>> 24]
>     >                              ^ byteTable3[secondHalf & 0xFF]
>     >                              ^ byteTable2[(secondHalf >>> 8) & 0xFF]
>     >                              ^ byteTable1[(secondHalf >>> 16) &
>     0xFF]
>     >                              ^ byteTable0[secondHalf >>> 24];
>     >                  } else { // ByteOrder.BIG_ENDIAN
>     >                      crc = byteTable0[secondHalf & 0xFF]
>     >                              ^ byteTable1[(secondHalf >>> 8) & 0xFF]
>     >                              ^ byteTable2[(secondHalf >>> 16) &
>     0xFF]
>     >                              ^ byteTable3[secondHalf >>> 24]
>     >                              ^ byteTable4[crc & 0xFF]
>     >                              ^ byteTable5[(crc >>> 8) & 0xFF]
>     >                              ^ byteTable6[(crc >>> 16) & 0xFF]
>     >                              ^ byteTable7[crc >>> 24];
>     >                  }
>     >              }
>     >
>     >              if (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN) {
>     >                  crc = Integer.reverseBytes(crc);
>     >              }
>     >          }
>     >
>     >          // Tail
>     >          for (; adr < end; adr++) {
>     >              crc = (crc >>> 8)
>     >                      ^ byteTable[(crc ^ UNSAFE.getByte(adr)) &
>     0xFF];
>     >          }
>     >
>     >          return crc;
>     >      }
>     >
>     >
>     > >
>     > > Also, in updateBytes, the usage of
>     > > Unsafe.ARRAY_INT_INDEX_SCALE/ARRAY_LONG_INDEX_SCALE to index a
>     byte
>     > > array sounds a little scary. To be ultra portable you could check
>     > > that ARRAY_BYTE_INDEX_SCALE == 1 first and refuse to use
>     Unsafe for
>     > > byte arrays if it is not 1. Then use Integer.BYTES/Long.BYTES to
>     > > manipulate 'offsets' instead. In updateDirectByteBuffer it
>     would be
>     > > more appropriate to use Integer.BYTES/Long.BYTES too.
>     > Good idea. Added a check in the initial if statement and it will get
>     > automatically optimized away.
>     >
>     > >   225                         firstHalf = (int) (value &
>     > > 0xFFFFFFFF); 226  secondHalf = (int) (value
>     > > >>> 32); 227                     } else { // ByteOrder.BIG_ENDIAN
>     > >   228                         firstHalf = (int) (value >>> 32);
>     > >   229                         secondHalf = (int) (value &
>     > > 0xFFFFFFFF);
>     > >
>     > > firstHalf = (int) value; // this is equivalent for line 225
>     > > secondHalf = (int) value; // this is equivalent for line 229
>     > Done.
>     >
>     > Here is the latest webrev,
>     > http://cr.openjdk.java.net/~sfriberg/JDK-6321472/webrev.03
>     <http://cr.openjdk.java.net/%7Esfriberg/JDK-6321472/webrev.03>
>     >
>     > Cheers,
>     > Staffan
>
>