RFR JDK-6321472: Add CRC-32C API

Thu Oct 23 01:52:28 UTC 2014

Webrev with these last updates. Added more tests to make sure CRC32C, 
CRC32 and Checksum default methods all are covered.

     http://cr.openjdk.java.net/~sfriberg/JDK-6321472/webrev.07

//Staffan

On 10/22/2014 05:37 PM, Stanimir Simeonoff wrote:
> Hi Staffan,
>
> The readonly buffer (ByteBuffer.asReadOnlyBuffer()) don't have array() 
> "working".
> You can use "int length = Math.min(buffer.remaining, b.length)" 
> instead, same with new byte[Math.min(4096, buffer.remaining)]. Using 
> smaller chunks will be more performance friendly than 
> allocating/eating up a huge byte[].
> If you feel like, add a test with a heap bytebuffer.asReadOnlyBuffer().
>
> Stanimir
>
>
> On Thu, Oct 23, 2014 at 3:06 AM, Staffan Friberg 
> <staffan.friberg at oracle.com <mailto:staffan.friberg at oracle.com>> wrote:
>
>     Hi,
>
>     I was thinking about this earlier when I started writing the patch
>     and then I forgot about it again. I haven't been able to figure
>     out when the code will be executed. ByteBuffer is implemented in
>     such a way  that only the JDK can extend it and as far as I can
>     tell you can only create 3 types of ByteBuffers (Direct, Mapped
>     and Heap), all of which will be handled by the more efficient
>     calls above.
>
>     That said just to make the code a bit safer from OOM it is
>     probably best to update the default method and all current
>     implementations which all use the same pattern.
>
>     A reasonable solution should be the following code
>
>                 byte[] b = new byte[(buffer.remaining() < 4096)
>                         ? buffer.remaining() : 4096];
>                 while (buffer.hasRemaining()) {
>                     int length = (buffer.remaining() < b.length)
>                             ? buffer.remaining() : b.length;
>                     buffer.get(b, 0, length);
>                     update(b, 0, length);
>                 }
>
>     Xueming, do you have any further comment?
>
>     Regards,
>     Staffan
>
>     On 10/22/2014 03:04 PM, Stanimir Simeonoff wrote:
>>
>>
>>     On Thu, Oct 23, 2014 at 12:10 AM, Bernd Eckenfels
>>     <ecki at zusammenkunft.net <mailto:ecki at zusammenkunft.net>> wrote:
>>
>>         Hello,
>>
>>         just a question in the default impl:
>>
>>         +        } else {
>>         +            byte[] b = new byte[rem];
>>         +            buffer.get(b);
>>         +            update(b, 0, b.length);
>>         +        }
>>
>>         would it be a good idea to actually put a ceiling on the size
>>         of the
>>         array which is processed at once? 
>>
>>     This is an excellent catch.
>>     Should not be too large, probably 4k or so.
>>
>>     Stanimir
>>
>>
>>          Am Tue, 21 Oct 2014 10:28:50 -0700
>>         schrieb Staffan Friberg <staffan.friberg at oracle.com
>>         <mailto:staffan.friberg at oracle.com>>:
>>
>>         > Hi Peter,
>>         >
>>         > Thanks for the comments..
>>         > >
>>         > >   217                 if (Unsafe.ADDRESS_SIZE == 4) {
>>         > >   218                     // On 32 bit platforms read two
>>         ints
>>         > > instead of a single 64bit long
>>         > >
>>         > > When you're reading from byte[] using Unsafe
>>         (updateBytes), you
>>         > > have the option of reading 64bit values on 64bit
>>         platforms. When
>>         > > you're reading from DirectByteBuffer memory
>>         > > (updateDirectByteBuffer), you're only using 32bit reads.
>>         > I will add a comment in the code for this decision. The
>>         reason is
>>         > that read a long results in slightly worse performance in
>>         this case,
>>         > in updateBytes it is faster. I was able to get it to run
>>         slightly
>>         > faster by working directly with the address instead of
>>         always adding
>>         > address + off, but this makes things worse in the 32bit
>>         case since
>>         > all calculation will now be using long variables. So using
>>         the getInt
>>         > as in the current code feels like the best solution as it
>>         strikes the
>>         > best balance between 32 and 64bit. Below is how
>>         updateByteBuffer
>>         > looked with the rewrite I mentioned.
>>         >
>>         >
>>         >   ong address = ((DirectBuffer) buffer).address();
>>         >   crc = updateDirectByteBuffer(crc, address + pos, address
>>         + limit);
>>         >
>>         >
>>         >       private static int updateDirectByteBuffer(int crc,
>>         long adr,
>>         > long end) {
>>         >
>>         >          // Do only byte reads for arrays so short they
>>         can't be
>>         > aligned if (end - adr >= 8) {
>>         >
>>         >              // align on 8 bytes
>>         >              int alignLength = (8 - (int) (adr & 0x7)) & 0x7;
>>         >              for (long alignEnd = adr + alignLength; adr <
>>         alignEnd;
>>         > adr++) { crc = (crc >>> 8)
>>         >                          ^ byteTable[(crc ^
>>         UNSAFE.getByte(adr)) &
>>         > 0xFF]; }
>>         >
>>         >              if (ByteOrder.nativeOrder() ==
>>         ByteOrder.BIG_ENDIAN) {
>>         >                  crc = Integer.reverseBytes(crc);
>>         >              }
>>         >
>>         >              // slicing-by-8
>>         >              for (; adr < (end - Long.BYTES); adr +=
>>         Long.BYTES) {
>>         >                  int firstHalf;
>>         >                  int secondHalf;
>>         >                  if (Unsafe.ADDRESS_SIZE == 4) {
>>         >                      // On 32 bit platforms read two ints
>>         instead of
>>         > a single 64bit long firstHalf = UNSAFE.getInt(adr);
>>         >                      secondHalf = UNSAFE.getInt(adr +
>>         Integer.BYTES);
>>         >                  } else {
>>         >                      long value = UNSAFE.getLong(adr);
>>         >                      if (ByteOrder.nativeOrder() ==
>>         > ByteOrder.LITTLE_ENDIAN) { firstHalf = (int) value;
>>         >                          secondHalf = (int) (value >>> 32);
>>         >                      } else { // ByteOrder.BIG_ENDIAN
>>         >                          firstHalf = (int) (value >>> 32);
>>         >                          secondHalf = (int) value;
>>         >                      }
>>         >                  }
>>         >                  crc ^= firstHalf;
>>         >                  if (ByteOrder.nativeOrder() ==
>>         > ByteOrder.LITTLE_ENDIAN) { crc = byteTable7[crc & 0xFF]
>>         >                              ^ byteTable6[(crc >>> 8) & 0xFF]
>>         >                              ^ byteTable5[(crc >>> 16) & 0xFF]
>>         >                              ^ byteTable4[crc >>> 24]
>>         >                              ^ byteTable3[secondHalf & 0xFF]
>>         >                              ^ byteTable2[(secondHalf >>>
>>         8) & 0xFF]
>>         >                              ^ byteTable1[(secondHalf >>>
>>         16) & 0xFF]
>>         >                              ^ byteTable0[secondHalf >>> 24];
>>         >                  } else { // ByteOrder.BIG_ENDIAN
>>         >                      crc = byteTable0[secondHalf & 0xFF]
>>         >                              ^ byteTable1[(secondHalf >>>
>>         8) & 0xFF]
>>         >                              ^ byteTable2[(secondHalf >>>
>>         16) & 0xFF]
>>         >                              ^ byteTable3[secondHalf >>> 24]
>>         >                              ^ byteTable4[crc & 0xFF]
>>         >                              ^ byteTable5[(crc >>> 8) & 0xFF]
>>         >                              ^ byteTable6[(crc >>> 16) & 0xFF]
>>         >                              ^ byteTable7[crc >>> 24];
>>         >                  }
>>         >              }
>>         >
>>         >              if (ByteOrder.nativeOrder() ==
>>         ByteOrder.BIG_ENDIAN) {
>>         >                  crc = Integer.reverseBytes(crc);
>>         >              }
>>         >          }
>>         >
>>         >          // Tail
>>         >          for (; adr < end; adr++) {
>>         >              crc = (crc >>> 8)
>>         >                      ^ byteTable[(crc ^
>>         UNSAFE.getByte(adr)) & 0xFF];
>>         >          }
>>         >
>>         >          return crc;
>>         >      }
>>         >
>>         >
>>         > >
>>         > > Also, in updateBytes, the usage of
>>         > > Unsafe.ARRAY_INT_INDEX_SCALE/ARRAY_LONG_INDEX_SCALE to
>>         index a byte
>>         > > array sounds a little scary. To be ultra portable you
>>         could check
>>         > > that ARRAY_BYTE_INDEX_SCALE == 1 first and refuse to use
>>         Unsafe for
>>         > > byte arrays if it is not 1. Then use
>>         Integer.BYTES/Long.BYTES to
>>         > > manipulate 'offsets' instead. In updateDirectByteBuffer
>>         it would be
>>         > > more appropriate to use Integer.BYTES/Long.BYTES too.
>>         > Good idea. Added a check in the initial if statement and it
>>         will get
>>         > automatically optimized away.
>>         >
>>         > >   225  firstHalf = (int) (value &
>>         > > 0xFFFFFFFF); 226            secondHalf = (int) (value
>>         > > >>> 32); 227            } else { // ByteOrder.BIG_ENDIAN
>>         > >   228  firstHalf = (int) (value >>> 32);
>>         > >   229  secondHalf = (int) (value &
>>         > > 0xFFFFFFFF);
>>         > >
>>         > > firstHalf = (int) value; // this is equivalent for line 225
>>         > > secondHalf = (int) value; // this is equivalent for line 229
>>         > Done.
>>         >
>>         > Here is the latest webrev,
>>         > http://cr.openjdk.java.net/~sfriberg/JDK-6321472/webrev.03
>>         <http://cr.openjdk.java.net/%7Esfriberg/JDK-6321472/webrev.03>
>>         >
>>         > Cheers,
>>         > Staffan
>>
>>
>
>