RFR JDK-6321472: Add CRC-32C API

Thu Oct 23 21:40:07 UTC 2014

Hi,

Using unsafe to read fields will make the code very brittle and require 
us to further detect and buffer types and which fields to read and 
handle. Currently this seems to be a rather uncommon case, as the code 
in Adler32 and CRC32 has done a full allocation in JDK 8, and by far 
either wrapped ByteBuffers or DirectByteBuffers are expected to be used. 
So for now I will keep the code as is without further complications.

The performance with the copying looks reasonable. The result is 
multiplied with array length (@OperationsPerInvocation(65536 or 1024)) 
so score ends up being bytes/ms.

64k buffer length
Benchmark                             Mode  Samples        Score Score 
error   Units
o.c.CRC32C.directByteBuffer_64K      thrpt        5 998151.406     
1539.451  ops/ms
o.c.CRC32C.readonlyByteBuffer_64K    thrpt        5 961665.219    
36823.059  ops/ms
o.c.CRC32C.wrappedByteBuffer_64K     thrpt        5 1030445.833     
9069.889  ops/ms

1k buffer length
Benchmark                            Mode  Samples       Score Score 
error   Units
o.c.CRC32C.directByteBuffer_1K      thrpt        5  987614.289 3506.829  
ops/ms
o.c.CRC32C.readonlyByteBuffer_1K    thrpt        5  691888.488 
217080.205  ops/ms
o.c.CRC32C.wrappedByteBuffer_1K     thrpt        5 974676.434      
463.487  ops/ms

//Staffan

On 10/23/2014 02:05 AM, Peter Levart wrote:
> On 10/23/2014 10:16 AM, Stanimir Simeonoff wrote:
>> Unsafe is available, so the fields (array, offset) can be read directly
>> UNSAFE.getObject(buffer, hbOffset), UNSAFE.getObject(buffer, 
>> offsetOffset).
>> No need for MethodHandlers.
>> During class init the offsets have to be resolved, pretty much like 
>> any CAS
>> utilizing algorithm.
>>
>> I didn't propose it as readOnlyBuffers are very, very rarely used and 
>> even
>> more unlikely to be used to calculate checksums. It just makes the code
>> ugly.
>
> Agreed. And when Staffan introduces intrinsic, he could pass the 
> ByteBuffer instance to it and extract the byte[] there...
>
> Peter
>
>>
>> Stanimir
>>
>> On Thu, Oct 23, 2014 at 11:05 AM, Peter Levart <peter.levart at gmail.com>
>> wrote:
>>
>>> On 10/23/2014 03:52 AM, Staffan Friberg wrote:
>>>
>>>> Webrev with these last updates. Added more tests to make sure CRC32C,
>>>> CRC32 and Checksum default methods all are covered.
>>>>
>>>> http://cr.openjdk.java.net/~sfriberg/JDK-6321472/webrev.07
>>>>
>>> Hi Staffan,
>>>
>>> Regarding default case:
>>>
>>>   168         } else {
>>>   169             byte[] b = new byte[Math.min(buffer.remaining(), 
>>> 4096)];
>>>   170             while (buffer.hasRemaining()) {
>>>   171                 int length = Math.min(buffer.remaining(), 
>>> b.length);
>>>   172                 buffer.get(b, 0, length);
>>>   173                 update(b, 0, length);
>>>   174             }
>>>   175         }
>>>
>>>
>>> Have you tried using get()/getInt() directly on the (ro) ByteBuffer
>>> instead of copying to byte[] chunks? Intuitively one would expect it
>>> perform faster if a redundant copy is avoided. Ah, you already told 
>>> us that
>>> you plan to use intrinsic for CRC32C in the future, so you want to have
>>> "addresses" at hand.
>>>
>>> A hackish way to avoid copying in this case is to access the byte[] and
>>> offset using reflection. But this would have to be wrapped with
>>> doPrivileged() which would worsen performance for small buffers. A 
>>> way to
>>> avoid repeated access checks is to do them at class initialization 
>>> time,
>>> using MethodHandle(s). For example, something like:
>>>
>>>
>>>      private static final MethodHandle bbArrayGetter;
>>>      private static final MethodHandle bbArrayOffsetGetter;
>>>
>>>      static {
>>>          MethodHandle hbGetter;
>>>          MethodHandle offsetGetter;
>>>          try {
>>>              Field hbField = ByteBuffer.class.getDeclaredField("hb");
>>>              Field offsetField = ByteBuffer.class.
>>> getDeclaredField("offset");
>>>              AccessController.doPrivileged(new 
>>> PrivilegedAction<Void>() {
>>>                  @Override
>>>                  public Void run() {
>>>                      hbField.setAccessible(true);
>>>                      offsetField.setAccessible(true);
>>>                      return null;
>>>                  }
>>>              });
>>>              hbGetter = 
>>> MethodHandles.lookup().unreflectGetter(hbField);
>>>              offsetGetter = MethodHandles.lookup().
>>> unreflectGetter(offsetField);
>>>          } catch (NoSuchFieldException | IllegalAccessException e) {
>>>              hbGetter = null;
>>>              offsetGetter = null;
>>>          }
>>>          bbArrayGetter = hbGetter;
>>>          bbArrayOffsetGetter = offsetGetter;
>>>      }
>>>
>>>      private static byte[] getArrayOrNull(ByteBuffer bb) {
>>>          if (bb.hasArray()) return bb.array();
>>>          if (bbArrayGetter != null) {
>>>              try {
>>>                  return (byte[]) bbArrayGetter.invokeExact(bb);
>>>              } catch (Throwable e) {
>>>                  throw new InternalError(e);
>>>              }
>>>          }
>>>          return null;
>>>      }
>>>
>>>      private static int getArrayOffset(ByteBuffer bb) {
>>>          if (bb.hasArray()) return bb.arrayOffset();
>>>          if (bbArrayOffsetGetter != null) {
>>>              try {
>>>                  return (int) bbArrayOffsetGetter.invokeExact(bb);
>>>              } catch (Throwable e) {
>>>                  throw new InternalError(e);
>>>              }
>>>          }
>>>          throw new UnsupportedOperationException();
>>>      }
>>>
>>>
>>>
>>> Regards, Peter
>>>
>>>
>>>> //Staffan
>>>>
>>>> On 10/22/2014 05:37 PM, Stanimir Simeonoff wrote:
>>>>
>>>>> Hi Staffan,
>>>>>
>>>>> The readonly buffer (ByteBuffer.asReadOnlyBuffer()) don't have 
>>>>> array()
>>>>> "working".
>>>>> You can use "int length = Math.min(buffer.remaining, b.length)" 
>>>>> instead,
>>>>> same with new byte[Math.min(4096, buffer.remaining)]. Using 
>>>>> smaller chunks
>>>>> will be more performance friendly than allocating/eating up a huge 
>>>>> byte[].
>>>>> If you feel like, add a test with a heap 
>>>>> bytebuffer.asReadOnlyBuffer().
>>>>>
>>>>> Stanimir
>>>>>
>>>>>
>>>>> On Thu, Oct 23, 2014 at 3:06 AM, Staffan Friberg <
>>>>> staffan.friberg at oracle.com <mailto:staffan.friberg at oracle.com>> 
>>>>> wrote:
>>>>>
>>>>>      Hi,
>>>>>
>>>>>      I was thinking about this earlier when I started writing the 
>>>>> patch
>>>>>      and then I forgot about it again. I haven't been able to figure
>>>>>      out when the code will be executed. ByteBuffer is implemented in
>>>>>      such a way  that only the JDK can extend it and as far as I can
>>>>>      tell you can only create 3 types of ByteBuffers (Direct, Mapped
>>>>>      and Heap), all of which will be handled by the more efficient
>>>>>      calls above.
>>>>>
>>>>>      That said just to make the code a bit safer from OOM it is
>>>>>      probably best to update the default method and all current
>>>>>      implementations which all use the same pattern.
>>>>>
>>>>>      A reasonable solution should be the following code
>>>>>
>>>>>                  byte[] b = new byte[(buffer.remaining() < 4096)
>>>>>                          ? buffer.remaining() : 4096];
>>>>>                  while (buffer.hasRemaining()) {
>>>>>                      int length = (buffer.remaining() < b.length)
>>>>>                              ? buffer.remaining() : b.length;
>>>>>                      buffer.get(b, 0, length);
>>>>>                      update(b, 0, length);
>>>>>                  }
>>>>>
>>>>>      Xueming, do you have any further comment?
>>>>>
>>>>>      Regards,
>>>>>      Staffan
>>>>>
>>>>>      On 10/22/2014 03:04 PM, Stanimir Simeonoff wrote:
>>>>>
>>>>>>
>>>>>>      On Thu, Oct 23, 2014 at 12:10 AM, Bernd Eckenfels
>>>>>>      <ecki at zusammenkunft.net <mailto:ecki at zusammenkunft.net>> wrote:
>>>>>>
>>>>>>          Hello,
>>>>>>
>>>>>>          just a question in the default impl:
>>>>>>
>>>>>>          +        } else {
>>>>>>          +            byte[] b = new byte[rem];
>>>>>>          +            buffer.get(b);
>>>>>>          +            update(b, 0, b.length);
>>>>>>          +        }
>>>>>>
>>>>>>          would it be a good idea to actually put a ceiling on the 
>>>>>> size
>>>>>>          of the
>>>>>>          array which is processed at once?
>>>>>>      This is an excellent catch.
>>>>>>      Should not be too large, probably 4k or so.
>>>>>>
>>>>>>      Stanimir
>>>>>>
>>>>>>
>>>>>>           Am Tue, 21 Oct 2014 10:28:50 -0700
>>>>>>          schrieb Staffan Friberg <staffan.friberg at oracle.com
>>>>>>          <mailto:staffan.friberg at oracle.com>>:
>>>>>>
>>>>>>
>>>>>>          > Hi Peter,
>>>>>>          >
>>>>>>          > Thanks for the comments..
>>>>>>          > >
>>>>>>          > >   217                 if (Unsafe.ADDRESS_SIZE == 4) {
>>>>>>          > >   218                     // On 32 bit platforms 
>>>>>> read two
>>>>>>          ints
>>>>>>          > > instead of a single 64bit long
>>>>>>          > >
>>>>>>          > > When you're reading from byte[] using Unsafe
>>>>>>          (updateBytes), you
>>>>>>          > > have the option of reading 64bit values on 64bit
>>>>>>          platforms. When
>>>>>>          > > you're reading from DirectByteBuffer memory
>>>>>>          > > (updateDirectByteBuffer), you're only using 32bit 
>>>>>> reads.
>>>>>>          > I will add a comment in the code for this decision. The
>>>>>>          reason is
>>>>>>          > that read a long results in slightly worse performance in
>>>>>>          this case,
>>>>>>          > in updateBytes it is faster. I was able to get it to run
>>>>>>          slightly
>>>>>>          > faster by working directly with the address instead of
>>>>>>          always adding
>>>>>>          > address + off, but this makes things worse in the 32bit
>>>>>>          case since
>>>>>>          > all calculation will now be using long variables. So 
>>>>>> using
>>>>>>          the getInt
>>>>>>          > as in the current code feels like the best solution as it
>>>>>>          strikes the
>>>>>>          > best balance between 32 and 64bit. Below is how
>>>>>>          updateByteBuffer
>>>>>>          > looked with the rewrite I mentioned.
>>>>>>          >
>>>>>>          >
>>>>>>          >   ong address = ((DirectBuffer) buffer).address();
>>>>>>          >   crc = updateDirectByteBuffer(crc, address + pos, 
>>>>>> address
>>>>>>          + limit);
>>>>>>          >
>>>>>>          >
>>>>>>          >       private static int updateDirectByteBuffer(int crc,
>>>>>>          long adr,
>>>>>>          > long end) {
>>>>>>          >
>>>>>>          >          // Do only byte reads for arrays so short they
>>>>>>          can't be
>>>>>>          > aligned if (end - adr >= 8) {
>>>>>>          >
>>>>>>          >              // align on 8 bytes
>>>>>>          >              int alignLength = (8 - (int) (adr & 0x7)) 
>>>>>> & 0x7;
>>>>>>          >              for (long alignEnd = adr + alignLength; 
>>>>>> adr <
>>>>>>          alignEnd;
>>>>>>          > adr++) { crc = (crc >>> 8)
>>>>>>          >                          ^ byteTable[(crc ^
>>>>>>          UNSAFE.getByte(adr)) &
>>>>>>          > 0xFF]; }
>>>>>>          >
>>>>>>          >              if (ByteOrder.nativeOrder() ==
>>>>>>          ByteOrder.BIG_ENDIAN) {
>>>>>>          >                  crc = Integer.reverseBytes(crc);
>>>>>>          >              }
>>>>>>          >
>>>>>>          >              // slicing-by-8
>>>>>>          >              for (; adr < (end - Long.BYTES); adr +=
>>>>>>          Long.BYTES) {
>>>>>>          >                  int firstHalf;
>>>>>>          >                  int secondHalf;
>>>>>>          >                  if (Unsafe.ADDRESS_SIZE == 4) {
>>>>>>          >                      // On 32 bit platforms read two ints
>>>>>>          instead of
>>>>>>          > a single 64bit long firstHalf = UNSAFE.getInt(adr);
>>>>>>          >                      secondHalf = UNSAFE.getInt(adr +
>>>>>>          Integer.BYTES);
>>>>>>          >                  } else {
>>>>>>          >                      long value = UNSAFE.getLong(adr);
>>>>>>          >                      if (ByteOrder.nativeOrder() ==
>>>>>>          > ByteOrder.LITTLE_ENDIAN) { firstHalf = (int) value;
>>>>>>          >                          secondHalf = (int) (value >>> 
>>>>>> 32);
>>>>>>          >                      } else { // ByteOrder.BIG_ENDIAN
>>>>>>          >                          firstHalf = (int) (value >>> 
>>>>>> 32);
>>>>>>          >                          secondHalf = (int) value;
>>>>>>          >                      }
>>>>>>          >                  }
>>>>>>          >                  crc ^= firstHalf;
>>>>>>          >                  if (ByteOrder.nativeOrder() ==
>>>>>>          > ByteOrder.LITTLE_ENDIAN) { crc = byteTable7[crc & 0xFF]
>>>>>>          >                              ^ byteTable6[(crc >>> 8) 
>>>>>> & 0xFF]
>>>>>>          >                              ^ byteTable5[(crc >>> 16) 
>>>>>> & 0xFF]
>>>>>>          >                              ^ byteTable4[crc >>> 24]
>>>>>>          >                              ^ byteTable3[secondHalf & 
>>>>>> 0xFF]
>>>>>>          >                              ^ byteTable2[(secondHalf >>>
>>>>>>          8) & 0xFF]
>>>>>>          >                              ^ byteTable1[(secondHalf >>>
>>>>>>          16) & 0xFF]
>>>>>>          >                              ^ byteTable0[secondHalf 
>>>>>> >>> 24];
>>>>>>          >                  } else { // ByteOrder.BIG_ENDIAN
>>>>>>          >                      crc = byteTable0[secondHalf & 0xFF]
>>>>>>          >                              ^ byteTable1[(secondHalf >>>
>>>>>>          8) & 0xFF]
>>>>>>          >                              ^ byteTable2[(secondHalf >>>
>>>>>>          16) & 0xFF]
>>>>>>          >                              ^ byteTable3[secondHalf 
>>>>>> >>> 24]
>>>>>>          >                              ^ byteTable4[crc & 0xFF]
>>>>>>          >                              ^ byteTable5[(crc >>> 8) 
>>>>>> & 0xFF]
>>>>>>          >                              ^ byteTable6[(crc >>> 16) 
>>>>>> & 0xFF]
>>>>>>          >                              ^ byteTable7[crc >>> 24];
>>>>>>          >                  }
>>>>>>          >              }
>>>>>>          >
>>>>>>          >              if (ByteOrder.nativeOrder() ==
>>>>>>          ByteOrder.BIG_ENDIAN) {
>>>>>>          >                  crc = Integer.reverseBytes(crc);
>>>>>>          >              }
>>>>>>          >          }
>>>>>>          >
>>>>>>          >          // Tail
>>>>>>          >          for (; adr < end; adr++) {
>>>>>>          >              crc = (crc >>> 8)
>>>>>>          >                      ^ byteTable[(crc ^
>>>>>>          UNSAFE.getByte(adr)) & 0xFF];
>>>>>>          >          }
>>>>>>          >
>>>>>>          >          return crc;
>>>>>>          >      }
>>>>>>          >
>>>>>>          >
>>>>>>          > >
>>>>>>          > > Also, in updateBytes, the usage of
>>>>>>          > > Unsafe.ARRAY_INT_INDEX_SCALE/ARRAY_LONG_INDEX_SCALE to
>>>>>>          index a byte
>>>>>>          > > array sounds a little scary. To be ultra portable you
>>>>>>          could check
>>>>>>          > > that ARRAY_BYTE_INDEX_SCALE == 1 first and refuse to 
>>>>>> use
>>>>>>          Unsafe for
>>>>>>          > > byte arrays if it is not 1. Then use
>>>>>>          Integer.BYTES/Long.BYTES to
>>>>>>          > > manipulate 'offsets' instead. In updateDirectByteBuffer
>>>>>>          it would be
>>>>>>          > > more appropriate to use Integer.BYTES/Long.BYTES too.
>>>>>>          > Good idea. Added a check in the initial if statement 
>>>>>> and it
>>>>>>          will get
>>>>>>          > automatically optimized away.
>>>>>>          >
>>>>>>          > >   225  firstHalf = (int) (value &
>>>>>>          > > 0xFFFFFFFF); 226 secondHalf = (int) (value
>>>>>>          > > >>> 32); 227            } else { // 
>>>>>> ByteOrder.BIG_ENDIAN
>>>>>>          > >   228  firstHalf = (int) (value >>> 32);
>>>>>>          > >   229  secondHalf = (int) (value &
>>>>>>          > > 0xFFFFFFFF);
>>>>>>          > >
>>>>>>          > > firstHalf = (int) value; // this is equivalent for 
>>>>>> line 225
>>>>>>          > > secondHalf = (int) value; // this is equivalent for 
>>>>>> line 229
>>>>>>          > Done.
>>>>>>          >
>>>>>>          > Here is the latest webrev,
>>>>>>          > 
>>>>>> http://cr.openjdk.java.net/~sfriberg/JDK-6321472/webrev.03
>>>>>> <http://cr.openjdk.java.net/%7Esfriberg/JDK-6321472/webrev.03>
>>>>>>          >
>>>>>>          > Cheers,
>>>>>>          > Staffan
>>>>>>
>>>>>>
>>>>>>
>>>>>
>