RFR: 8310843: Reimplement ByteArray and ByteArrayLittleEndian with Unsafe [v10]

Thu Jul 20 15:19:45 UTC 2023

On Thu, 20 Jul 2023 14:53:36 GMT, Glavo <duke at openjdk.org> wrote:

>> @mcimadamore I compared the performance of `ByteBuffer` and `VarHandle` using a JMH benchmark:
>> 
>> 
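>> import java.lang.invoke.MethodHandles;
>> import java.lang.invoke.VarHandle;
>> import java.nio.ByteBuffer;
>> import java.util.Random;
>> 
>> import org.openjdk.jmh.annotations.*;
>> 
>> import static java.nio.ByteOrder.LITTLE_ENDIAN;
>> 
>> // JMH requires @State on a class with instance fields; the scope was not shown, Thread is assumed.
>> @State(Scope.Thread)
>> public class ByteArray {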
>> 
>>     private byte[] array;
>>     private ByteBuffer byteBuffer;
>> 
>>     private static final VarHandle INT = MethodHandles.byteArrayViewVarHandle(int[].class, LITTLE_ENDIAN);
>>     private static final VarHandle LONG = MethodHandles.byteArrayViewVarHandle(long[].class, LITTLE_ENDIAN);
>> 
>>     @Setup
>>     public void setup() {
>>         array = new byte[8];
>>         byteBuffer = ByteBuffer.wrap(array).order(LITTLE_ENDIAN);
>> 
>>         new Random(0).nextBytes(array);
>>     }
>> 
>>     @Benchmark
>>     public byte readByte() {
>>         return array[0];
>>     }
>> 
>>     @Benchmark
>>     public byte readByteFromBuffer() {
>>         return byteBuffer.get(0);
>>     }
>> 
>>     @Benchmark
>>     public int readInt() {
>>         return (int) INT.get(array, 0);
>>     }
>> 
>>     @Benchmark
>>     public int readIntFromBuffer() {
>>         return byteBuffer.getInt(0);
>>     }
>> 
>>     @Benchmark
>>     public long readLong() {
>>         return (long) LONG.get(array, 0);
>>     }
>> 
>>     @Benchmark
>>     public long readLongFromBuffer() {
>>         return byteBuffer.getLong(0);
>>     }
>> }
>> 
>> 
>> Result:
>> 
>> Benchmark                      Mode  Cnt        Score       Error   Units
>> ByteArray.readByte            thrpt    5  1270230.180 ± 29172.551  ops/ms
>> ByteArray.readByteFromBuffer  thrpt    5   623862.080 ± 12167.410  ops/ms
>> ByteArray.readInt             thrpt    5  1252719.463 ± 77598.672  ops/ms
>> ByteArray.readIntFromBuffer   thrpt    5   571070.474 ±  1500.426  ops/ms
>> ByteArray.readLong            thrpt    5  1262720.686 ±   728.100  ops/ms
>> ByteArray.readLongFromBuffer  thrpt    5   571594.800 ±  3376.735  ops/ms
>> 
>> 
>> In this result, ByteBuffer is much slower than VarHandle. Am I doing something wrong? What conditions are needed to make the performance of ByteBuffer close to that of Unsafe?
>
> I tried a few more. It looks like the JIT is able to optimize the ByteBuffer away pretty well by keeping it only as a local variable without escaping.

It seems that as long as the `ByteBuffer` is stored in a field (even if it is `static final`), the JIT compiler cannot completely eliminate the overhead of the `ByteBuffer`.
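To make the two cases concrete, here is a minimal sketch of the two access patterns being compared: a `ByteBuffer` held in a field versus one wrapped locally on each call. The class and method names are hypothetical and this is not a benchmark; it only shows that both shapes read the same value, so any difference is purely in what the JIT can eliminate.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the PR.
public class BufferPlacement {
    private final byte[] array = {1, 2, 3, 4, 5, 6, 7, 8};

    // Case 1: the buffer lives in a field, so every read goes through
    // an object the JIT has to load from the heap.
    private final ByteBuffer fieldBuffer =
            ByteBuffer.wrap(array).order(ByteOrder.LITTLE_ENDIAN);

    public int readIntFromField() {
        return fieldBuffer.getInt(0);
    }

    // Case 2: the buffer is a local that never escapes the method, which
    // gives escape analysis a chance to remove the allocation entirely.
    public int readIntFromLocal() {
        return ByteBuffer.wrap(array).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    public static void main(String[] args) {
        BufferPlacement b = new BufferPlacement();
        // Both reads decode bytes 01 02 03 04 as little-endian 0x04030201.
        System.out.println(Integer.toHexString(b.readIntFromField()));
        System.out.println(Integer.toHexString(b.readIntFromLocal()));
    }
}
```

Whether case 2 is actually optimized away depends on the compiler tier and inlining decisions, so it still needs to be measured with JMH.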

@mcimadamore I think your suggested change for `DataInputStream` is dubious: it is likely to introduce non-trivial additional overhead. The correct change may look like this:

public final double readDouble() throws IOException {
    readFully(readBuffer, 0, 8);
-   return ByteArray.getDouble(readBuffer, 0);
+   return ByteBuffer.wrap(readBuffer).getDouble(0);
}

However, this change can also increase warm-up time and allocate many small objects before C2 compiles the method.
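As a sanity check that the two forms decode the same bytes, here is a small self-contained sketch. `jdk.internal.util.ByteArray` is not accessible outside the JDK, so a big-endian `byteArrayViewVarHandle` stands in for it here; that substitution, and the class and method names, are my assumptions for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the PR.
public class ReadDoubleSketch {
    // Stand-in for the internal ByteArray.getDouble (assumption): a
    // big-endian view, matching the DataInput wire format.
    private static final VarHandle DOUBLE =
            MethodHandles.byteArrayViewVarHandle(double[].class, ByteOrder.BIG_ENDIAN);

    public static double viaVarHandle(byte[] buf) {
        return (double) DOUBLE.get(buf, 0);
    }

    // A freshly wrapped ByteBuffer defaults to big-endian order, and the
    // wrapper does not escape this method.
    public static double viaWrappedBuffer(byte[] buf) {
        return ByteBuffer.wrap(buf).getDouble(0);
    }

    // Encode a double the way DataOutputStream writes it.
    public static byte[] encode(double value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(8);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] buf = encode(1.5);
        System.out.println(viaVarHandle(buf) == viaWrappedBuffer(buf));
    }
}
```

This only demonstrates equivalence of the results; it says nothing about the warm-up or allocation cost mentioned above, which would have to be measured separately.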

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14636#discussion_r1269617039