RFR: 8310843: Reimplement ByteArray and ByteArrayLittleEndian with Unsafe [v10]

Thu Jul 20 21:46:43 UTC 2023

On Thu, 20 Jul 2023 17:27:58 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> Is there any benchmark for DataInput/Output stream that can be used? I mean, it would be interesting to understand how these numbers translate when running the stuff that is built on top.

I've tried to run the benchmark in test/micro/java/io/DataInputStream.java. This is the baseline:

Benchmark                     Mode  Cnt  Score   Error  Units
DataInputStreamTest.readChar  avgt   20  7.583 ± 0.026  us/op
DataInputStreamTest.readInt   avgt   20  3.804 ± 0.045  us/op

And this is with a patch similar to the one I shared above, to use ByteBuffer internally:

Benchmark                     Mode  Cnt  Score   Error  Units
DataInputStreamTest.readChar  avgt   20  7.594 ± 0.106  us/op
DataInputStreamTest.readInt   avgt   20  3.795 ± 0.030  us/op

There does not seem to be any extra overhead. That said, access occurs in a counted loop, and in these cases we know buffer/segment access is optimized quite well.
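To make the "counted loop" point concrete, here is a minimal sketch of the two access shapes being compared: assembling a big-endian int with shifts, and reading it through a heap ByteBuffer. This is not the patch and not the benchmark code; the class and method names are hypothetical and only illustrate the pattern.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; none of these names come from the patch or the JDK sources.
class ReadIntSketch {

    // Assemble a big-endian int from four consecutive bytes using shifts.
    static int readIntShift(byte[] a, int off) {
        return ((a[off] & 0xFF) << 24)
             | ((a[off + 1] & 0xFF) << 16)
             | ((a[off + 2] & 0xFF) << 8)
             |  (a[off + 3] & 0xFF);
    }

    // Counted loop over the array using the shift-based read.
    static long sumShift(byte[] a) {
        long sum = 0;
        for (int i = 0; i + Integer.BYTES <= a.length; i += Integer.BYTES) {
            sum += readIntShift(a, i);
        }
        return sum;
    }

    // Same counted loop, but reading through a heap ByteBuffer wrapping the array.
    static long sumBuffer(byte[] a) {
        ByteBuffer bb = ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN);
        long sum = 0;
        for (int i = 0; i + Integer.BYTES <= a.length; i += Integer.BYTES) {
            sum += bb.getInt(i); // absolute get, does not move the buffer position
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31 + 7);
        }
        // Both loops read the same big-endian ints, so the sums should agree.
        System.out.println(sumShift(data) == sumBuffer(data));
    }
}
```

Both loops have a fixed stride and a simple bound, which is the shape where I would expect the buffer path to be optimized well; the sketch says nothing about relative speed by itself.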

I believe the question here is: do we have benchmarks that are representative of the kind of gain that micro-optimizing ByteArray would bring? It can be quite tricky to estimate real benefits from synthetic benchmarks on the ByteArray class, especially when they fetch a single element outside of a loop, as that is not representative of how clients will use it. I note that the original benchmark made by Per used a loop with two iterations to assess the cost of the ByteArray operations:

http://minborgsjavapot.blogspot.com/2023/01/java-21-performance-improvements.html

If I change the benchmark to do 2 iterations, I see this:

Benchmark                      Mode  Cnt       Score       Error   Units
ByteArray.readByte            thrpt    5  704199.172 ± 34101.508  ops/ms
ByteArray.readByteFromBuffer  thrpt    5  474321.828 ±  6588.471  ops/ms
ByteArray.readInt             thrpt    5  662411.181 ±  4470.951  ops/ms
ByteArray.readIntFromBuffer   thrpt    5  496900.429 ±  3705.737  ops/ms
ByteArray.readLong            thrpt    5  665138.063 ±  5944.814  ops/ms
ByteArray.readLongFromBuffer  thrpt    5  517781.548 ± 27106.331  ops/ms

The more iterations, the lower the cost (and you don't need many iterations to break even). This probably explains why the DataInputStream benchmark doesn't change: there are 1024 iterations in there.

I guess all this is to say that focusing excessively on microbenchmarks of a simple class such as ByteArray, in conditions that are likely unrealistic (e.g. single access), is IMHO the wrong way to look at things, as ByteArray is mostly used by classes that will most definitely read more than one value at a time (including the classfile API).

So, also IMHO, we should try to measure the use cases we care about in the higher-level APIs (I/O streams, classfile) and then see if adding Unsafe/VarHandle/ByteBuffer access in here leads to any benefit at all.
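For reference, the VarHandle option mentioned above can be sketched with the public `MethodHandles.byteArrayViewVarHandle` API. This is only an illustration of that access style under my own assumptions; the class and method names are hypothetical and are not taken from ByteArray or from the patch.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the ByteArray implementation.
class VarHandleReadSketch {

    // View a byte[] as big-endian ints / longs at arbitrary byte offsets.
    private static final VarHandle INT_BE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
    private static final VarHandle LONG_BE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

    static int getInt(byte[] a, int off) {
        return (int) INT_BE.get(a, off);
    }

    static void setInt(byte[] a, int off, int value) {
        INT_BE.set(a, off, value);
    }

    static long getLong(byte[] a, int off) {
        return (long) LONG_BE.get(a, off);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        setInt(buf, 0, 0x01020304);
        setInt(buf, 4, 0x05060708);
        // Reads back the two ints and the long that spans both of them.
        System.out.println(Integer.toHexString(getInt(buf, 0)));
        System.out.println(Integer.toHexString(getInt(buf, 4)));
        System.out.println(Long.toHexString(getLong(buf, 0)));
    }
}
```

Whether this style, Unsafe, or a ByteBuffer wins would still need to be checked on the higher-level use cases rather than on single accesses like the ones in `main`.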

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14636#discussion_r1269993992