RFR [9] 8148117: Move sun.misc.Cleaner to jdk.internal.ref
Uwe Schindler
uschindler at apache.org
Sat Jan 23 22:07:01 UTC 2016
Hi Andrew,
Andrew Haley [mailto:aph at redhat.com] wrote:
> On 23/01/16 20:01, Uwe Schindler wrote:
>
> > It depends how small! If the speed is still somewhere between Java 8
> > ByteBuffer performance and the recent Hotspot improvements in Java
> > 9, I agree with trying it out. But some volatile memory access on
> > every access is a no-go. The code around ByteBufferIndexInput in
> > Lucene is the most performance-critical code, because on
> > every search query or sort all the work happens in
> > there (millions of iterations with positional ByteBuffer.get*
> > calls). As ByteBuffers are limited to 2 GiB, we also need lots of
> > hairy code to work around that limitation!
>
> Yes, I see that code. It would be helpful if there were a
> self-contained but realistic benchmark using that code. That way,
> some simple experiments would allow changes to be measured.
Unfortunately I am a bit occupied preparing for FOSDEM, and we also switched Lucene/Solr to Git today, so I cannot write a benchmark today. I know that Robert Muir (CC'ed) did a lot of performance testing with ByteBufferIndexInput; maybe we can work together on an isolated benchmark that you can use for testing. But this may take a while.
> > If you look at ByteBufferIndexInput's code you will see that we
> > simply do stuff like trying to read from one ByteBuffer, and only
> > if we catch a BufferUnderflowException do we fall back to handling
> > buffer switches: instead of checking bounds on every access, we
> > have fallback code that only runs on exceptions. E.g. if you are
> > 3 bytes before the end of one buffer slice and read a long, it
> > will throw BufferUnderflowException. When this happens, the code
> > falls back to reading byte by byte from 2 different buffers and
> > reassembles the long.
>
> I'm surprised you don't see painful deoptimization traps when that
> happens. I suppose it's rare enough that you don't care.
Exactly! Of course the accesses to ByteBufferIndexInput are not completely random. Accesses are always concentrated on places close to each other (otherwise this would produce lots of swapping and I/O). As the size of each ByteBuffer (the chunkSizePower in this code) is 1 GiB (2^30) by default, the probability of catching that exception and causing deoptimization is very low. By the way, the limit is not 2 GiB, because signed integers and the maximum array size in Java don't allow reaching 2^31 (only 2^31 - 1, damn!). So we chunk the file into 1 GiB ByteBuffers.
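To make that concrete, here is a minimal self-contained sketch of the pattern, my simplification and not the actual ByteBufferIndexInput code (the class name ChunkedReader and its helpers are invented for illustration):

import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Sketch of the pattern described above: the file is split into
// 1 GiB chunks; the common case is a plain relative getLong(), and
// the cross-chunk case is handled only when the exception fires.
final class ChunkedReader {
    private final ByteBuffer[] buffers; // one slice per 1 GiB chunk
    private int current;                // index of the active chunk

    ChunkedReader(ByteBuffer[] buffers) {
        this.buffers = buffers;
    }

    long readLong() {
        try {
            // Fast path: the whole long lies inside the current chunk.
            // A failed relative getLong() leaves the position unchanged.
            return buffers[current].getLong();
        } catch (BufferUnderflowException e) {
            // Rare slow path: the value straddles a chunk boundary.
            // Reassemble it byte by byte (big-endian, the ByteBuffer
            // default) from the two adjacent buffers.
            long result = 0;
            for (int i = 0; i < 8; i++) {
                result = (result << 8) | (readByte() & 0xFFL);
            }
            return result;
        }
    }

    byte readByte() {
        if (!buffers[current].hasRemaining()) {
            current++; // advance to the next chunk (EOF handling omitted)
        }
        return buffers[current].get();
    }
}

Since the exception only fires for the few reads that actually cross a 1 GiB boundary, the fast path stays free of per-access bounds logic, which is exactly the trade-off discussed above.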
> There's a
> new group of methods in JDK9 called Objects.checkIndex() which are
> intended to provide a very efficient way to do bounds checks. It might
> be interesting to see if they work well with ByteBufferIndexInput:
> that's an important use case.
We can check this. The main idea in this code is to not do the same bounds checks both inside ByteBuffer and outside of it. After switching to the try/catch approach we improved the whole situation. We can try it with Java 9, but Objects.checkIndex() can be used at the earliest once Lucene requires Java 9 (or via multi-release JAR files with an alternate ByteBufferIndexInput implementation for Java 9+).
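For reference, a tiny sketch of what such a check could look like. Objects.checkIndex(int, int) does exist in java.util.Objects since JDK 9; the surrounding helper methods are invented here for illustration:

import java.util.Objects;

class BoundsCheckSketch {
    // Hand-written check, as one would write it before JDK 9:
    static byte getManual(byte[] data, int index) {
        if (index < 0 || index >= data.length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[index];
    }

    // JDK 9 variant: Objects.checkIndex returns the index if
    // 0 <= index < length and throws IndexOutOfBoundsException
    // otherwise; HotSpot can treat it as an intrinsic, which is
    // what should make the range check cheap.
    static byte getChecked(byte[] data, int index) {
        return data[Objects.checkIndex(index, data.length)];
    }
}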
> BTW, does anyone here know why we don't have humongous ByteBuffers
> with a long index?
I think 64-bit-indexed arrays were proposed for Java 10, as are ByteBuffers with long indexes (or VarHandles using long indexes). This is an old and long-running discussion.
> Andrew.
Thanks,
Uwe