Memory Segment efficient array handling

Uwe Schindler uschindler at apache.org
Tue Jun 15 09:25:36 UTC 2021


Hi Maurizio,

I spent a lot of time analyzing the problem. It is indeed related to the wrapping of heap arrays, slicing, and so on. I opened a bug report:
https://bugs.openjdk.java.net/browse/JDK-8268743

So please think about adding an API that is highly optimized for bulk-copying slices between classical on-heap arrays and MemorySegments! It looks like escape analysis does not kick in: during our tests the heap was filled with millions of HeapMemorySegment#OfByte slices! Performance degraded significantly, mostly due to garbage collection pressure.
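To illustrate the pattern (a sketch with our own method and variable names, not the exact Lucene code; "mapped" stands for the mapped segment of the index file): every bulk read into a heap array currently needs a wrap plus two slices, all of which end up as heap allocations when escape analysis fails:

    // uses jdk.incubator.foreign.MemorySegment (JDK 17 incubator API)
    static void readBytes(MemorySegment mapped, long pos, byte[] dst, int off, int len) {
        // wrapping allocates a HeapMemorySegment#OfByte, slicing allocates another segment
        MemorySegment target = MemorySegment.ofArray(dst).asSlice(off, len);
        // a third allocation for the source slice, then the actual bulk copy
        target.copyFrom(mapped.asSlice(pos, len));
    }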

Long indexes in for-loops do not seem to be the issue here. We proved that replacing the wrap-byte-array, slice, copyFrom code with Unsafe.copyMemory solves the issue and makes Lucene's new memory-mapping implementation behave similarly to the old MappedByteBuffer code. (Mapped)ByteBuffer has the bulk get(byte[], offset, length), which is missing for memory segments, and that's the reason for our pain!
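For reference, the replacement boils down to a single call (a hedged sketch; "unsafe" is assumed to be acquired via the usual reflective hack, and "curAddress" is our name for the absolute native address of the read position in the mapping):

    // copy len bytes from the mapping straight into the heap array;
    // a null base object means the source offset is an absolute address
    unsafe.copyMemory(null, curAddress,
                      dst, sun.misc.Unsafe.ARRAY_BYTE_BASE_OFFSET + off,
                      len);

This is the moral equivalent of the bulk ByteBuffer#get(byte[], int, int), just without any intermediate segment objects.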

You can see the discussion on our latest pull request for JDK 17: https://github.com/apache/lucene/pull/177

Uwe

-----
Uwe Schindler
uschindler at apache.org 
ASF Member, Member of PMC and Committer of Apache Lucene and Apache Solr
Bremen, Germany
https://lucene.apache.org/
https://solr.apache.org/

> -----Original Message-----
> From: Maurizio Cimadamore <maurizio.cimadamore at oracle.com>
> Sent: Thursday, April 1, 2021 2:36 PM
> To: Uwe Schindler <uschindler at apache.org>; 'leerho' <leerho at gmail.com>;
> panama-dev at openjdk.java.net
> Subject: Re: Memory Segment efficient array handling
> 
> I re-read the Lucene/Solr patch to support segments, and one thing
> jumped out: in routines like readLEFloats/Longs, it seems like we do a
> bulk copy if endianness matches, but we do a loop copy if endianness
> doesn't match.
> 
> Reading from the ByteBufferInput impl, it doesn't seem to me that the
> impl is ever falling back onto a regular loop.
> 
> https://github.com/apache/lucene-solr/blob/d2c0be5a83f8a985710e1ffbadabc70e82c54fb1/lucene/core/src/java/org/apache/lucene/store/ByteBufferIndexInput.java#L168
> 
> E.g. it seems you adjust the endianness on the buffer and then use a
> bulk copy.
> 
> In other words, there might be a performance advantage in having the
> bulk copy methods in MemoryAccess: that way they can take an endianness
> parameter and copy in bulk with swap (memory segments, internally, have
> the ability to copy in bulk with swap, like Unsafe.copySwapMemory).
> 
> That said, I don't think this is the root cause of the perf issues you
> are seeing, since readLongs is always doing a loop (even in the buffer
> world), and readLELongs should do bulk copy most of the time (I assume
> you ran the bench on a LE platform).
> 
> Maurizio
> 
> 
> On 01/04/2021 13:05, Maurizio Cimadamore wrote:
> >
> > On 01/04/2021 12:48, Uwe Schindler wrote:
> >> In our investigations, we also see some slowdown compared to our
> >> ByteBuffer implementation. It is not yet clear if it comes from loops
> >> over long indexes instead of int, or if it is caused by the number of
> >> object allocations.
> >
> > It would be helpful if we could narrow this down. I suppose you refer
> > to the benchmark regressions here:
> >
> > https://github.com/apache/lucene-solr/pull/2176#issuecomment-758175143
> >
> > Which are probably not related to the issue of bulk copying.
> >
> > See my other email: having better MemoryAccess routines for bulk
> > copying is mostly a usability thing. There's nothing to suggest that
> > a straight unsafe call is faster than slicing and calling copyFrom, so
> > I wouldn't look there to explain performance differences.
> >
> > Maurizio
> >


