Unsafe vs MemorySegments / Bounds checking...
Johannes Lichtenberger
lichtenberger.johannes at gmail.com
Thu Oct 31 19:04:41 UTC 2024
Yes, I think it's already some work to switch every node from the
current representation to a MemorySegment version...
Then I need an allocation strategy: the pages wrapping one or two
MemorySegments should be cached for reuse, so that I can cut down the
current tremendous allocation rate caused by new page allocations every
time something is read from disk. Instead, I'll take a page from a list,
or better a map/cache of free pages for the different size classes, and
simply reuse it. Maybe I can also allocate them up front for the whole
duration with a global Arena, with the user specifying a max size for each
page class in the buffer pool (so, essentially N buffer pools, one for
each size class; that might be the easiest to implement, at least)...
Afterwards I'll check that I can scan a resource without having to create
any nodes, and then look into SIMD-based filtering. For now, though, I'll
first try the off-heap storage to reduce GC pressure to a minimum.
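A minimal sketch of the per-size-class pooling idea, assuming JDK 22+ and purely illustrative names (SegmentPool, acquire, release are hypothetical, not SirixDB's actual API): one free list per page size class, all segments backed by a single global Arena so nothing is deallocated until shutdown.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.ArrayDeque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PoolDemo {

    // Illustrative sketch: one free list of segments per page size class, all
    // backed by a single global shared Arena, so nothing is freed until close().
    static final class SegmentPool {
        private final Arena arena = Arena.ofShared();
        private final Map<Long, ArrayDeque<MemorySegment>> freeBySize =
            new ConcurrentHashMap<>();

        // Take a cached segment of the given size class, or allocate a new one.
        synchronized MemorySegment acquire(long byteSize) {
            ArrayDeque<MemorySegment> free =
                freeBySize.computeIfAbsent(byteSize, s -> new ArrayDeque<>());
            MemorySegment seg = free.poll();
            return seg != null ? seg : arena.allocate(byteSize);
        }

        // Return a segment to its size class for later reuse instead of
        // allocating a fresh one on the next read from disk.
        synchronized void release(MemorySegment seg) {
            freeBySize.computeIfAbsent(seg.byteSize(), s -> new ArrayDeque<>())
                      .push(seg);
        }

        void close() {
            arena.close();
        }
    }

    public static void main(String[] args) {
        SegmentPool pool = new SegmentPool();
        MemorySegment a = pool.acquire(4096);
        pool.release(a);
        MemorySegment b = pool.acquire(4096); // reused, no new allocation
        System.out.println(a.address() == b.address()); // same backing memory
        pool.close();
    }
}
```

Per-size-class free lists keep the lookup trivial at the cost of some internal fragmentation; capping each class (the "max size per page class" above) would bound total native memory.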
Maurizio Cimadamore <maurizio.cimadamore at oracle.com> schrieb am Do., 31.
Okt. 2024, 19:41:
>
> On 31/10/2024 18:08, Johannes Lichtenberger wrote:
> >
> > So, I'm also not sure what it means to instantiate a MemorySegment in
> > this way: MemorySegment.NULL.reinterpret(Long.MAX_VALUE)
> >
> > It's just the "wrapper" on the Java heap, but it doesn't map the whole
> > virtual address space, or does it!?
> >
> Reinterpret does not allocate anything, nor does it map memory.
>
> It is an unsafe way to alter the bounds/arena of a memory segment, so
> that you get a _new_ memory segment instance that points to the same
> memory as the old one - but with different size and scope.
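A minimal sketch of this behavior (assuming JDK 22+; `reinterpret` is a restricted method, so the launcher may print a native-access warning): the reinterpreted segment is a new view over the same address, with no allocation or mapping involved.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

public class ReinterpretDemo {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(16);
            // No allocation, no mapping: just a new segment instance with
            // different bounds over the same memory. Unsafe: the caller
            // asserts that 32 bytes are actually valid at this address.
            MemorySegment wider = seg.reinterpret(32);
            System.out.println(seg.address() == wider.address()); // same memory
            System.out.println(wider.byteSize());                 // new bound
        }
    }
}
```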
>
> In your case, as I explained in my other email, I don't think you should
> concern yourself too much with bounds checks -- your previous solution was
> not based on Unsafe, but on arrays, and array accesses are always
> bounds-checked.
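To make the comparison concrete (a minimal sketch, JDK 22+): both plain array accesses and MemorySegment accesses reject out-of-bounds offsets, so moving from arrays to segments does not add a new class of check.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class BoundsDemo {
    public static void main(String[] args) {
        byte[] arr = new byte[8];
        try {
            byte ignored = arr[8]; // out of bounds
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("array access is bounds-checked");
        }
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(8);
            try {
                seg.get(ValueLayout.JAVA_BYTE, 8); // out of bounds
            } catch (IndexOutOfBoundsException e) {
                System.out.println("segment access is bounds-checked too");
            }
        }
    }
}
```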
>
> And, more generally, it is always better -- where possible -- to try to
> get to performance improvements via algorithmic changes (such as making
> your library more SIMD-friendly) rather than chasing some (likely minor,
> in comparison) performance delta by completely giving up safety.
>
> I believe Lucene is a good example of what I mean by this (Uwe correct
> me if wrong):
>
> * Lucene first improved its safety by using memory segments instead of
> the existing byte buffer + Unsafe.invokeCleaner combo
> * Also, efficiency increased, as memory segments allow for much bigger
> memory-mapped files
> * But, once you have a segment, why not use the Vector API [1]?
> * And, since segments and the Linker play together, why not also
> improve the way memory-mapped segments are created, by leveraging all
> the goodies provided by the POSIX C API [2]?
>
> I don't think that, in that case, "just disabling checks" was ever
> considered a viable option to make things faster. And the cost of bounds
> checks, compared to the massive gains provided by vectorization, is...
> just not worth it (at least for the "normal cases").
>
> Maurizio
>
> [1] -
> https://www.elastic.co/blog/accelerating-vector-search-simd-instructions
> [2] -
> https://www.elastic.co/search-labs/blog/lucene-and-java-moving-forward-together
>
>