Accessing foreign memory that already exists

Antoine Chambille ach at activeviam.com
Tue Mar 31 20:58:55 UTC 2020


Hi Maurizio,

Thank you for the explanations and for your interest.

In short, for the use case of an in-memory analytical database you need a
specialized memory allocator to manage tables and indexes, and direct
read/write access to the data, in pure Java, from many concurrent threads.
It's ok if the memory segments are just façades to the memory and don't
actually manage it.

To answer your questions directly:
* you'd like this segment to have a known size
  -> that would be handy, the segment could be used directly without the
need of a parent structure to hold the size.
* you'd like this segment to be closeable, and, upon close() some
well-known native function in your allocator should be invoked
  -> indeed, that would be the right place to have a "cleaner". It is not
mandatory though; it can be done externally.
* you'd probably like this segment not to be confined
  -> absolutely! we need massively parallel data access for data loading
(mount large datasets on demand in the cloud for short lived sessions) and
for aggregations (interactive query times even on terabytes).
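
To make the three answers above concrete, here is a minimal sketch of the
kind of façade meant here. It is not the panama API: the class name
FacadeSegment is hypothetical, and a direct ByteBuffer stands in for memory
that an external allocator already owns. It only illustrates a known size,
an optional cleanup action on close(), and no thread confinement.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical facade over memory that is managed elsewhere. */
final class FacadeSegment implements AutoCloseable {
    // Stand-in for externally allocated memory (assumption for this sketch).
    private final ByteBuffer memory;
    // Custom close action, e.g. handing the block back to the allocator.
    private final Runnable cleanup;
    private final AtomicBoolean closed = new AtomicBoolean();

    FacadeSegment(ByteBuffer memory, Runnable cleanup) {
        // duplicate() shares the bytes but gives us our own position/limit.
        this.memory = memory.duplicate();
        this.memory.order(ByteOrder.nativeOrder());
        this.cleanup = cleanup;
    }

    /** Known size: no parent structure is needed to hold it. */
    long byteSize() {
        return memory.limit();
    }

    /** Absolute, bounds-checked read; callable from any thread. */
    long getLong(int offset) {
        checkOpen();
        return memory.getLong(offset);
    }

    /** Absolute, bounds-checked write; callable from any thread. */
    void putLong(int offset, long value) {
        checkOpen();
        memory.putLong(offset, value);
    }

    // Note: this check does not protect against a close() racing with an
    // access on another thread; that is the hard part of unconfined segments.
    private void checkOpen() {
        if (closed.get()) {
            throw new IllegalStateException("segment is closed");
        }
    }

    /** Runs the cleanup action at most once. */
    @Override
    public void close() {
        if (closed.compareAndSet(false, true)) {
            cleanup.run();
        }
    }
}
```

The cleanup action is passed in from outside, so the façade itself never
manages the memory; passing a no-op Runnable corresponds to doing the
cleanup externally.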



In a bit more detail:
Modern analytical databases are based on column stores, including the one
we develop at ActiveViam that is called ActivePivot. The data is stored in
binary columns, with a few indexing structures derived from hash tables and
bitmap indexes. Those data structures are essentially made of big,
long-lived arrays. To support very large datasets we allocate them
off-heap, and we use the Java heap for aggregations and calculations.

Currently in ActivePivot the off-heap memory is managed by a SLAB
allocator, based on mmap, that supports highly concurrent allocations and
deallocations. It's also NUMA aware, so that during aggregations Java
threads process the data partitions on the same NUMA node. Java threads
read and write the data using sun.misc.Unsafe. The data access performance
is good and predictable, since there are no bounds checks. But optimizations
such as loop unrolling and vectorization that work on Java arrays are lost
with Unsafe. And in many cases (column scans, joins, aggregations) we could
use the Panama Vector API that we also anticipate eagerly, and that would
not work with Unsafe. For those reasons, we would like to return to the
ranks and rebase our data access code on memory segments.
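
For readers unfamiliar with the idea, a toy sketch of a slab-style allocator
follows. It is not the ActivePivot allocator: the class ToySlabAllocator is
hypothetical, a direct ByteBuffer stands in for an anonymous mmap'd region,
and NUMA placement is left out entirely. It only shows a region carved into
fixed-size slabs that threads can take and return without a global lock.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Toy allocator: one region split into fixed-size slabs. */
final class ToySlabAllocator {
    // Stand-in for an anonymous mmap'd region (assumption for this sketch).
    private final ByteBuffer region;
    private final int slabSize;
    // Lock-free queue of the offsets of all currently free slabs.
    private final ConcurrentLinkedQueue<Integer> freeOffsets =
            new ConcurrentLinkedQueue<>();

    ToySlabAllocator(int slabCount, int slabSize) {
        this.region = ByteBuffer.allocateDirect(slabCount * slabSize);
        this.slabSize = slabSize;
        for (int i = 0; i < slabCount; i++) {
            freeOffsets.add(i * slabSize);
        }
    }

    /** Returns the offset of a free slab, or -1 if none is left. */
    int allocate() {
        Integer offset = freeOffsets.poll();
        return offset == null ? -1 : offset;
    }

    /** Returns a slab to the pool. Double frees are not detected here. */
    void free(int offset) {
        if (offset < 0 || offset >= region.capacity() || offset % slabSize != 0) {
            throw new IllegalArgumentException("not a slab offset: " + offset);
        }
        freeOffsets.add(offset);
    }

    /** A bounds-checked view over exactly one slab. */
    ByteBuffer slab(int offset) {
        ByteBuffer view = region.duplicate();
        view.position(offset);
        view.limit(offset + slabSize);
        return view.slice();
    }
}
```

The view returned by slab() plays the role a memory segment would play in
our case: it gives bounded access to a block that the allocator, not the
view, owns.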

Thanks,
-Antoine





On Tue, Mar 31, 2020 at 12:36 PM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

> Hi Antoine,
> this is an interesting use case, and one I've been thinking about quite a
> bit recently, as it comes up with native interop (see below).
>
> In general there are two categories of memory addresses: checked ones
> (the ones with a known segment attached to them) and unchecked ones (the
> ones with no segment attached to them, or the ones that have the special
> Nothing segment attached to them).
>
> Our policy is that addresses that are not backed by a segment _cannot_
> be de-referenced. This is how we've been achieving safety for the basic
> foreign memory access use case that doesn't do native interop. (we're
> discussing as to whether that's the right default, based on some library
> porting activity we've been doing recently - but there doesn't seem to be
> clear evidence pointing one way or another).
>
> But there are cases where you might want to take an existing address,
> which is backed by no existing segment, and attach a segment to it -
> which will make it fully functional again - this operation is called
> 'rebasing an address':
>
>
> https://github.com/openjdk/panama-foreign/blob/foreign-abi/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemoryAddress.java#L83
>
> So, with all this in mind, the goal to do what you want is to be able to
> (unsafely!) create a memory segment which has roughly the
> characteristics you need - e.g. given base address and given size. The
> native interop branch has a useful method for making these unchecked
> segments:
>
>
> https://github.com/openjdk/panama-foreign/blob/foreign-abi/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/Foreign.java#L100
>
> In other words, let's say you have a long address "addr" and that you
> want to create a segment around it:
>
> 1) create a memory address out of "addr"
>
> var base = MemoryAddress.ofLong(addr)
>
> 2) create an unchecked segment with right base address and size
>
> var segment = Foreign.ofNativeUnchecked(base, size)
>
> And voila, you now have a segment for your non-Java generated address.
>
> A few notes:
>
> * since the address has been generated by you, when you close this
> segment, the memory access API won't attempt to do anything fancy here
> (but it will make all the addresses based on that segment invalid);
> one option we have discussed here is to add ways to attach custom 'cleanup'
> functions - I'm a bit skeptical of those, but I can be convinced given
> the right use cases
>
> * the segment will be confined on the calling thread - meaning that it
> can only be accessed and closed by that thread (as a regular segment)
>
> I think here we can do things to allow more flexibility - in principle
> there's some kind of 'unsafe native segment builder' lurking in here
> which lets you specify:
>
> * whether to confine to a thread or not
> * what the size of the segment is
> * what is the base address of the segment
> * whether the resulting segment is closeable (and, if so, if a custom
> close() action should be provided)
>
> My sense is that clients typically will _not_ need all this flexibility.
> For instance, in the native interop case there are only two cases which
> seem overwhelmingly common:
>
> * I have an unchecked address and I want to give it a size - but I don't
> want closeability, or confinement - just let me dereference it within
> some known bounds
> * I have an unchecked address which I know comes from some 'malloc'
> call, and I want to attach a full-blown segment to it, and I want the
> segment::close operation to call free()
>
> I guess time will tell whether we need N ad-hoc unsafe factories, or a
> more flexible builder-based solution.
>
> At this point I'd be very interested in what your requirements would be
> for the segment you create with this unsafe API. My educated guess would
> be that:
>
> * you'd like this segment to have a known size
> * you'd like this segment to be closeable, and, upon close() some
> well-known native function in your allocator should be invoked
> * you'd probably like this segment not to be confined
>
> Is my guess correct?
>
> Cheers
> Maurizio
>
> On 31/03/2020 09:06, Antoine Chambille wrote:
> > Hi everyone,
> >
> > At ActiveViam we are watching the foreign memory project with eager
> > anticipation. Thank you for the hard work, looking forward to it!
> >
> > One question related to our usage of off-heap memory:
> >
> > If some native memory already exists, what is the preferred way to expose
> > it as a memory segment?
> >
> >
> > Some details about our use case: we make an in-memory database that
> > delivers interactive queries to many users on terabyte datasets. The
> > database structures are allocated off-heap, but not with malloc which is a
> > bottleneck. We developed a highly concurrent, NUMA-Aware SLAB allocator.
> > This custom memory manager is written almost entirely in Java with just a
> > few system calls (anonymous mmap, munmap, madvise).
> >
> > Cheers,
> > -Antoine
>

