Accessing foreign memory that already exists

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Tue Mar 31 23:42:27 UTC 2020


On 31/03/2020 23:57, Antoine Chambille wrote:
> Yes exactly. We can handle the in-memory database use case well with a 
> public API that creates "unsafe", unconfined memory segments. That 
> jdk.incubator.foreign.Foreign interface would work very well, I hope 
> it makes it to the final version!

I think it should make it to the final version, if not in this form, then in 
some similar form, with either more or fewer methods. The idea is to 
separate the 'safe' parts of the API from the 'unsafe' ones which are 
required for non-standard use cases.

Also note that, while all methods in Foreign are part of the foreign 
API, code that wants to call them needs to 'opt in' for restricted 
foreign access. At this point in time, this is done by setting a runtime 
property (-Djdk.incubator.foreign.Foreign=permit) - in the future this 
could become a more integrated option which gives special privileges to 
selected modules in the module path.
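
As a rough illustration of the opt-in, an application could fail fast when the 
property is missing. This is only a sketch: RestrictedForeignAccess and its 
methods are hypothetical names, not part of the foreign API. Only the property 
name and the "permit" value are taken from the paragraph above. The real check 
is performed inside the JDK, and this helper merely mirrors it on the 
application side.

```java
/**
 * Hypothetical application-side helper (not part of jdk.incubator.foreign).
 * Only the property name and the "permit" value come from the message above.
 */
final class RestrictedForeignAccess {
    static final String PROPERTY = "jdk.incubator.foreign.Foreign";

    private RestrictedForeignAccess() {}

    /** Returns true if the system property is currently set to "permit". */
    static boolean isPermitted() {
        return "permit".equals(System.getProperty(PROPERTY));
    }

    /** Throws early, with a hint, if restricted access was not opted into. */
    static void requirePermitted() {
        if (!isPermitted()) {
            throw new IllegalStateException(
                "Restricted foreign access not enabled; run with -D"
                    + PROPERTY + "=permit");
        }
    }
}
```

Calling RestrictedForeignAccess.requirePermitted() at startup, before any 
method in Foreign is used, would surface a missing flag with a clear message.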

Maurizio

>
> Thanks
> -Antoine
>
>
>
> On Tue, Mar 31, 2020 at 11:28 PM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com 
> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>
>     Thanks for the quick feedback.
>
>     Memory segments are, internally, quite configurable, and we are
>     trying to decide which bits of configuration to expose.
>
>     We will surely expose an unsafe way to make an unconfined segment -
>     that's actually already in:
>
>     https://github.com/openjdk/panama-foreign/blob/cef1b7f746df24dea1c57470ed51403647ebb893/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/Foreign.java#L114
>
>     We also plan to have some primitive to take an address that has no
>     segment and give it a bounded segment (I have a patch for that in
>     the works).
>
>     If possible, I'd like to stop there - from what you say, it seems
>     like the two ingredients above should be enough for your needs - as
>     you only really need the memory access API to replace Unsafe
>     access, not to manage the lifecycle of the memory you allocate (I
>     suppose you have your own abstractions for that).
>
>     Is my understanding correct?
>
>     P.S. note that even confined segments implement the Spliterator
>     interface as per recent changes, so they are amenable to parallel
>     processing (e.g. ForkJoin task).
>
>     Maurizio
>
>     On 31/03/2020 21:58, Antoine Chambille wrote:
>
>>     Hi Maurizio,
>>
>>     Thank you for the explanations and for your interest.
>>
>>     In short, for the use case of an in-memory analytical database
>>     you need a specialized memory allocator to manage tables and
>>     indexes, and direct read/write access to the data, in pure Java,
>>     from many concurrent threads. It's ok if the memory segments are
>>     just façades to the memory and don't actually manage it.
>>
>>     To answer your questions directly:
>>     * you'd like this segment to have a known size
>>       -> that would be handy, the segment could be used directly
>>     without the need of a parent structure to hold the size.
>>     * you'd like this segment to be closeable, and, upon close() some
>>     well-known native function in your allocator should be invoked
>>       -> indeed that would be the right place to have a "cleaner".
>>     not mandatory though, it can be done externally.
>>     * you'd probably like this segment not to be confined
>>       -> absolutely! we need massively parallel data access for data
>>     loading (mount large datasets on demand in the cloud for short
>>     lived sessions) and for aggregations (interactive query times
>>     even on terabytes).
>>
>>
>>
>>     In a bit more detail:
>>     Modern analytical databases are based on column stores, including
>>     the one we develop at ActiveViam that is called ActivePivot. The
>>     data is stored in binary columns, with a few indexing structures
>>     derived from hash tables and bitmap indexes. Those data
>>     structures are essentially made of big, long-lived arrays. To
>>     support very large datasets we allocate them off-heap, and we use
>>     the Java heap for aggregations and calculations.
>>
>>     Currently in ActivePivot the off-heap memory is managed by a SLAB
>>     allocator, based on mmap, that supports highly concurrent
>>     allocations and deallocations. It's also NUMA aware, so that
>>     during aggregations Java threads process the data partitions on
>>     the same NUMA node. Java threads read and write the data using
>>     sun.misc.Unsafe. The data access performance is good and
>>     predictable, there are no boundary checks. But optimizations such
>>     as loop unrolling and vectorization that work on java arrays are
>>     lost with Unsafe. And in many cases (column scans, joins,
>>     aggregations) we could use the panama Vector API that we also
>>     anticipate eagerly, and that would not work with Unsafe. For
>>     those reasons, we would like to return to the ranks and rebase
>>     our data access code on memory segments.
>>
>>     Thanks,
>>     -Antoine
>>
>>
>>
>>
>>
>>     On Tue, Mar 31, 2020 at 12:36 PM Maurizio Cimadamore
>>     <maurizio.cimadamore at oracle.com
>>     <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>
>>         Hi Antoine,
>>         this is an interesting use case, and one I've been thinking
>>         about quite a bit
>>         recently, as it comes up with native interop (see below).
>>
>>         In general there are two categories of memory addresses:
>>         checked ones
>>         (the ones with a known segment attached to them) and
>>         unchecked ones (the
>>         ones with no segment attached to them, or the ones that have
>>         the special
>>         Nothing segment attached to them).
>>
>>         Our policy is that addresses that are not backed by a segment
>>         _cannot_
>>         be de-referenced. This is how we've been achieving safety for
>>         the basic
>>         foreign memory access use case that doesn't do native
>>         interop. (we're
>>         discussing as to whether that's the right default, based on
>>         some library
>>         porting activity we've been doing recently - but there
>>         doesn't seem to be
>>         clear evidence pointing one way or another).
>>
>>         But there are cases where you might want to take an existing
>>         address,
>>         which is backed by no existing segment, and attach a segment
>>         to it -
>>         which will make it fully functional again - this operation is
>>         called
>>         'rebasing an address':
>>
>>         https://github.com/openjdk/panama-foreign/blob/foreign-abi/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemoryAddress.java#L83
>>
>>         So, with all this in mind, the goal to do what you want is to
>>         be able to
>>         (unsafely!) create a memory segment which has roughly the
>>         characteristics you need - e.g. given base address and given
>>         size. The
>>         native interop branch has a useful method for making these
>>         unchecked
>>         segments:
>>
>>         https://github.com/openjdk/panama-foreign/blob/foreign-abi/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/Foreign.java#L100
>>
>>         In other words, let's say you have a long address "addr" and
>>         that you
>>         want to create a segment around it:
>>
>>         1) create a memory address out of "addr"
>>
>>         var base = MemoryAddress.ofLong(addr);
>>
>>         2) create an unchecked segment with right base address and size
>>
>>         var segment = Foreign.ofNativeUnchecked(base, size);
>>
>>         And voila, you now have a segment for your non-Java generated
>>         address.
>>
>>         A few notes:
>>
>>         * since the address has been generated by you, when you close
>>         this
>>         segment, the memory access API won't attempt to do anything
>>         fancy here
>>         (but it will make all the addresses based on that segment
>>         invalid);
>>         one option we have discussed here is to add ways to attach
>>         custom 'cleanup'
>>         functions - I'm a bit skeptical of those, but I can be
>>         convinced given
>>         the right use cases
>>
>>         * the segment will be confined on the calling thread -
>>         meaning that it
>>         can only be accessed and closed by that thread (as a regular
>>         segment)
>>
>>         I think here we can do things to allow more flexibility - in
>>         principle
>>         there's some kind of 'unsafe native segment builder' lurking
>>         in here
>>         which lets you specify:
>>
>>         * whether to confine to a thread or not
>>         * what the size of the segment is
>>         * what is the base address of the segment
>>         * whether the resulting segment is closeable (and, if so, if
>>         a custom
>>         close() action should be provided)
>>
>>         My sense is that clients typically will _not_ need all this
>>         flexibility.
>>         For instance, in the native interop case there are only two
>>         cases which
>>         seem overwhelmingly common:
>>
>>         * I have an unchecked address and I want to give it a size -
>>         but I don't
>>         want closeability, or confinement - just let me dereference
>>         it within
>>         some known bounds
>>         * I have an unchecked address which I know comes from some
>>         'malloc'
>>         call, and I want to attach a full-blown segment to it, and I
>>         want the
>>         segment::close operation to call free()
>>
>>         I guess time will tell whether we need N ad-hoc unsafe
>>         factories, or a
>>         more flexible builder-based solution.
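
To make the first of the two common cases above concrete (an unchecked address 
that is given a size so it can be dereferenced within known bounds), here is a 
conceptual model in plain Java. It deliberately does not use the 
jdk.incubator.foreign API. BoundedView is a hypothetical class, and a 
ByteBuffer stands in for raw native memory, so that the sketch runs on any 
JDK. It has no close() and no thread confinement, matching that case.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical model of "an address plus a size": a bounds-checked window
 * over memory it does not own. Not the jdk.incubator.foreign API.
 */
final class BoundedView {
    private final ByteBuffer memory; // stand-in for raw native memory
    private final long base;         // start of the window within 'memory'
    private final long size;         // window length in bytes

    BoundedView(ByteBuffer memory, long base, long size) {
        if (base < 0 || size < 0 || base > memory.capacity()
                || size > memory.capacity() - base) {
            throw new IllegalArgumentException("window exceeds backing memory");
        }
        this.memory = memory;
        this.base = base;
        this.size = size;
    }

    /** Reads 8 bytes at 'offset' relative to the base, after a bounds check. */
    long getLong(long offset) {
        checkBounds(offset, Long.BYTES);
        return memory.getLong(Math.toIntExact(base + offset));
    }

    /** Writes 8 bytes at 'offset' relative to the base, after a bounds check. */
    void putLong(long offset, long value) {
        checkBounds(offset, Long.BYTES);
        memory.putLong(Math.toIntExact(base + offset), value);
    }

    // Rejects any access that is not fully inside [0, size).
    private void checkBounds(long offset, long length) {
        if (offset < 0 || offset > size - length) {
            throw new IndexOutOfBoundsException(
                "offset " + offset + " with length " + length
                    + " is outside a window of size " + size);
        }
    }
}
```

The view never frees or otherwise manages the memory it covers. It only 
refuses accesses outside the window, which is the property the "give it a 
size" case asks for.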
>>
>>         At this point I'd be very interested in what your
>>         requirements would be
>>         for the segment you create with this unsafe API. My educated
>>         guess would
>>         be that:
>>
>>         * you'd like this segment to have a known size
>>         * you'd like this segment to be closeable, and, upon close()
>>         some
>>         well-known native function in your allocator should be invoked
>>         * you'd probably like this segment not to be confined
>>
>>         Is my guess correct?
>>
>>         Cheers
>>         Maurizio
>>
>>         On 31/03/2020 09:06, Antoine Chambille wrote:
>>         > Hi everyone,
>>         >
>>         > At ActiveViam we are watching the foreign memory project
>>         with eager
>>         > anticipation. Thank you for the hard work, looking forward
>>         to it!
>>         >
>>         > One question related to our usage of off-heap memory:
>>         >
>>         > If some native memory already exists, what is the preferred
>>         way to expose
>>         > it as a memory segment?
>>         >
>>         >
>>         > Some details about our use case: we make an in-memory
>>         database that
>>         > delivers interactive queries to many users on terabyte
>>         datasets. The
>>         > database structures are allocated off-heap, but not with
>>         malloc which is a
>>         > bottleneck. We developed a highly concurrent, NUMA-Aware
>>         SLAB allocator.
>>         > This custom memory manager is written almost entirely in
>>         Java with just a
>>         > few system calls (anonymous mmap, munmap, madvise).
>>         >
>>         > Cheers,
>>         > -Antoine
>>
>>
>
>


More information about the panama-dev mailing list