[foreign-memaccess] musing on the memory access API

leerho leerho at gmail.com
Wed Jan 6 20:34:27 UTC 2021


Maurizio,

Re: AllocationHandles, MemorySegments, MemoryAddress ideas

I want to share with you our Memory Package (Writeup
<https://datasketches.apache.org/docs/Memory/MemoryPackage.html>, GitHub
<https://github.com/apache/datasketches-memory>), started in May 2017
(JDK8), in which we developed a capability very similar to what you are
advocating.  This Memory project is closely analogous to your MemorySegment,
and our Handles implement something much like your AllocationHandles
idea, with some additional capabilities.

This Memory project was developed to provide high-performance off-heap
capabilities for our Apache DataSketches
<https://datasketches.apache.org> project.

Because we were limited to JDK8, we had to jump through a number of hoops to
accomplish what we did, using Unsafe and gaining access to other hidden
classes. Hopefully, what Panama is doing will eliminate the need for this.

Rather than repeating everything here, it is best if you could read the
writeup and let me know what you think. I don't think the code is of much
use to you, but perhaps some of the ideas might be.

Cheers,

Lee.

On Tue, Jan 5, 2021 at 3:59 AM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

>
> On 04/01/2021 23:53, Uwe Schindler wrote:
> > Hi Maurizio,
> >
> >> Thanks for the feedback Uwe, and for the bug reports. We'll do our best
> >> to address some of them quickly (the NPE and the error in
> >> Unmapper::address). As for adding an overload for mapping a segment from
> >> a FileChannel I'm totally open to it, but I think it's late-ish now to
> >> add API changes, since we are in stabilization.
> > Hi, this was only a suggestion to improve the whole thing. My idea is
> > more to wait on this until a closer integration into the FileSystem
> > API is done. The main issue we had was that we can only pass a path from
> > the default file system provider (I have a workaround for that, so during
> > our testsuite we "unwrap" all the layers on top). But really, the
> > FileSystem implementation should provide the way to obtain a MemorySegment
> > from the FileChannel; the current cast to the internal class is ... hacky!
> > I know why it is like that (preview, and it's not part of java.base, so the
> > FileSystem interface in java.base can't return a MemorySegment). But when
> > Panama graduates, the filesystem integration is a must: FileChannel should
> > be extended by one "default" method throwing UOE, only implemented by the
> > default provider: "MemorySegment FileChannel.mapSegment(long offset, long
> > size, MapMode mode)"
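Such a default method might look roughly like the sketch below. The names
follow the quoted proposal, but this is only an illustration: MemorySegment
is stubbed out, since at the time it was not part of java.base, and the
interface here stands in for FileChannel.

```java
// Sketch of the proposed FileChannel extension. All names here are
// hypothetical, modeled on the quoted proposal, not a real JDK API.
public class MapSegmentSketch {

    // Placeholder for the real MemorySegment, which lives outside java.base.
    public static final class MemorySegmentStub {
        final long size;
        MemorySegmentStub(long size) { this.size = size; }
    }

    public enum MapMode { READ_ONLY, READ_WRITE }

    // Stand-in for FileChannel: the default implementation throws UOE,
    // and only the default file system provider's channel would override it.
    public interface MappableChannel {
        default MemorySegmentStub mapSegment(long offset, long size, MapMode mode) {
            throw new UnsupportedOperationException("custom file system");
        }
    }

    public static void main(String[] args) {
        // A channel from a non-default provider: no override, so UOE.
        MappableChannel custom = new MappableChannel() {};
        try {
            custom.mapSegment(0L, 4096L, MapMode.READ_WRITE);
        } catch (UnsupportedOperationException e) {
            System.out.println("UOE: " + e.getMessage()); // prints: UOE: custom file system
        }
    }
}
```

The default-method-throwing-UOE pattern keeps the interface change source- and
binary-compatible for existing FileChannel implementations.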
> +1 - this has been raised in the past as well, and I agree that the
> issue is more at the FileSystem interface level - we can't really do
> much at the level of the segment API as things stand. I'm less convinced
> that this is a "must" - while it's a nice-to-have, and something we
> should definitely get working in the future, I don't think that blocking
> integration of the Panama APIs because mapped segments do not work with
> custom file systems would be the right choice.
> >
> >> Also, thanks for the thoughts on the API in general - I kind of expected
> >> (given our discussions) that shared segments were 90% of what you needed
> >> - and that you are not much interested in using confinement. I agree
> >> that, when working from that angle, the API looks mostly ok. But not all
> >> clients have same requirements and some would like to take advantage of
> >> confinement more - also, note that if we just drop support for confined
> >> segments (which is something we also thought about) and just offered
> >> shared access, _all_ clients will be stuck with a very slow close()
> >> operation.
> > Hi, yes, I agree. I just said: switching between those modes is
> > unlikely, but still, a confined default for short-lived segments is
> > correct, and shared for long-lived ones (this is also the usage pattern:
> > something that lives very long is very likely also used by many threads,
> > like a database file or some database off-heap cache). Allocated memory
> > used in Netty is of course often short-lived, but in most cases it is not
> > really used concurrently (or you can avoid that).
> >
> > I'd give the user the option at construction time, but not allow it to
> > be changed later.
> >
> >> There are very different ways to use a memory segment; sometimes (as in
> >> your case) a memory segment is long-lived, and you don't care if closing
> >> it takes 1 us. But there are other cases where segments are created (and
> >> disposed) more frequently. To me, the interesting fact that emerged from
> >> the Netty experiment (thanks guys!) was that using handoff AND shared
> >> segments, while nice on paper, is not going to work performance-wise,
> >> because you need to do an expensive close at each hand-off. This might
> >> be rectified, for instance, by making the API more complex and having a
> >> state where a segment has no owner (e.g. so that instead of confined(A)
> >> -> shared -> confined(B) you do confined(A) -> detached -> confined(B)
> >> ), but the risk is that this adds a lot of API complexity ("detached" is
> >> a brand new segment state in which the segment is not accessible, but
> >> where memory is not yet deallocated) for what might be perceived as a
> >> corner case.
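The confined(A) -> detached -> confined(B) hand-off described above can be
sketched as a toy state machine. All names here (DetachableSegment, detach,
attach) are invented for illustration; this is not the real segment
implementation, just the ownership-transfer idea.

```java
// Toy model of the "detached" state: a segment with no owner is not
// accessible by anyone, but its memory is not yet deallocated.
public class DetachedStateSketch {

    public static final class DetachableSegment {
        private volatile Thread owner; // null means detached: nobody may access

        public DetachableSegment(Thread owner) { this.owner = owner; }

        public void checkAccess() {
            if (owner != Thread.currentThread())
                throw new IllegalStateException("segment not owned by this thread");
        }

        // Hand-off step 1: the current owner gives up ownership. Cheap,
        // unlike closing a shared segment, which needs a global handshake.
        public void detach() {
            checkAccess();
            owner = null;
        }

        // Hand-off step 2: a new thread claims the detached segment.
        public void attach(Thread newOwner) {
            if (owner != null)
                throw new IllegalStateException("segment is still attached");
            owner = newOwner;
        }
    }

    public static void main(String[] args) throws Exception {
        DetachableSegment seg = new DetachableSegment(Thread.currentThread());
        seg.detach();                        // confined(A) -> detached
        Thread b = new Thread(() -> {
            seg.attach(Thread.currentThread()); // detached -> confined(B)
            seg.checkAccess();
            System.out.println("owned by B");
        });
        b.start();
        b.join();
    }
}
```

The point of the detour through "detached" is that neither step needs the
expensive close/epoch synchronization that a transition through the shared
state would require.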
> >> So, the big question here is - given that there are definitely
> >> different modes of interacting with this API (short-lived vs. long-lived
> >> segments), what API allows us to capture the use cases we want in the
> >> simplest way possible? While dynamic ownership changes look like a cool
> >> idea on paper, they also add complexity - so I think now is the right
> >> time to ask ourselves if we should scale back on that a bit and have a
> >> more "static" set of flavors to pick from (e.g. { confined, shared } x
> >> { explicit, cleaner }).
> > I think, when "allocating" a segment (by reserving memory, mapping a
> > file, or supplying some external MemoryAddress and length), you should set
> > confined or shared from the beginning, without a possibility to change it.
> > This would indeed simplify many things. I got new benchmarks a minute ago
> > from my Lucene colleagues: the current MemorySegment API seems 40% slower
> > than ByteBuffer for some use cases, but equal in speed or faster for other
> > use cases (I assume it is still the long vs. int index/looping problem; a
> > for loop using a long index is not as well optimized as a for loop with an
> > int index -- correct?). But without diving too deep, it might also come
> > from the fact that memory segments *may* change their state, so HotSpot is
> > not able to do all optimizations.
>
> If you have for loops with long indices, then yes, these are not
> optimized, and unfortunately expected to be slow. To counteract
> that, the impl has many optimizations so that, if a segment size can be
> represented as an int, many of the long operations are eliminated and
> replaced with int operations (e.g. bound checks). But if you work with
> truly big segments (which I suspect is the case for Lucene), most of
> these optimizations would not kick in. Luckily a lot of progress has been
> made on the long vs. int problem, but the work is not finished - I hope
> it will be by the time 17 ships, so that we can remove all the hacks we
> have in the impl. That said, if you have specific benchmarks to throw
> our way, we'd be happy to look at them!
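The int-downcast trick described above can be illustrated with plain arrays.
This is only a sketch of the idea: when the size fits in an int, loop with an
int index so the JIT can apply its int-range loop optimizations (bound-check
elimination, unrolling); the names and the byte[]-based addressing are
placeholders, not the real segment implementation.

```java
// Sketch: int-indexed fast path vs. long-indexed slow path over a buffer
// whose size is conceptually a long.
public class LongVsIntLoop {

    public static long sum(byte[] data, long size) {
        long acc = 0;
        if (size <= Integer.MAX_VALUE) {
            // Fast path: size fits in an int, so index and bound checks
            // are int operations, which HotSpot optimizes well.
            int n = (int) size;
            for (int i = 0; i < n; i++) acc += data[i];
        } else {
            // Slow path: a genuine long-indexed loop; long induction
            // variables currently defeat several JIT loop optimizations.
            for (long i = 0; i < size; i++)
                acc += data[(int) i]; // placeholder addressing for the sketch
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        java.util.Arrays.fill(data, (byte) 1);
        System.out.println(sum(data, data.length)); // prints 1024
    }
}
```

A truly big segment (over 2 GiB) can never take the fast path, which matches
the observation that the optimizations "would not kick in" for Lucene-sized
mappings.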
>
> In our benchmarks we have not observed a slowdown caused by memory segments
> changing their state (note that they don't really change their state - a
> new instance with new properties is returned).
>
> Thanks
> Maurizio
>
>
>


More information about the panama-dev mailing list