[foreign-memaccess] musing on the memory access API

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Wed Jan 6 21:53:50 UTC 2021


Thanks Lee,
I'll defo look that up.

Cheers
Maurizio

On 06/01/2021 20:34, leerho wrote:
> Maurizio,
>
> Re: AllocationHandles, MemorySegments, MemoryAddress ideas
>
> I want to share with you our Memory Package (Writeup 
> <https://datasketches.apache.org/docs/Memory/MemoryPackage.html>, 
> GitHub 
> <https://github.com/apache/datasketches-memory>) 
> we started in May 2017 (JDK 8), where we developed a capability very 
> similar to what you are advocating. This Memory project is closely 
> analogous to your MemorySegment, and our Handles implement something 
> similar to your AllocationHandles idea, but with some additional 
> capabilities.
>
> This Memory Project was developed to provide high-performance off-heap 
> capabilities for our Apache DataSketches 
> <https://datasketches.apache.org> 
> project.
>
> Because we were limited to JDK 8, we had to jump through a bunch of 
> hoops, using Unsafe and gaining access to other hidden classes, to 
> accomplish what we did. Hopefully, what Panama is doing will eliminate 
> the need for this.
>
> Rather than repeating everything here, it is best if you could read 
> the writeup and let me know what you think. I don't think the code is 
> much use to you, but perhaps some of the ideas might be.
>
> Cheers,
>
> Lee.
>
> On Tue, Jan 5, 2021 at 3:59 AM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>
>     On 04/01/2021 23:53, Uwe Schindler wrote:
>     > Hi Maurizio,
>     >
>     >> Thanks for the feedback Uwe, and for the bug reports. We'll do
>     our best
>     >> to address some of them quickly (the NPE and the error in
>     >> Unmapper::address). As for adding an overload for mapping a
>     segment from
>     >> a FileChannel I'm totally open to it, but I think it's late-ish
>     now to
>     >> add API changes, since we are in stabilization.
>     > Hi, this was only a suggestion to improve the whole thing. My
>     > idea is rather to wait until a closer integration into the
>     > FileSystem API is done. The main issue we had was that we can
>     > only pass a path from the default file system provider (I have a
>     > workaround for that, so during our test suite we "unwrap" all the
>     > layers on top). But properly, the FileSystem implementation
>     > should provide a way to get a MemorySegment from the
>     > FileChannel; the current cast to the internal class is ... hacky!
>     > I know why it is like that (preview, and it's not part of
>     > java.base, so the FileSystem interface in java.base can't return
>     > a MemorySegment). But when Panama graduates, the filesystem
>     > integration is a must: FileChannel should be extended by one
>     > "default" method throwing UOE, only implemented by the default
>     > provider: "MemorySegment FileChannel.mapSegment(long offset,
>     > long size, MapMode mode)"
>     +1 - this has been raised in the past as well, and I agree that the
>     issue is more at the FileSystem interface level - we can't really do
>     much at the level of the segment API as things stand. I'm less
>     convinced that this is a "must" - while it's nice to have, and
>     something we should defo get working in the future, I don't think
>     that blocking integration of the Panama APIs because mapped segments
>     do not work with custom file systems would be the right choice.
>     >
>     >> Also, thanks for the thoughts on the API in general - I kind of
>     expected
>     >> (given our discussions) that shared segments were 90% of what
>     you needed
>     >> - and that you are not much interested in using confinement. I
>     agree
>     >> that, when working from that angle, the API looks mostly ok.
>     But not all
>     >> clients have same requirements and some would like to take
>     advantage of
>     >> confinement more - also, note that if we just drop support for
>     confined
>     >> segments (which is something we also thought about) and just
>     offered
>     >> shared access, _all_ clients will be stuck with a very slow close()
>     >> operation.
>     > Hi, yes, I agree. I just said: switching between those modes is
>     > unlikely, but a confined default for short-lived segments is
>     > correct, and shared for long-lived ones (this is also the usage
>     > pattern: something that lives very long is very likely also used
>     > by many threads, like a database file or some database off-heap
>     > cache). Allocated memory used in Netty is of course often
>     > short-lived, but in most cases it is not really used concurrently
>     > (or you can avoid it).
>     >
>     > I'd give the user the option at construction, but not allow it to
>     > be changed later.
>     >
>     >> There are very different ways to use a memory segment; sometimes
>     >> (as in your case) a memory segment is long-lived, and you don't
>     >> care if closing it takes 1 us. But there are other cases where
>     >> segments are created (and disposed) more frequently. To me, the
>     >> interesting fact that emerged from the Netty experiment (thanks
>     >> guys!) was that using handoff AND a shared segment, while nice on
>     >> paper, is not going to work performance-wise, because you need to
>     >> do an expensive close at each hand-off. This might be rectified,
>     >> for instance, by making the API more complex and having a state
>     >> where a segment has no owner (e.g. so that instead of confined(A)
>     >> -> shared -> confined(B) you do confined(A) -> detached ->
>     >> confined(B)), but the risk is that this adds a lot of API
>     >> complexity ("detached" is a brand new segment state in which the
>     >> segment is not accessible, but where memory is not yet
>     >> deallocated) for what might be perceived as a corner case.
>     >> So, the big question here is - given that there are defo
>     >> different modes of interacting with this API (short-lived vs.
>     >> long-lived segments), what API allows us to capture the use cases
>     >> we want in the simplest way possible? While dynamic ownership
>     >> changes look like a cool idea on paper, they also add complexity -
>     >> so I think now is the right time to ask ourselves if we should
>     >> scale back on that a bit and have a more "static" set of flavors
>     >> to pick from (e.g. { confined, shared } x { explicit, cleaner }).
>     > I think, when "allocating" a segment (by reserving memory,
>     > mapping a file, or supplying some external MemoryAddress and
>     > length), you should set confined or shared from the beginning,
>     > without a possibility to change it. This would indeed simplify
>     > many things. I got new benchmarks a minute ago from my Lucene
>     > colleagues: the current MemorySegment API seems 40% slower than
>     > ByteBuffer for some use cases, but equal in speed or faster for
>     > other use cases (I assume it is still the long vs. int
>     > index/looping problem; a for loop using a long is not as well
>     > optimized as a for loop using an int -- correct?). But without
>     > diving too deep, it might also come from the fact that memory
>     > segments *may* change their state, so HotSpot is not able to do
>     > all optimizations.
>
>     If you have for loops with long indices, then yes, this is not
>     optimized, and unfortunately expected to be slow. To counteract
>     that, the impl has many optimizations so that, if a segment size can
>     be represented as an int, many of the long operations are eliminated
>     and replaced with int operations (e.g. bound checks). But if you
>     work with truly big segments (which I suspect is the case for
>     Lucene), most of these optimizations would not kick in. Luckily a
>     lot of progress has been made on the long vs. int problem, but the
>     work is not finished - I hope it will be by the time 17 ships, so
>     that we can remove all the hacks we have in the impl. That said, if
>     you have specific benchmarks to throw our way we'd be happy to look
>     at them!
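A minimal, self-contained illustration of the long- vs. int-index issue (plain arrays, not the segment API; the 40% figure above comes from Lucene's benchmarks, not from this snippet): both methods compute the same sum, but HotSpot's C2 has historically applied bounds-check elimination and unrolling to the int-indexed form far more readily than to the long-indexed one.

```java
// Two functionally identical traversals; only the induction-variable type
// differs. The long-indexed form models what happens when you iterate a
// segment whose size is a long.
public class LongVsIntLoop {
    static long sumIntIndex(byte[] a) {
        long sum = 0;
        // int induction variable: C2 can eliminate bound checks and unroll.
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    static long sumLongIndex(byte[] a, long length) {
        long sum = 0;
        // long induction variable: historically much less optimized.
        for (long i = 0; i < length; i++) sum += a[(int) i];
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i & 0x7f);
        // Same result either way; only the generated machine code differs.
        System.out.println(sumIntIndex(data) == sumLongIndex(data, data.length)); // true
    }
}
```

This is also why the impl's "represent the size as an int when it fits" trick mentioned above pays off: it lets the hot loop run in the first shape rather than the second.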
>
>     In our benchmarks we have not observed slowdowns caused by memory
>     segments changing their state (note that they don't really change
>     their state - a new instance with new properties is returned).
>
>     Thanks
>     Maurizio
>
>


More information about the panama-dev mailing list