[foreign-memaccess] musing on the memory access API
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Jan 7 11:07:43 UTC 2021
Overall a very nice read - as you noted, there are indeed many overlaps
between your memory package and the memory access API.
A few notes:
* Your use of handles is indeed identical to what I proposed in my
email, thanks for pointing that out
* You attach operations such as force/load/isLoaded to the handle for
mapped memory, which is something that occurred to me as well (in our
API, these operations are currently statics on a separate class)
* At the beginning, the doc claims protection from use-after-free even
under concurrent use - looking at the code, that doesn't seem to be the
case though? E.g. it's true that updates to the "valid" bit of the
memory state are atomic, but that doesn't rule out the possibility of
multiple threads seeing a "true" value and then being interleaved with
a memory release, which would ultimately result in access after free.
In the Java 16 iteration of the API we address this problem too, but at
a much lower level (we needed some VM/GC black magic to pull this off).
* The main differences between the memory access API and your API seem
to be in how dereference is done - you opted for virtual methods, while
we go all in on var handles (and then provide a bunch of static
accessors on the side). I think the two are similar, although I'm happy
with where we landed with our API, since using the pre-baked statics is
no harder than using an instance method, but in exchange we get a lot
of capabilities out of the var handle API (such as atomic access and
adaptation). This decision has repercussions on the API, of course: the
fact that we use MemorySegment as a VarHandle coordinate means we
cannot get too crazy with hierarchies on the MemorySegment front - in
fact, when we tried to do that (at some point we had
MappedMemorySegment <: MemorySegment) we ran into performance issues,
as memory access var handles need exact type information to be fast.
* I believe/hope that the main gripes you had with the byte buffer API
(which seem to be endianness-related) are gone with the memory access
API. There we made the decision to leave endianness out of the
MemorySegment - e.g. endianness is a property of the VarHandle doing
the access, not a property of the segment per se. I believe this
decision paid off (e.g. our segments are completely orthogonal w.r.t.
layout decisions), and it avoids a lot of confusion as to "what's the
default", etc.
* Because of the above (segments and layouts are orthogonal) it also
means that you can view the same segments in different ways, by
overlaying different layouts on top. This is especially evident in the
spliterator method:
https://download.java.net/java/early_access/jdk16/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/MemorySegment.html#spliterator(jdk.incubator.foreign.SequenceLayout)
This takes an arbitrary sequence layout and then splits the segment
according to the provided layout boundaries.
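To make the use-after-free race mentioned above concrete, here is a minimal plain-Java sketch. The MemoryState class and its fields are purely illustrative (taken from neither API): the point is that even with an atomic "valid" bit, the check and the access are two separate steps, so a release can be interleaved between them:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: a memory state guarded by nothing but an atomic
// "valid" bit. The names (MemoryState, address) are made up for this sketch.
class MemoryState {
    final AtomicBoolean valid = new AtomicBoolean(true);
    long address = 42; // stands in for a raw native address

    long get() {
        if (!valid.get()) throw new IllegalStateException("already freed");
        // Another thread may complete free() right here: the check above
        // and the access below are not one atomic step, so this read can
        // still touch memory that has just been released.
        return address; // stands in for a real dereference
    }

    void free() {
        if (valid.compareAndSet(true, false)) {
            address = 0; // stands in for the actual deallocation
        }
    }

    public static void main(String[] args) {
        MemoryState s = new MemoryState();
        System.out.println(s.get()); // prints 42 while still valid
        s.free();
        // s.get() would now throw IllegalStateException - but a get()
        // racing with free() can slip past the check before the free.
    }
}
```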
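On the "all in on var handles" point: the extra capabilities a handle buys can be sketched in plain Java with an array var handle from java.lang.invoke (the incubator memory access var handles work analogously, with a MemorySegment and a long offset as access coordinates):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// One handle, many access modes: plain get/set, atomic read-modify-write,
// and compare-and-set - no extra API surface needed on the data itself.
public class VarHandleDemo {
    static final VarHandle VH = MethodHandles.arrayElementVarHandle(int[].class);

    static String demo() {
        int[] data = new int[4];
        VH.set(data, 0, 42);                                // plain write
        int old = (int) VH.getAndAdd(data, 0, 8);           // atomic add, returns 42
        boolean swapped = VH.compareAndSet(data, 0, 50, 7); // CAS 50 -> 7
        return old + " " + (int) VH.get(data, 0) + " " + swapped;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "42 7 true"
    }
}
```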
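The endianness point can likewise be sketched with plain-Java byte-array view var handles: byte order is a property of the handle doing the access, not of the underlying bytes, so two handles can view the same memory with different orders:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Two handles over the same bytes, two byte orders - the memory itself
// carries no endianness at all.
public class EndiannessDemo {
    static final VarHandle BE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
    static final VarHandle LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    public static void main(String[] args) {
        byte[] bytes = new byte[4];
        BE.set(bytes, 0, 0x01020304);    // write big-endian: 01 02 03 04
        int le = (int) LE.get(bytes, 0); // read the same bytes little-endian
        System.out.printf("%08x%n", le); // prints "04030201"
    }
}
```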
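A sketch of that spliterator in use, written against the JDK 16 incubator API (untested here; needs a JDK 16 build with --add-modules jdk.incubator.foreign):

```java
import jdk.incubator.foreign.MemoryAccess;
import jdk.incubator.foreign.MemoryLayout;
import jdk.incubator.foreign.MemoryLayouts;
import jdk.incubator.foreign.MemorySegment;
import jdk.incubator.foreign.SequenceLayout;
import java.util.stream.StreamSupport;

public class SpliteratorDemo {
    public static void main(String[] args) {
        // Overlay a sequence-of-1024-ints layout on a fresh native segment,
        // then split the segment along the element boundaries of the layout.
        SequenceLayout seq = MemoryLayout.ofSequence(1024, MemoryLayouts.JAVA_INT);
        try (MemorySegment segment = MemorySegment.allocateNative(seq)) {
            long sum = StreamSupport
                    .stream(segment.spliterator(seq), false)
                    .mapToLong(slice -> MemoryAccess.getInt(slice)) // one int per slice
                    .sum();
            System.out.println(sum); // fresh native segments are zeroed
        }
    }
}
```

The same segment could be re-viewed with a different SequenceLayout (e.g. longs instead of ints) without touching the segment itself, which is what "segments and layouts are orthogonal" buys.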
Overall, it seems to me that the memory access API should (hopefully)
be a good fit for your use case - we'd be interested to see any
benchmark comparison you might have, in case you decide to try our API
out.
Thanks again for the pointers, very interesting stuff!
Cheers
Maurizio
On 06/01/2021 21:53, Maurizio Cimadamore wrote:
> Thanks Lee,
> I'll defo look that up.
>
> Cheers
> Maurizio
>
> On 06/01/2021 20:34, leerho wrote:
>> Maurizio,
>>
>> Re: AllocationHandles, MemorySegments, MemoryAddress ideas
>>
>> I want to share with you our Memory Package (Writeup
>> <https://datasketches.apache.org/docs/Memory/MemoryPackage.html>,
>> GitHub <https://github.com/apache/datasketches-memory>), which we
>> started in May 2017 (JDK 8) and in which we developed a capability
>> very similar to what you are advocating. This Memory project is very
>> analogous to your MemorySegment, and our Handles implement something
>> very similar to your AllocationHandles idea, but with some other
>> capabilities.
>>
>> This Memory project was developed to provide high-performance
>> off-heap capabilities for our Apache DataSketches
>> <https://datasketches.apache.org> project.
>>
>> Because we were limited to JDK 8, we had to jump through a bunch of
>> hoops (using Unsafe and gaining access to other hidden classes) to
>> accomplish what we did. Hopefully, what Panama is doing will
>> eliminate the need for this.
>>
>> Rather than repeating everything here, it is best if you could read
>> the writeup and let me know what you think. I don't think the code is
>> much use to you, but perhaps some of the ideas might be.
>>
>> Cheers,
>>
>> Lee.
>>
>> On Tue, Jan 5, 2021 at 3:59 AM Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com
>> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>
>>
>> On 04/01/2021 23:53, Uwe Schindler wrote:
>> > Hi Maurizio,
>> >
>> >> Thanks for the feedback Uwe, and for the bug reports. We'll do our
>> >> best to address some of them quickly (the NPE and the error in
>> >> Unmapper::address). As for adding an overload for mapping a
>> >> segment from a FileChannel, I'm totally open to it, but I think
>> >> it's late-ish now to add API changes, since we are in
>> >> stabilization.
>> > Hi, this was only a suggestion to improve the whole thing. My idea
>> > is rather to wait until a closer integration with the FileSystem
>> > API is done. The main issue we had was that we can only pass a path
>> > from the default file system provider (I have a workaround for
>> > that, so during our test suite we "unwrap" all the layers on top).
>> > But properly, the FileSystem implementation should provide the way
>> > to get a MemorySegment from the FileChannel; the current cast to
>> > the internal class is ... hacky! I know why it is like that
>> > (preview, and it's not part of java.base, so the FileSystem
>> > interface in java.base can't return a MemorySegment). But when
>> > Panama graduates, the filesystem integration is a must: FileChannel
>> > should be extended by one "default" method throwing UOE,
>> > implemented only by the default provider: "MemorySegment
>> > FileChannel.mapSegment(long offset, long size, MapMode mode)"
>> +1 - this has been raised in the past as well, and I agree that the
>> issue is more at the FileSystem interface level - we can't really do
>> much at the level of the segment API as things stand. I'm less
>> convinced that this is a "must" - while it's a nice-to-have, and
>> something we should defo get working in the future, I don't think
>> that blocking integration of the Panama APIs because mapped segments
>> do not work with custom file systems would be the right choice.
>> >
>> >> Also, thanks for the thoughts on the API in general - I kind of
>> >> expected (given our discussions) that shared segments were 90% of
>> >> what you needed - and that you are not much interested in using
>> >> confinement. I agree that, when working from that angle, the API
>> >> looks mostly ok. But not all clients have the same requirements,
>> >> and some would like to take more advantage of confinement - also,
>> >> note that if we just dropped support for confined segments (which
>> >> is something we also thought about) and just offered shared
>> >> access, _all_ clients would be stuck with a very slow close()
>> >> operation.
>> > Hi, yes, I agree. I just said: switching between those modes is
>> > unlikely, but still, a confined default for short-lived segments is
>> > correct, and shared for long-lived ones (this is also the usage
>> > pattern: something that lives very long is very likely also used by
>> > many threads, like a database file or some database off-heap
>> > cache). Allocated memory used in Netty is of course often
>> > short-lived, but it is in most cases not really used concurrently
>> > (or you can avoid it).
>> >
>> > I'd give the user the option when constructing, but not allow it to
>> > be changed later.
>> >
>> >> There are very different ways to use a memory segment; sometimes
>> >> (as in your case) a memory segment is long-lived, and you don't
>> >> care if closing it takes 1 us. But there are other cases where
>> >> segments are created (and disposed of) more frequently. To me, the
>> >> interesting fact that emerged from the Netty experiment (thanks
>> >> guys!) was that using handoff AND shared segments, while nice on
>> >> paper, is not going to work performance-wise, because you need to
>> >> do an expensive close at each hand-off. This might be rectified,
>> >> for instance, by making the API more complex and having a state
>> >> where a segment has no owner (e.g. so that instead of confined(A)
>> >> -> shared -> confined(B) you do confined(A) -> detached ->
>> >> confined(B)), but the risk is that this adds a lot of API
>> >> complexity ("detached" is a brand new segment state in which the
>> >> segment is not accessible, but where memory is not yet
>> >> deallocated) for what might be perceived as a corner case.
>> >> So, the big question here is - given that there are defo different
>> >> modes of interacting with this API (short-lived vs. long-lived
>> >> segments), what API allows us to capture the use cases we want in
>> >> the simplest way possible? While dynamic ownership changes look
>> >> like a cool idea on paper, they also add complexity - so I think
>> >> now is the right time to ask ourselves whether we should scale
>> >> back on that a bit and have a more "static" set of flavors to pick
>> >> from (e.g. { confined, shared } x { explicit, cleaner }).
>> > I think, when "allocating" a segment (by reserving memory, mapping
>> > a file, or supplying some external MemoryAddress and length), you
>> > should set confined or shared from the beginning, without a
>> > possibility to change it. This would indeed simplify many things.
>> > I got new benchmarks a minute ago from my Lucene colleagues: the
>> > current MemorySegment API seems 40% slower than ByteBuffer for some
>> > use cases, but equal in speed or faster for other use cases (I
>> > assume it is still the long vs. int index/looping problem; a for
>> > loop using a long index is not as well optimized as a for loop with
>> > an int index - correct?). But without diving too deep, it might
>> > also come from the fact that memory segments *may* change their
>> > state, so HotSpot is not able to do all optimizations.
>>
>> If you have for loops with long indices, then yes, this is not
>> optimized, and unfortunately expected to be slow. To counteract
>> that, the impl has many optimizations so that, if a segment size can
>> be represented as an int, many of the long operations are eliminated
>> and replaced with int operations (e.g. bound checks). But if you
>> work with truly big segments (which I suspect is the case for
>> Lucene), most of these optimizations would not kick in. Luckily a
>> lot of progress has been made on the long vs. int problem, but the
>> work is not finished - I hope it will be by the time 17 ships, so
>> that we can remove all the hacks we have in the impl. That said, if
>> you have specific benchmarks to throw our way, we'd be happy to look
>> at them!
>>
>> In our benchmarks we have not observed any slowdown caused by memory
>> segments changing their state (note that they don't really change
>> their state - a new instance with new properties is returned).
>>
>> Thanks
>> Maurizio
>>
>>
More information about the panama-dev mailing list