[foreign-memaccess] musing on the memory access API
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Jan 4 16:11:37 UTC 2021
Hi,
now that the foreign memory access API has been around for a year, I
think it’s time we start asking ourselves if this is the API we want,
and how comfortable we are in finalizing it. Overall, I think that there
are some aspects of the memory access API which are definitely a success:
*
memory layouts, and the way they connect with dereference var
handles, is definitely a success story, and now that we have added
even more var handle combinators, it is really possible to get crazy
with expressing exotic memory access
*
the new shape of memory access var handle as (MemorySegment,
long)->X makes a lot of sense, and it allowed us to greatly simplify
and unify the implementation (as well as to give users a cheap way
to do unsafe dereference of random addresses, which they sometimes want)
*
the distinction between MemorySegment and MemoryAddress is largely
beneficial - and, when explained, it’s pretty obvious where the
difference comes from: to dereference we need to attach bounds (of
various kinds) to a raw pointer - after we do that, dereference
operations are safe. I think this model makes it very natural to
think about which places in your program might introduce invalid
assumptions, especially when dealing with native code
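As an aside, the (MemorySegment, long)->X shape has a close analogue in a long-standing JDK API: MethodHandles.byteArrayViewVarHandle, which yields a var handle with (byte[], int)->X access coordinates. A small self-contained sketch using plain arrays (deliberately not the incubator API itself):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class VarHandleShape {
    // Real JDK API since Java 9: a VarHandle whose access coordinates are
    // (byte[], int offset) -> int, analogous in shape to the
    // (MemorySegment, long) -> X dereference var handles discussed above.
    static final VarHandle INT_AT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static int demo() {
        byte[] memory = new byte[16];
        INT_AT.set(memory, 4, 42);            // write an int at byte offset 4
        return (int) INT_AT.get(memory, 4);   // read it back
    }

    public static void main(String[] args) {
        if (demo() != 42) throw new AssertionError();
    }
}
```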
I also think that there are aspects of the API where it’s less clear we
made the right call:
*
slicing behavior: closing the slice closes everything. This was
mostly a forced move: there are basically two use cases for slices:
sometimes you slice soon after creation (e.g. to align), in which
case you want the new slice to have the same properties as the old one
(e.g. deallocate on close). There are other cases where you are just
creating a dumb sub-view, and you don’t really want to expose
close() on those. This led to the creation of the “access modes”
mechanism: each segment has some access modes - if the client wants
to prevent calls to MemorySegment::close it can do so by
/restricting/ the segment, and removing the corresponding CLOSE
access mode (e.g. before the segment is shared with other clients).
While this allows us to express all the use cases we care about, it
also seems a tad convoluted. Moreover, a client wrapping a
MemorySegment inside a try-with-resources block is never sure
whether the segment will support close() or not.
*
not all segments are created equal: some memory segments are just
dumb views over memory that has been allocated somewhere else - e.g.
a Java heap array or a byte buffer. In such cases, it seems odd to
feature a close() operation (and, I might add, even to have
thread confinement, given the original API did not feature that to
begin with).
Sidebar: on numerous occasions it has been suggested to solve issues
such as the one above by allowing close() to be a no-op in certain
cases. While that is doable, I’ve never been too convinced about it,
mainly because of this:
    MemorySegment s = ...
    s.close();
    assertFalse(s.isAlive()); // I expect this to never fail!!!!
In other words, a world where some segments are stateful and respond
to close() requests while others do not seems very confusing to me.
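To make the objection concrete, here is a toy model (hypothetical names, not the real API) contrasting a stateful close() with a no-op close(); the invariant above holds for the former and silently breaks for the latter:

```java
public class NoOpClose {
    interface Segment extends AutoCloseable {
        void close();
        boolean isAlive();
    }

    // Stateful segment: close() really terminates it.
    static Segment stateful() {
        return new Segment() {
            boolean alive = true;
            public void close() { alive = false; }
            public boolean isAlive() { return alive; }
        };
    }

    // Hypothetical no-op variant: close() is silently ignored, so the
    // invariant "after close(), !isAlive()" no longer holds.
    static Segment noOp() {
        return new Segment() {
            public void close() { /* ignored */ }
            public boolean isAlive() { return true; }
        };
    }

    static boolean invariantHolds(Segment s) {
        s.close();
        return !s.isAlive();
    }

    public static void main(String[] args) {
        if (!invariantHolds(stateful())) throw new AssertionError();
        if (invariantHolds(noOp()))
            throw new AssertionError("no-op close breaks the invariant");
    }
}
```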
* the various operations for managing confinement of segments are
rapidly turning into a distraction. For instance, recently, the
Netty guys have created a port on top of the memory access API,
since we have added support for shared segments. Their use of shared
segments was a bit strange, in the sense that, while they allocated a
segment in shared mode, they wanted to be able to confine the
segment near where it is used, to catch potential mistakes.
To do so, they resorted to calling handoff on a shared segment
repeatedly, which performance-wise doesn’t work. Closing a shared
segment (even if just for handing it off to some other thread) is a
very expensive operation which needs to be used carefully - but the
Netty developers were not aware of the trade-off (despite it being
described in the javadoc - but that’s understandable, as it’s pretty
subtle). Of course, if they just worked with a shared segment, and
avoided handoff, things would have worked just fine (closing shared
segments is perfectly fine for long lived segments). In other words,
this is a case where, by featuring many different modes of
interacting with segments (confined, shared) as well as ways to go
back and forth between these states, we create extra complexity,
both for ourselves and for the user.
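For reference, the access-modes mechanism mentioned in the first bullet above can be miniaturized as follows - a purely hypothetical toy, where Segment, Mode and withAccessModes are stand-ins, not the real jdk.incubator.foreign types:

```java
import java.util.EnumSet;
import java.util.Set;

public class AccessModes {
    enum Mode { READ, WRITE, CLOSE }

    // Hypothetical miniature of the access-mode mechanism: a segment carries
    // a set of allowed modes, and withAccessModes returns a restricted view
    // (modes can only be removed, never added back).
    static final class Segment {
        private final Set<Mode> modes;
        private boolean alive = true;
        Segment(Set<Mode> modes) { this.modes = modes; }
        Segment withAccessModes(Set<Mode> newModes) {
            if (!modes.containsAll(newModes))
                throw new IllegalArgumentException("cannot add modes");
            return new Segment(newModes);
        }
        void close() {
            if (!modes.contains(Mode.CLOSE))
                throw new UnsupportedOperationException("CLOSE not supported");
            alive = false;
        }
        boolean isAlive() { return alive; }
    }

    static boolean closeRejected() {
        Segment s = new Segment(EnumSet.allOf(Mode.class));
        // Restrict before sharing: the receiving client cannot free the memory.
        Segment restricted = s.withAccessModes(EnumSet.of(Mode.READ, Mode.WRITE));
        try {
            restricted.close();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        if (!closeRejected()) throw new AssertionError();
    }
}
```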
I’ve been thinking quite a bit about these issues, trying to find a more
stable position in the design space. While I can’t claim to have found a
100% solution, I think I might be onto something worth exploring. On a
recent re-read of the C# Span API doc [1], it dawned on me that there is
a sibling abstraction to the Span abstraction in C#, namely Memory [2].
While some of the reasons behind the Span vs. Memory split have to do
with stack vs. heap allocation (e.g. Span can only be used for local
vars, not fields), and so not directly related to our design choices, I
think some of the concepts of the C# solution hinted at a possibly
better way to attack the problem of memory access.
We have known at least for the last 6 months that a MemorySegment is
playing multiple roles at once: a MS is both a memory allocation (e.g.
result of a malloc, or mmap), and a /view/ over said memory. This
duality creates most of the problems listed above, as it’s clear that,
while close() is a method that should belong to an allocation
abstraction, it is less clear that close() should also belong to a
view-like abstraction. We have tried, in the past, to come up with a
3-pronged design, where we had not only MemorySegment and MemoryAddress,
but also a MemoryResource abstraction from which /all/ segments were
derived. These experiments have failed, pretty much all for the same
reason: the return on complexity seemed thin.
Recently, I found myself going back slightly to that approach, although
in a quite different way. Here’s the basic idea I’m playing with:
* introduce a new abstraction: AllocationHandle (name TBD) - this
wraps an allocation, whether generated by malloc, mmap, or some
future allocator TBD (Jim’s QBA?)
* We provide many AllocationHandle factories: { confined, shared } x {
cleaner, no cleaner }
* AllocationHandle is thin: just has a way to get size, alignment and
a method to release memory - e.g. close(); in other words,
AllocationHandle <: AutoCloseable
* crucially, an AllocationHandle has a way to obtain a segment /view/
out of it (MemorySegment)
* a MemorySegment is the same thing it used to be, /minus/ the
terminal operations (|close|, |handoff|, … methods)
* we still keep all the factories for constructing MemorySegments out
of heap arrays and byte buffer
* there’s no way to go from a MemorySegment back to an AllocationHandle
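To make the proposed shape concrete, here is a toy sketch - the names follow the proposal above, but the code is purely illustrative (a ByteBuffer stands in for malloc’d memory, and confinement/cleaner flavors are omitted):

```java
import java.nio.ByteBuffer;

public class AllocationHandleSketch {
    // The handle owns the allocation and is the only closeable object.
    static final class AllocationHandle implements AutoCloseable {
        private final ByteBuffer memory;   // stand-in for malloc'd memory
        private boolean alive = true;
        private AllocationHandle(long byteSize) {
            this.memory = ByteBuffer.allocateDirect((int) byteSize);
        }
        static AllocationHandle allocateNativeConfined(long byteSize) {
            return new AllocationHandle(byteSize);
        }
        MemorySegment asSegment() {          // handle -> view, never back
            return new MemorySegment(this, 0, memory.capacity());
        }
        long byteSize() { return memory.capacity(); }
        public void close() { alive = false; }
    }

    // A view: offers slicing and dereference, but no close()/handoff.
    static final class MemorySegment {
        private final AllocationHandle handle;
        private final int offset, length;
        MemorySegment(AllocationHandle h, int offset, int length) {
            this.handle = h; this.offset = offset; this.length = length;
        }
        MemorySegment asSlice(int newOffset, int newLength) {
            return new MemorySegment(handle, offset + newOffset, newLength);
        }
        void setIntAtOffset(int o, int v) { checkAlive(); handle.memory.putInt(offset + o, v); }
        int getIntAtOffset(int o)         { checkAlive(); return handle.memory.getInt(offset + o); }
        private void checkAlive() {
            if (!handle.alive) throw new IllegalStateException("already closed");
        }
    }

    static int demo() {
        try (AllocationHandle ah = AllocationHandle.allocateNativeConfined(64)) {
            MemorySegment s = ah.asSegment();
            s.setIntAtOffset(42, 7);
            return s.asSlice(40, 8).getIntAtOffset(2);  // same location, via a slice
        }
    }

    public static void main(String[] args) {
        if (demo() != 7) throw new AssertionError();
    }
}
```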
This approach solves quite a few issues:
* Since MemorySegment does not have a close() method, we don’t have to
worry about specifying what close() does in problematic cases
(slices, on-heap, etc.)
* There is an asymmetry between the actor which does an allocation
(the holder of the AllocationHandle) and the rest of the world,
which just deals with (non-closeable) MemorySegment - this seems to
reflect how memory is allocated in the real world (one actor
allocates, then shares a pointer to allocated memory to some other
actors)
AllocationHandles come in many shapes and forms, but instead of
having dynamic state transitions, users will have to choose the
flavor they like ahead of time, knowing pros and cons of each
* This approach removes the need for access modes and restricted views
- we probably still need a readOnly property in segments to support
mapped memory, but that’s pretty much it
Of course there are also things that can be perceived as disadvantages:
Conciseness. Code dealing in native memory segments will have to
first obtain an allocation handle, then obtain a segment. For
instance, code like this:
    try (MemorySegment s = MemorySegment.allocateNative(layout)) {
        ...
        MemoryAccess.getIntAtOffset(s, 42);
        ...
    }
Will become:
    try (AllocationHandle ah = AllocationHandle.allocateNativeConfined(layout)) {
        MemorySegment s = ah.asSegment();
        ...
        MemoryAccess.getIntAtOffset(s, 42);
        ...
    }
*
It would no longer be possible for the linker API to just allocate
memory and return a segment based on that memory - since now the
user cannot free that memory anymore (no close method in segments).
We could solve this either by having the linker API return an
allocation handle or, better, by having the linker API accept a
NativeScope where allocation should occur (since that’s how clients
are likely to interact with that API point anyway). In fact, we have
already considered doing something similar in the past (doing a
malloc for each struct returned by value is a performance killer in
certain contexts).
*
At least in this form, we give up state transitions between confined
and shared. Users will have to pick which side of the world they
want to play in and stick with it. For simple lexically scoped use
cases, confined is fine and efficient - in more complex cases,
shared might be unavoidable. While handing off an entire
AllocationHandle is totally doable, doing so (e.g. killing an
existing AH instance to return a new AH instance confined on a
different thread) will also kill all segments derived from the
original AH. So it’s not clear such an API would be very useful: to
be able to do a handoff, clients will need to pass around an
AllocationHandle, not a MemorySegment (like now). Note that adding
a handoff operation directly on MemorySegment, under this design, is
not feasible: handoff is a terminal operation, so we would allow
clients to do nonsensical things like:
1. obtain a segment
2. create two identical segments via slicing
3. set the owner of the two segments to two different threads
For this reason, it makes sense to think about ownership as a property
on the /allocation/, not on the /view/.
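A tiny (again hypothetical) model of that last point: when the owner thread lives on the allocation, every view derived from it necessarily agrees on confinement, so the nonsensical scenario above cannot be expressed.

```java
public class OwnershipOnAllocation {
    // Toy model: the owner thread is a property of the allocation, so all
    // views derived from it - including slices of slices - share one owner.
    static final class Allocation {
        final Thread owner = Thread.currentThread();
    }

    static final class View {
        final Allocation allocation;
        View(Allocation a) { this.allocation = a; }
        View slice() { return new View(allocation); }   // same owner, always
        void checkConfinement() {
            if (Thread.currentThread() != allocation.owner)
                throw new IllegalStateException("access outside owner thread");
        }
    }

    static boolean crossThreadAccessFails() {
        View v = new View(new Allocation()).slice();
        final boolean[] failed = {false};
        Thread t = new Thread(() -> {
            try { v.checkConfinement(); }
            catch (IllegalStateException e) { failed[0] = true; }
        });
        t.start();
        try { t.join(); } catch (InterruptedException e) { return false; }
        return failed[0];
    }

    public static void main(String[] args) {
        if (!crossThreadAccessFails()) throw new AssertionError();
    }
}
```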
* While the impact of these changes on clients using the memory access
API directly is somewhat big (no try-with-resources on heap/buffer
segments, need to go through an AllocationHandle for native stuff),
clients of the extracted API are largely unchanged, thanks to the fact that most of
such clients use NativeScope anyway to abstract over how segments
are allocated.
Any thoughts? I think the first question is as to whether we’re ok with
the loss in conciseness, and with the addition of a new (albeit very
simple) abstraction.
[1] - https://docs.microsoft.com/en-us/dotnet/api/system.span-1?view=net-5.0
[2] - https://docs.microsoft.com/en-us/dotnet/standard/memory-and-spans/memory-t-usage-guidelines