[foreign-memaccess] musing on the memory access API
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Jan 4 16:11:37 UTC 2021
Hi,
now that the foreign memory access API has been around for a year, I
think it’s time we start asking ourselves if this is the API we want,
and how comfortable we are in finalizing it. Overall, I think that there
are some aspects of the memory access API which are definitely a success:
*
memory layouts, and the way they connect with dereference var
handles, is definitely a success story, and now that we have added
even more var handle combinators, it is really possible to get crazy
with expressing exotic memory access
*
the new shape of memory access var handle as (MemorySegment,
long)->X makes a lot of sense, and it allowed us to greatly simplify
and unify the implementation (as well as to give users a cheap way
to do unsafe dereference of random addresses, which they sometimes want)
*
the distinction between MemorySegment and MemoryAddress is largely
beneficial - and, when explained, it’s pretty obvious where the
difference comes from: to dereference we need to attach bounds (of
various kinds) to a raw pointer - after we do that, dereference
operations are safe. I think this model makes it very natural to
think about which places in your program might introduce invalid
assumptions, especially when dealing with native code
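As an aside, the (MemorySegment, long)->X shape has a close analogue in a long-standing JDK API: MethodHandles.byteArrayViewVarHandle, which yields a var handle with (byte[], int)->X access coordinates. A small self-contained sketch using plain arrays (deliberately not the incubator API itself):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class VarHandleShape {
    // Real JDK API since Java 9: a VarHandle whose access coordinates are
    // (byte[], int offset) -> int, analogous in shape to the
    // (MemorySegment, long) -> X dereference var handles discussed above.
    static final VarHandle INT_AT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static int demo() {
        byte[] memory = new byte[16];
        INT_AT.set(memory, 4, 42);            // write an int at byte offset 4
        return (int) INT_AT.get(memory, 4);   // read it back
    }

    public static void main(String[] args) {
        if (demo() != 42) throw new AssertionError();
    }
}
```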
I also think that there are aspects of the API where it’s less clear we
made the right call:
*
slicing behavior: closing the slice closes everything. This was
mostly a forced move: there are basically two use cases for slices:
sometimes you slice soon after creation (e.g. to align), in which
case you want the new slice to have the same properties as the old one
(e.g. deallocate on close). There are other cases where you are just
creating a dumb sub-view, and you don’t really want to expose
close() on those. This led to the creation of the “access modes”
mechanism: each segment has some access modes - if the client wants
to prevent calls to MemorySegment::close it can do so by
/restricting/ the segment, and removing the corresponding CLOSE
access mode (e.g. before the segment is shared with other clients).
While this allows us to express all the use cases we care about, it
also seems a tad convoluted. Moreover, a client wrapping a
MemorySegment inside a try-with-resources block is never sure
whether the segment will support close() or not.
*
not all segments are created equal: some memory segments are just
dumb views over memory that has been allocated somewhere else - e.g.
a Java heap array or a byte buffer. In such cases, it seems odd to
feature a close() operation (and, I might add, even to have
thread confinement, given the original API did not feature that to
begin with).
Sidebar: on numerous occasions it has been suggested to solve issues
such as the one above by allowing close() to be a no-op in certain
cases. While that is doable, I’ve never been too convinced about it,
mainly because of this:
    MemorySegment s = ...
    s.close();
    assertFalse(s.isAlive()); // I expect this to never fail!!!!
In other words, a world where some segments are stateful and respond
to close() requests while others do not seems very confusing to me.
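To make the objection concrete, here is a toy model (hypothetical names, not the real API) contrasting a stateful close() with a no-op close(); the invariant above holds for the former and silently breaks for the latter:

```java
public class NoOpClose {
    interface Segment extends AutoCloseable {
        void close();
        boolean isAlive();
    }

    // Stateful segment: close() really terminates it.
    static Segment stateful() {
        return new Segment() {
            boolean alive = true;
            public void close() { alive = false; }
            public boolean isAlive() { return alive; }
        };
    }

    // Hypothetical no-op variant: close() is silently ignored, so the
    // invariant "after close(), !isAlive()" no longer holds.
    static Segment noOp() {
        return new Segment() {
            public void close() { /* ignored */ }
            public boolean isAlive() { return true; }
        };
    }

    static boolean invariantHolds(Segment s) {
        s.close();
        return !s.isAlive();
    }

    public static void main(String[] args) {
        if (!invariantHolds(stateful())) throw new AssertionError();
        if (invariantHolds(noOp()))
            throw new AssertionError("no-op close breaks the invariant");
    }
}
```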
* the various operations for managing confinement of segments are
rapidly turning into a distraction. For instance, recently, the
Netty guys have created a port on top of the memory access API,
since we have added support for shared segments. Their use of shared
segments was a bit strange, in the sense that, while they allocated a
segment in shared mode, they wanted to be able to confine the
segment near where it is used, to catch potential mistakes.
To do so, they resorted to calling handoff on a shared segment
repeatedly, which performance-wise doesn’t work. Closing a shared
segment (even if just for handing it off to some other thread) is a
very expensive operation which needs to be used carefully - but the
Netty developers were not aware of the trade-off (despite it being
described in the javadoc - but that’s understandable, as it’s pretty
subtle). Of course, if they just worked with a shared segment, and
avoided handoff, things would have worked just fine (closing shared
segments is perfectly fine for long lived segments). In other words,
this is a case where, by featuring many different modes of
interacting with segments (confined, shared) as well as ways to go
back and forth between these states, we create extra complexity,
both for ourselves and for the user.
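For reference, the access-modes mechanism mentioned in the first bullet above can be miniaturized as follows - a purely hypothetical toy, where Segment, Mode and withAccessModes are stand-ins, not the real jdk.incubator.foreign types:

```java
import java.util.EnumSet;
import java.util.Set;

public class AccessModes {
    enum Mode { READ, WRITE, CLOSE }

    // Hypothetical miniature of the access-mode mechanism: a segment carries
    // a set of allowed modes, and withAccessModes returns a restricted view
    // (modes can only be removed, never added back).
    static final class Segment {
        private final Set<Mode> modes;
        private boolean alive = true;
        Segment(Set<Mode> modes) { this.modes = modes; }
        Segment withAccessModes(Set<Mode> newModes) {
            if (!modes.containsAll(newModes))
                throw new IllegalArgumentException("cannot add modes");
            return new Segment(newModes);
        }
        void close() {
            if (!modes.contains(Mode.CLOSE))
                throw new UnsupportedOperationException("CLOSE not supported");
            alive = false;
        }
        boolean isAlive() { return alive; }
    }

    static boolean closeRejected() {
        Segment s = new Segment(EnumSet.allOf(Mode.class));
        // Restrict before sharing: the receiving client cannot free the memory.
        Segment restricted = s.withAccessModes(EnumSet.of(Mode.READ, Mode.WRITE));
        try {
            restricted.close();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        if (!closeRejected()) throw new AssertionError();
    }
}
```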
I’ve been thinking quite a bit about these issues, trying to find a more
stable position in the design space. While I can’t claim to have found a
100% solution, I think I might be onto something worth exploring. On a
recent re-read of the C# Span API doc [1], it dawned on me that there is
a sibling abstraction to the Span abstraction in C#, namely Memory [2].
While some of the reasons behind the Span vs. Memory split have to do
with stack vs. heap allocation (e.g. Span can only be used for local
vars, not fields), and so not directly related to our design choices, I
think some of the concepts of the C# solution hinted at a possibly
better way to attack the problem of memory access.
We have known at least for the last 6 months that a MemorySegment is
playing multiple roles at once: a MS is both a memory allocation (e.g.
result of a malloc, or mmap), and a /view/ over said memory. This
duality creates most of the problems listed above, as it’s clear that,
while close() is a method that should belong to an allocation
abstraction, it is less clear that close() should also belong to a
view-like abstraction. We have tried, in the past, to come up with a
3-pronged design, where we had not only MemorySegment and MemoryAddress,
but also a MemoryResource abstraction from which /all/ segments were
derived. These experiments have failed, pretty much all for the same
reason: the return on complexity seemed thin.
Recently, I found myself going back slightly to that approach, although
in a quite different way. Here’s the basic idea I’m playing with:
* introduce a new abstraction: AllocationHandle (name TBD) - this
wraps an allocation, whether generated by malloc, mmap, or some
future allocator TBD (Jim’s QBA?)
* We provide many AllocationHandle factories: { confined, shared } x {
cleaner, no cleaner }
* AllocationHandle is thin: just has a way to get size, alignment and
a method to release memory - e.g. close(); in other words,
AllocationHandle <: AutoCloseable
* crucially, an AllocationHandle has a way to obtain a segment /view/
out of it (MemorySegment)
* a MemorySegment is the same thing it used to be, /minus/ the
terminal operations (|close|, |handoff|, … methods)
* we still keep all the factories for constructing MemorySegments out
of heap arrays and byte buffer
* there’s no way to go from a MemorySegment back to an AllocationHandle
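To make the proposed shape concrete, here is a toy sketch - the names follow the proposal above, but the code is purely illustrative (a ByteBuffer stands in for malloc’d memory, and confinement/cleaner flavors are omitted):

```java
import java.nio.ByteBuffer;

public class AllocationHandleSketch {
    // The handle owns the allocation and is the only closeable object.
    static final class AllocationHandle implements AutoCloseable {
        private final ByteBuffer memory;   // stand-in for malloc'd memory
        private boolean alive = true;
        private AllocationHandle(long byteSize) {
            this.memory = ByteBuffer.allocateDirect((int) byteSize);
        }
        static AllocationHandle allocateNativeConfined(long byteSize) {
            return new AllocationHandle(byteSize);
        }
        MemorySegment asSegment() {          // handle -> view, never back
            return new MemorySegment(this, 0, memory.capacity());
        }
        long byteSize() { return memory.capacity(); }
        public void close() { alive = false; }
    }

    // A view: offers slicing and dereference, but no close()/handoff.
    static final class MemorySegment {
        private final AllocationHandle handle;
        private final int offset, length;
        MemorySegment(AllocationHandle h, int offset, int length) {
            this.handle = h; this.offset = offset; this.length = length;
        }
        MemorySegment asSlice(int newOffset, int newLength) {
            return new MemorySegment(handle, offset + newOffset, newLength);
        }
        void setIntAtOffset(int o, int v) { checkAlive(); handle.memory.putInt(offset + o, v); }
        int getIntAtOffset(int o)         { checkAlive(); return handle.memory.getInt(offset + o); }
        private void checkAlive() {
            if (!handle.alive) throw new IllegalStateException("already closed");
        }
    }

    static int demo() {
        try (AllocationHandle ah = AllocationHandle.allocateNativeConfined(64)) {
            MemorySegment s = ah.asSegment();
            s.setIntAtOffset(42, 7);
            return s.asSlice(40, 8).getIntAtOffset(2);  // same location, via a slice
        }
    }

    public static void main(String[] args) {
        if (demo() != 7) throw new AssertionError();
    }
}
```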
This approach solves quite a few issues:
* Since MemorySegment does not have a close() method, we don’t have to
worry about specifying what close() does in problematic cases
(slices, on-heap, etc.)
* There is an asymmetry between the actor which does an allocation
(the holder of the AllocationHandle) and the rest of the world,
which just deals with (non-closeable) MemorySegment - this seems to
reflect how memory is allocated in the real world (one actor
allocates, then shares a pointer to allocated memory to some other
actors)
AllocationHandles come in many shapes and forms, but instead of
having dynamic state transitions, users will have to choose the
flavor they like ahead of time, knowing pros and cons of each
* This approach removes the need for access modes and restricted views
- we probably still need a readOnly property in segments to support
mapped memory, but that’s pretty much it
Of course there are also things that can be perceived as disadvantages:
Conciseness. Code dealing in native memory segments will have to
first obtain an allocation handle, then obtain a segment. For
instance, code like this:
    try (MemorySegment s = MemorySegment.allocateNative(layout)) {
        ...
        MemoryAccess.getIntAtOffset(s, 42);
        ...
    }
Will become:
    try (AllocationHandle ah = AllocationHandle.allocateNativeConfined(layout)) {
        MemorySegment s = ah.asSegment();
        ...
        MemoryAccess.getIntAtOffset(s, 42);
        ...
    }
*
It would no longer be possible for the linker API to just allocate
memory and return a segment based on that memory - since now the
user cannot free that memory anymore (no close method in segments).
We could solve this either by having the linker API return an
allocation handle or, better, by having the linker API accept a
NativeScope where allocation should occur (since that’s how clients
are likely to interact with that API point anyway). In fact, we have
already considered doing something similar in the past (doing a
malloc for each struct returned by value is a performance killer in
certain contexts).
*
At least in this form, we give up state transitions between confined
and shared. Users will have to pick which side of the world they
want to play in and stick with it. For simple lexically scoped use
cases, confined is fine and efficient - in more complex cases,
shared might be unavoidable. While handing off an entire
AllocationHandle is totally doable, doing so (e.g. killing an
existing AH instance to return a new AH instance confined on a
different thread) will also kill all segments derived from the
original AH. So it’s not clear such an API would be very useful: to
be able to do a handoff, clients will need to pass around an
AllocationHandle, not a MemorySegment (like now). Note that adding
a handoff operation directly on MemorySegment, under this design, is
not feasible: handoff is a terminal operation, so we would allow
clients to do nonsensical things like:
1. obtain a segment
2. create two identical segments via slicing
3. set the owner of the two segments to two different threads
For this reason, it makes sense to think about ownership as a property
on the /allocation/, not on the /view/.
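A tiny (again hypothetical) model of that last point: when the owner thread lives on the allocation, every view derived from it necessarily agrees on confinement, so the nonsensical scenario above cannot be expressed.

```java
public class OwnershipOnAllocation {
    // Toy model: the owner thread is a property of the allocation, so all
    // views derived from it - including slices of slices - share one owner.
    static final class Allocation {
        final Thread owner = Thread.currentThread();
    }

    static final class View {
        final Allocation allocation;
        View(Allocation a) { this.allocation = a; }
        View slice() { return new View(allocation); }   // same owner, always
        void checkConfinement() {
            if (Thread.currentThread() != allocation.owner)
                throw new IllegalStateException("access outside owner thread");
        }
    }

    static boolean crossThreadAccessFails() {
        View v = new View(new Allocation()).slice();
        final boolean[] failed = {false};
        Thread t = new Thread(() -> {
            try { v.checkConfinement(); }
            catch (IllegalStateException e) { failed[0] = true; }
        });
        t.start();
        try { t.join(); } catch (InterruptedException e) { return false; }
        return failed[0];
    }

    public static void main(String[] args) {
        if (!crossThreadAccessFails()) throw new AssertionError();
    }
}
```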
* While the impact of these changes on clients using the memory access
API directly is somewhat big (no try-with-resources on heap/buffer
segments, need to go through an AllocationHandle for native stuff),
clients of the extracted API are largely unchanged, thanks to the fact that most of
such clients use NativeScope anyway to abstract over how segments
are allocated.
Any thoughts? I think the first question is as to whether we’re ok with
the loss in conciseness, and with the addition of a new (albeit very
simple) abstraction.
[1] - https://docs.microsoft.com/en-us/dotnet/api/system.span-1?view=net-5.0
[2] - https://docs.microsoft.com/en-us/dotnet/standard/memory-and-spans/memory-t-usage-guidelines