memory access - pulling all the threads
Ty Young
youngty1997 at gmail.com
Tue Jan 26 07:36:34 UTC 2021
The basic idea behind a NativeAllocator makes sense, but is keeping the
current MemorySegment.close() and access modes out of the question?
Would it not be possible to introduce a free(MemorySegment) method to
this NativeAllocator interface, which MemorySegment.close() calls, so
that the MemorySegment abstraction may be marked as **not** alive while
the underlying memory stays alive?
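
Roughly, something like the sketch below (the free() method and its
semantics are just illustrative here, not an existing or proposed API):

```
@FunctionalInterface
interface NativeAllocator {
    MemorySegment allocate(long size, long align);

    // hypothetical: release memory previously handed out by this allocator;
    // MemorySegment.close() would mark the segment view as not alive and
    // then delegate the actual deallocation here
    default void free(MemorySegment segment) { }
}
```
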
On 1/25/21 11:52 AM, Maurizio Cimadamore wrote:
> Hi,
> as you know, I've been looking at both internal and external feedback
> on usage of the memory access API, in an attempt to understand what
> the problems with the API are, and how to move forward. As discussed
> here [1], there are some things which work well, such as structured
> access, or the recent addition of shared segment support (the latter
> seems to have enabled a wide variety of experiments which allowed us
> to gather more feedback - thanks!). But there are still some issues to
> be resolved - which could be summarized as "the MemorySegment
> abstraction is trying to do too many things at once" (again, please
> refer to [1] for a more detailed description of the problems involved).
>
> In [1] I described a possible approach where every allocation method
> (MemorySegment::allocateNative and MemorySegment::mapFile) returns an
> "allocation handle", not a segment directly. The handle is the
> closeable entity, while the segment is just a view. While this
> approach is workable (and something very similar has indeed been
> explored here [2]), after implementing some parts of it, I was left
> not satisfied with how this approach integrates with respect to the
> foreign linker support. For instance, defining the behavior of methods
> such as CLinker::toCString becomes quite convoluted: where does the
> allocation handle associated with the returned string come from? If
> the segment has no pointer to the handle, how can the memory
> associated with the string be released? What is the relationship between
> an allocation handle and a NativeScope? All these questions led me to
> conclude that the proposed approach was not enough, and that we needed
> to try harder.
>
> The above approach does one thing right: it splits memory segments
> from the entity managing allocation/closure of memory resources, thus
> turning memory segments into dumb views. But it doesn't go far enough
> in this direction; as it turns out, what we really want here is a way
> to capture the concept of the lifecycle that is associated with one or
> more (logically related) resources - which, unsurprisingly, is part of
> what NativeScope does too. So, let's try to model this abstraction:
>
> ```
> interface ResourceScope extends AutoCloseable {
>     void addOnClose(Runnable cleanupAction);  // adds a new cleanup action to this scope
>     void close();                             // closes the scope
>
>     static ResourceScope ofConfined();                 // creates a confined resource scope
>     static ResourceScope ofShared();                   // creates a shared resource scope
>     static ResourceScope ofConfined(Cleaner cleaner);  // confined resource scope - managed by cleaner
>     static ResourceScope ofShared(Cleaner cleaner);    // shared resource scope - managed by cleaner
> }
> ```
>
> It's a very simple interface - you can basically add new cleanup
> actions to it, which will be called when the scope is closed; note
> that ResourceScope supports implicit close (via a Cleaner), or
> explicit close (via the close method) - it can even support both (not
> shown here).
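>
> As a quick illustration of both styles (a sketch, using the factory
> names above; the cleanup actions are just placeholders, and Cleaner is
> java.lang.ref.Cleaner):
>
> ```
> // explicit close: cleanup actions run deterministically
> try (ResourceScope confined = ResourceScope.ofConfined()) {
>     confined.addOnClose(() -> System.out.println("confined scope closed"));
> } // cleanup runs here
>
> // implicit close: cleanup actions run once the scope becomes unreachable
> ResourceScope shared = ResourceScope.ofShared(Cleaner.create());
> shared.addOnClose(() -> System.out.println("shared scope cleaned"));
> ```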
>
> Armed with this new abstraction, let's try to see if we can shine new
> light onto some of the existing API methods and abstractions.
>
> Let's start with heap segments - these are allocated using one of the
> MemorySegment::ofArray() factories; one of the issues with heap
> segments is that it doesn't make much sense to close them. In the
> proposed approach, this can be handled nicely: heap segments are
> associated with a _global_ scope that cannot be closed - a scope that
> is _always alive_. This clarifies the role of heap segments (and also
> of buffer segments) nicely.
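>
> For instance (a sketch; the scope() accessor is the one used later in
> this email):
>
> ```
> MemorySegment heap = MemorySegment.ofArray(new int[100]);
> // heap.scope() is the global, always-alive scope: it cannot be closed,
> // so the backing array is reclaimed by the GC as usual
> ```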
>
> Let's proceed to MemorySegment::allocateNative/mapFile - what should
> these factories do? Under the new proposal, these methods should accept
> a ResourceScope parameter, which defines the lifecycle to which the
> newly created segment should be attached. If we still want to provide
> ResourceScope-less overloads (as the API does now) we can pick a
> useful default: a shared, non-closeable, cleaner-backed scope. This
> choice gives us essentially the same semantics as a byte buffer, so it
> would be an ideal starting point for developers coming from the
> ByteBuffer API who are getting familiar with the new memory access API.
> Note that, when using these more compact factories, scopes are almost
> entirely hidden from the client - so no extra complexity is added
> (compared e.g. to the ByteBuffer API).
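>
> Concretely (a sketch, assuming the scope-accepting overloads described
> above):
>
> ```
> // explicit lifecycle: the segment dies when the scope is closed
> try (ResourceScope scope = ResourceScope.ofConfined()) {
>     MemorySegment segment = MemorySegment.allocateNative(100, scope);
>     // ... use segment ...
> }
>
> // compact factory: shared, non-closeable, cleaner-backed scope -
> // essentially byte-buffer-like semantics (GC-driven deallocation)
> MemorySegment bufferLike = MemorySegment.allocateNative(100);
> ```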
>
> As it turns out, ResourceScope is not only useful for segments, but it
> is also useful for a number of entities which need to be attached to
> some lifecycle, such as:
>
> * upcall stubs
> * va lists
> * loaded libraries
>
> The upcall stub case is particularly telling: in that case, we have
> decided to model an upcall stub as a MemorySegment not because it
> makes sense to dereference an upcall stub - but simply because we need
> to have a way to _release_ the upcall stub once we're done using it.
> Under the new proposal, we have a new, powerful option: the upcall
> stub API point can accept a user-provided ResourceScope which will be
> responsible for managing the lifecycle of the upcall stub entity. That
> is, we are now free to turn the result of a call to upcallStub into
> something other than a MemorySegment (e.g. a FunctionPointer?) w/o
> loss of functionality.
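>
> For instance (a sketch; targetHandle and descriptor are placeholders,
> and the scope-accepting upcallStub overload is the one proposed here):
>
> ```
> try (ResourceScope scope = ResourceScope.ofConfined()) {
>     MemorySegment stub = CLinker.upcallStub(targetHandle, descriptor, scope);
>     // ... pass the stub's address to native code ...
> } // the upcall stub is released here
> ```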
>
> Resource scopes are very useful to manage _groups_ of resources - there
> are in fact cases where one or more segments share the same lifecycle
> - that is, they need to be all alive at the same time; to handle some
> of these use cases, the status quo adds the NativeScope abstraction,
> which can accept registration of external memory segments (via the
> MemorySegment::handoff API). This use case is naturally handled by the
> ResourceScope API:
>
> ```
> try (ResourceScope scope = ResourceScope.ofConfined()) {
>     MemorySegment.allocateNative(layout, scope);
>     MemorySegment.mapFile(..., scope);
>     CLinker.upcallStub(..., scope);
> } // release all resources
> ```
>
> Does this remove the need for NativeScope? Not so fast: NativeScope
> is used to group logically related resources, yes, but is also used as
> a faster, arena-based allocator - which attempts to minimize the
> number of system calls (e.g. to malloc) by allocating bigger memory
> blocks and then handing over slices to clients. Let's try to model the
> allocation-nature of a NativeScope with a separate interface, as follows:
>
> ```
> @FunctionalInterface
> interface NativeAllocator {
>     MemorySegment allocate(long size, long align);
>
>     default MemorySegment allocateInt(MemoryLayout intLayout, int value) { ... }
>     default MemorySegment allocateLong(MemoryLayout longLayout, long value) { ... }
>     ... // all the other allocation helpers currently in NativeScope
> }
> ```
>
> At first, it seems this interface doesn't add much. But it is quite
> powerful - for instance, a client can create a simple, malloc-like
> allocator, as follows:
>
> ```
> NativeAllocator malloc = (size, align) ->
>         MemorySegment.allocateNative(size, align, ResourceScope.ofConfined());
> ```
>
> This is an allocator which allocates a new region of memory on each
> allocation request, backed by a fresh confined scope (which can be
> closed independently). This idiom is in fact so common that the API
> allows clients to create these allocators in a more compact fashion:
>
> ```
> NativeAllocator confinedMalloc =
>         NativeAllocator.ofMalloc(ResourceScope::ofConfined);
> NativeAllocator sharedMalloc =
>         NativeAllocator.ofMalloc(ResourceScope::ofShared);
> ```
>
> But other strategies are also possible:
>
> * arena allocation (e.g. the allocation strategy currently used by
> NativeScope)
> * recycling allocation (a single segment, with a given layout, is
> allocated up front, and allocation requests are served by repeatedly
> slicing that very segment) - this is a critical optimization in e.g.
> loops (see the sketch after this list)
> * interop with custom allocators
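>
> A minimal sketch of the recycling strategy (ignoring alignment and
> bounds checks for brevity; layout and scope are placeholders, with
> layout assumed large enough for the requests below):
>
> ```
> MemorySegment block = MemorySegment.allocateNative(layout, scope);
> NativeAllocator recycler = (size, align) -> block.asSlice(0, size);
>
> for (int i = 0; i < 1000; i++) {
>     MemorySegment tmp = recycler.allocate(32, 8); // same memory reused on each iteration
>     // ... fill and use tmp within this iteration ...
> }
> ```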
>
> So, where would we accept a NativeAllocator in our API? It turns out that
> accepting an allocator is handy whenever an API point needs to
> allocate some native memory - so, instead of
>
> ```
> MemorySegment toCString(String)
> ```
>
> This is better:
>
> ```
> MemorySegment toCString(String, NativeAllocator)
> ```
>
> Of course, we need to tweak the foreign linker, so that in all foreign
> calls returning a struct by value (which require some allocation), a
> NativeAllocator prefix argument is added to the method handle, so that
> the user can specify which allocator should be used by the call; this
> is a straightforward change which greatly enhances the expressive
> power of the linker API.
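>
> In other words, a downcall handle for a function returning a struct by
> value would gain a leading NativeAllocator parameter (a sketch; the
> handle, the native function and the allocator are made up for
> illustration):
>
> ```
> // native: struct point make_point(int x, int y);
> NativeAllocator allocator = NativeAllocator.ofMalloc(ResourceScope::ofConfined);
> MemorySegment point = (MemorySegment) makePointHandle.invokeExact(allocator, 1, 2);
> // the allocator decides where the returned struct lives
> ```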
>
> So, we are in a place where some methods (e.g. factories which create
> some resource) take an additional ResourceScope argument - and some
> other methods (e.g. methods that need to allocate native segments)
> take an additional NativeAllocator argument. Now, it would be
> inconvenient for the user to have to create both, at least in simple
> use cases - but, since these are interfaces, nothing prevents us from
> creating a new abstraction which implements _both_ ResourceScope _and_
> NativeAllocator - in fact this is exactly what the role of the already
> existing NativeScope is!
>
> ```
> interface NativeScope extends NativeAllocator, ResourceScope { ... }
> ```
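>
> So a client can just write (a sketch; layout is a placeholder, and the
> existing NativeScope.unboundedScope() factory is assumed to be
> retained):
>
> ```
> try (NativeScope scope = NativeScope.unboundedScope()) {
>     MemorySegment cstr = CLinker.toCString("hello", scope);           // scope used as a NativeAllocator
>     MemorySegment seg = MemorySegment.allocateNative(layout, scope);  // scope used as a ResourceScope
> } // everything allocated/registered above is released here
> ```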
>
> In other words, we have retconned the existing NativeScope
> abstraction, by explaining its behavior in terms of more primitive
> abstractions (scopes and allocators). This means that clients can, for
> the most part, just create a NativeScope and then pass it whenever a
> ResourceScope or a NativeAllocator is required (which is what is
> already happening in all of our jextract examples).
>
> There are some additional bonus points of this approach.
>
> First, ResourceScope features some locking capabilities - e.g. you can
> do things like:
>
> ```
> try (ResourceScope.Lock lock = segment.scope().lock()) {
>     <critical operation on segment>
> }
> ```
>
> This allows clients to perform critical operations on a segment w/o
> worrying that the segment's memory will be reclaimed in the middle
> of the operation. This solves the problem with async operations on byte
> buffers derived from shared segments (see [3]).
>
> Another bonus point is that the ResourceScope interface is completely
> segment-agnostic - in fact, we now have a way to describe APIs which
> return resources which must be cleaned by the user (or, implicitly, by
> the GC). For instance, it would be entirely reasonable to imagine, one
> day, the ByteBuffer API providing an additional factory - e.g.
> allocateDirect(int size, ResourceScope scope) - which gives you a
> direct buffer attached to a given (closeable) scope. The same trick
> can probably be used in other APIs as well where implicit cleanup has
> been preferred for performance and/or safety reasons.
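>
> Purely as an illustration of that hypothetical factory (this overload
> does not exist today):
>
> ```
> try (ResourceScope scope = ResourceScope.ofConfined()) {
>     ByteBuffer buf = ByteBuffer.allocateDirect(1024, scope); // hypothetical overload
>     // ... use buf ...
> } // direct memory released deterministically, rather than by the GC
> ```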
>
> tl;dr;
>
> The restacking described in this email enhances the Foreign Memory
> Access API in many different ways, and allows clients to approach the
> API in increasing degrees of complexity (depending on needs):
>
> * for a smoother transition, users coming from the ByteBuffer API
> only have to swap ByteBuffer::allocateDirect with
> MemorySegment::allocateNative - not much else changes, no need to
> think about lifecycles (and ResourceScope); GC is still in charge of
> deallocation
> * users that want tighter control over resources can dive deeper and
> learn how segments (and other resources) are attached to a resource
> scope (which can be closed safely, if needed)
> * for the native interop case, the NativeScope abstraction is
> retconned to be both a ResourceScope *and* a NativeAllocator - so it
> can be used whenever an API needs to know how to _allocate_ or which
> _lifecycle_ should be used for a newly created resource
> * scopes can be locked, which allows clients to write critical
> sections in which a segment has to be operated upon w/o fear of it
> being closed
> * the idiom described here can be used to e.g. enhance the ByteBuffer
> API and to add close capabilities there
>
> All the above require very few changes to the clients of the memory
> access API. The biggest change is that a MemorySegment no longer
> supports the AutoCloseable interface, which is instead moved to
> ResourceScope. While this can get a little more verbose in case you
> need a single segment, the code scales _a lot_ better in case you need
> multiple segments/resources. Existing clients using jextract-generated
> APIs, on the other hand, are not affected much, since they are mostly
> dependent on the NativeScope API, which this proposal does not alter
> (although the role of a NativeScope is now retconned to be allocator +
> scope).
>
> You can find a branch which implements some of the changes described
> above (except the changes to the foreign linker API) here:
>
> https://github.com/mcimadamore/panama-foreign/tree/resourceScope
>
> An initial javadoc of the API described in this email can be
> found here:
>
> http://cr.openjdk.java.net/~mcimadamore/panama/resourceScope-javadoc_v2/javadoc/jdk/incubator/foreign/package-summary.html
>
>
>
> Cheers
> Maurizio
>
> [1] -
> https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011700.html
> [2] - https://datasketches.apache.org/docs/Memory/MemoryPackage.html
> [3] -
> https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011810.html
>
>