memory access - pulling all the threads

Mon Jan 25 17:52:10 UTC 2021

Hi,
as you know, I've been looking at both internal and external feedback on
usage of the memory access API, in an attempt to understand what the
problems with the API are, and how to move forward. As discussed here
[1], there are some things which work well, such as structured access,
or the recent addition of shared segment support (the latter seems to
have enabled a wide variety of experiments which allowed us to gather
more feedback - thanks!). But there are still some issues to be resolved -
which could be summarized as "the MemorySegment abstraction is trying to
do too many things at once" (again, please refer to [1] for a more
detailed description of the problems involved).

In [1] I described a possible approach where every allocation method
(MemorySegment::allocateNative and MemorySegment::mapFile) returns an
"allocation handle", not a segment directly. The handle is the closeable
entity, while the segment is just a view. While this approach is
workable (and something very similar has indeed been explored here [2]),
after implementing some parts of it, I was not satisfied with how this
approach integrates with the foreign linker support. For instance,
defining the behavior of methods such as CLinker::toCString becomes
quite convoluted: where does the allocation handle associated with the
returned string come from? If the segment has no pointer to the handle,
how can the memory associated with the string be released? What is the
relationship between an allocation handle and a NativeScope? All these
questions led me to conclude that the proposed approach was not enough,
and that we needed to try harder.

The above approach does one thing right: it splits memory segments from 
the entity managing allocation/closure of memory resources, thus turning 
memory segments into dumb views. But it doesn't go far enough in this 
direction; as it turns out, what we really want here is a way to capture 
the concept of the lifecycle that is associated to one or more 
(logically related) resources - which, unsurprisingly, is part of what 
NativeScope does too. So, let's try to model this abstraction:

```
interface ResourceScope extends AutoCloseable {
    void addOnClose(Runnable) // adds a new cleanup action to this scope
    void close() // closes the scope

    static ResourceScope ofConfined() // creates a confined resource scope
    static ResourceScope ofShared() // creates a shared resource scope
    static ResourceScope ofConfined(Cleaner) // creates a confined resource scope - managed by cleaner
    static ResourceScope ofShared(Cleaner) // creates a shared resource scope - managed by cleaner
}
```

It's a very simple interface - you can basically add new cleanup actions 
to it, which will be called when the scope is closed; note that 
ResourceScope supports implicit close (via a Cleaner), or explicit close 
(via the close method) - it can even support both (not shown here).
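
To make the "cleanup actions run on close" idea concrete, here is a
minimal sketch in plain Java. ToyScope and ToyScopeDemo are hypothetical
names, not the proposed API; the sketch ignores confinement and sharing,
and it assumes actions run in reverse registration order, which the
proposal above does not specify.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the proposed ResourceScope; not the real API.
final class ToyScope implements AutoCloseable {
    private final Deque<Runnable> cleanups = new ArrayDeque<>();
    private boolean alive = true;

    boolean isAlive() { return alive; }

    // Registers a cleanup action; rejected once the scope is closed.
    void addOnClose(Runnable action) {
        if (!alive) throw new IllegalStateException("scope is closed");
        cleanups.push(action); // assumption: last registered runs first
    }

    // Closes the scope once and runs every registered action.
    @Override
    public void close() {
        if (!alive) return;
        alive = false;
        while (!cleanups.isEmpty()) cleanups.pop().run();
    }
}

final class ToyScopeDemo {
    // Attaches two pretend resources to one scope and records what happens.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (ToyScope scope = new ToyScope()) {
            scope.addOnClose(() -> log.add("free segment"));
            scope.addOnClose(() -> log.add("unmap file"));
            log.add("work");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [work, unmap file, free segment]
    }
}
```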

Armed with this new abstraction, let's try to see if we can shine new 
light onto some of the existing API methods and abstractions.

Let's start with heap segments - these are allocated using one of the 
MemorySegment::ofArray() factories; one of the issues with heap segments 
is that it doesn't make much sense to close them. In the proposed 
approach, this can be handled nicely: heap segments are associated with 
a _global_ scope that cannot be closed - a scope that is _always alive_. 
This clarifies the role of heap segments (and also of buffer segments) 
nicely.

Let's proceed to MemorySegment::allocateNative/mapFile - what should
these factories do? Under the new proposal, these methods should accept
a ResourceScope parameter, which defines the lifecycle to which the
newly created segment should be attached. If we still want to provide
ResourceScope-less overloads (as the API does now) we can pick a useful
default: a shared, non-closeable, cleaner-backed scope. This choice
gives us essentially the same semantics as a byte buffer, so it would be
an ideal starting point for developers coming from the ByteBuffer API
trying to familiarize themselves with the new memory access API. Note
that, when using these more compact factories, scopes are almost
entirely hidden from the client - so no extra complexity is added
(compared e.g. to the ByteBuffer API).
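
To picture what "cleaner-backed" means here, the following is a small
sketch built on the JDK's java.lang.ref.Cleaner. CleanerBacked and
releaseNow are hypothetical names used only for illustration; the
default scope described above is not closeable, so releaseNow exists
just to make the example deterministic instead of waiting for the GC.

```java
import java.lang.ref.Cleaner;

// Hypothetical illustration only; these names are not part of the proposal.
final class CleanerBacked {
    private static final Cleaner CLEANER = Cleaner.create();
    private final Cleaner.Cleanable cleanable;

    // The release action must not capture 'this', otherwise the object
    // never becomes unreachable and the Cleaner never runs the action.
    CleanerBacked(Runnable release) {
        this.cleanable = CLEANER.register(this, release);
    }

    // Explicit path, shown for determinism: the action runs at most once,
    // whether triggered here or later by the Cleaner.
    void releaseNow() {
        cleanable.clean();
    }
}
```

If releaseNow is never called, the Cleaner runs the action some time
after the CleanerBacked instance becomes unreachable, which is the
ByteBuffer-like behavior the default scope is meant to provide.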

As it turns out, ResourceScope is not only useful for segments, but it 
is also useful for a number of entities which need to be attached to 
some lifecycle, such as:

* upcall stubs
* va lists
* loaded libraries

The upcall stub case is particularly telling: in that case, we have 
decided to model an upcall stub as a MemorySegment not because it makes 
sense to dereference an upcall stub - but simply because we need to have 
a way to _release_ the upcall stub once we're done using it. Under the 
new proposal, we have a new, powerful option: the upcall stub API point
can accept a user-provided ResourceScope which will be responsible for
managing the lifecycle of the upcall stub entity. That is, we are now
free to turn the result of a call to upcallStub into something other
than a MemorySegment (e.g. a FunctionPointer?) w/o loss of functionality.

Resource scopes are very useful to manage _groups_ of resources - there
are in fact cases where one or more segments share the same lifecycle -
that is, they need to be all alive at the same time; to handle some of
these use cases, the status quo adds the NativeScope abstraction, which
can accept registration of external memory segments (via the
MemorySegment::handoff API). This use case is naturally handled by the
ResourceScope API:

```
try (ResourceScope scope = ResourceScope.ofConfined()) {
    MemorySegment.allocateNative(layout, scope);
    MemorySegment.mapFile(... , scope);
    CLinker.upcallStub(..., scope);
} // release all resources
```

Does this remove the need for NativeScope? Not so fast: NativeScope is
used to group logically related resources, yes, but is also used as a 
faster, arena-based allocator - which attempts to minimize the number of 
system calls (e.g. to malloc) by allocating bigger memory blocks and 
then handing over slices to clients. Let's try to model the 
allocation-nature of a NativeScope with a separate interface, as follows:

```
@FunctionalInterface
interface NativeAllocator {
    MemorySegment allocate(long size, long align);
    default MemorySegment allocateInt(MemoryLayout intLayout, int value) { ... }
    default MemorySegment allocateLong(MemoryLayout longLayout, long value) { ... }
    ... // all allocation helpers in NativeScope
}
```

At first, it seems this interface doesn't add much. But it is quite 
powerful - for instance, a client can create a simple, malloc-like 
allocator, as follows:

```
NativeAllocator malloc = (size, align) ->
    MemorySegment.allocateNative(size, align, ResourceScope.ofConfined());
```

This is an allocator which allocates a new region of memory on each 
allocation request, backed by a fresh confined scope (which can be 
closed independently). This idiom is in fact so common that the API
allows clients to create these allocators in a more compact fashion:

```
NativeAllocator confinedMalloc =
    NativeAllocator.ofMalloc(ResourceScope::ofConfined);
NativeAllocator sharedMalloc =
    NativeAllocator.ofMalloc(ResourceScope::ofShared);
```

But other strategies are also possible:

* arena allocation (e.g. the allocation strategy currently used by 
NativeScope)
* recycling allocation (a single segment, with given layout, is 
allocated, and allocation requests are served by repeatedly slicing that 
very segment) - this is a critical optimization in e.g. loops
* interop with custom allocators

So, where would we accept a NativeAllocator in our API? Turns out that 
accepting an allocator is handy whenever an API point needs to allocate 
some native memory - so, instead of

```
MemorySegment toCString(String)
```

This is better:

```
MemorySegment toCString(String, NativeAllocator)
```

Of course, we need to tweak the foreign linker, so that in all foreign 
calls returning a struct by value (which require some allocation), a 
NativeAllocator prefix argument is added to the method handle, so that 
the user can specify which allocator should be used by the call; this is 
a straightforward change which greatly enhances the expressive power of 
the linker API.

So, we are in a place where some methods (e.g. factories which create
some resource) take an additional ResourceScope argument - and some
other methods (e.g. methods that need to allocate native segments) take
an additional NativeAllocator argument. Now, it would be inconvenient
for the user to have to create both, at least in simple use cases - but,
since these are interfaces, nothing prevents us from creating a new
abstraction which implements _both_ ResourceScope _and_ NativeAllocator -
in fact this is exactly the role of the already existing NativeScope!

```
interface NativeScope extends NativeAllocator, ResourceScope { ... }
```

In other words, we have retconned the existing NativeScope abstraction, 
by explaining its behavior in terms of more primitive abstractions 
(scopes and allocators). This means that clients can, for the most part, 
just create a NativeScope and then pass it whenever a ResourceScope or a 
NativeAllocator is required (which is what is already happening in all 
of our jextract examples).

There are some additional bonus points of this approach.

First, ResourceScope features some locking capabilities - e.g. you can 
do things like:

```
try (ResourceScope.Lock lock = segment.scope().lock()) {
    <critical operation on segment>
}
```

This allows clients to perform critical operations on a segment w/o
worrying that the segment's memory will be reclaimed in the middle of
the operation. This solves the problem with async operations on byte
buffers derived from shared segments (see [3]).
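
One way to picture such a lock is a counter that close() has to respect.
The sketch below is a guess at the mechanics, not the implementation in
the branch: LockableScope is a hypothetical name, and it assumes that
close() fails (rather than waits) while a lock is held, which the text
above does not specify.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a lockable scope; not the proposed API.
final class LockableScope implements AutoCloseable {
    private static final int CLOSED = -1;
    // >= 0: number of outstanding locks; CLOSED once the scope is closed.
    private final AtomicInteger state = new AtomicInteger(0);

    // A lock keeps the scope alive until it is released.
    final class Lock implements AutoCloseable {
        private boolean released; // each Lock is meant to be used by one thread

        @Override
        public void close() {
            if (!released) {
                released = true;
                state.decrementAndGet();
            }
        }
    }

    // Acquires a lock; fails if the scope has already been closed.
    Lock lock() {
        while (true) {
            int current = state.get();
            if (current == CLOSED) throw new IllegalStateException("scope is closed");
            if (state.compareAndSet(current, current + 1)) return new Lock();
        }
    }

    boolean isAlive() {
        return state.get() != CLOSED;
    }

    // Assumption: closing is refused while any lock is outstanding.
    @Override
    public void close() {
        if (!state.compareAndSet(0, CLOSED)) {
            throw new IllegalStateException("scope is locked or already closed");
        }
    }
}
```

With this shape, a thread doing async I/O on a buffer derived from a
shared segment would hold a Lock for the duration of the operation, and
a concurrent close() could not pull the memory out from under it.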

Another bonus point is that the ResourceScope interface is completely 
segment-agnostic - in fact, we have now a way to describe APIs which 
return resources which must be cleaned by the user (or, implicitly, by 
the GC). For instance, it would be entirely reasonable to imagine, one 
day, the ByteBuffer API to provide an additional factory - e.g. 
allocateDirect(int size, ResourceScope scope) - which gives you a direct 
buffer attached to a given (closeable) scope. The same trick can 
probably be used in other APIs as well where implicit cleanup has been 
preferred for performance and/or safety reasons.

tl;dr;

The restacking described in this email enhances the Foreign Memory
Access API in many different ways, and allows clients to approach the
API in increasing degrees of complexity (depending on needs):

* for a smoother transition, coming from the ByteBuffer API, users only
have to swap ByteBuffer::allocateDirect with
MemorySegment::allocateNative - not much else changes, no need to think
about lifecycles (and ResourceScope); GC is still in charge of deallocation
* users that want tighter control over resources, can dive deeper and 
learn how segments (and other resources) are attached to a resource 
scope (which can be closed safely, if needed)
* for the native interop case, the NativeScope abstraction is retconned 
to be both a ResourceScope *and* a NativeAllocator - so it can be used 
whenever an API needs to know how to _allocate_ or which _lifecycle_ 
should be used for a newly created resource
* scopes can be locked, which allows clients to write critical sections 
in which a segment has to be operated upon w/o fear of it being closed
* the idiom described here can be used to e.g. enhance the ByteBuffer 
API and to add close capabilities there

All the above require very few changes to the clients of the memory
access API. The biggest change is that a MemorySegment no longer 
supports the AutoCloseable interface, which is instead moved to 
ResourceScope. While this can get a little more verbose in case you need 
a single segment, the code scales _a lot_ better in case you need 
multiple segments/resources. Existing clients using jextract-generated 
APIs, on the other hand, are not affected much, since they are mostly 
dependent on the NativeScope API, which this proposal does not alter 
(although the role of a NativeScope is now retconned to be allocator + 
scope).

You can find a branch which implements some of the changes described 
above (except the changes to the foreign linker API) here:

https://github.com/mcimadamore/panama-foreign/tree/resourceScope

While an initial javadoc of the API described in this email can be found 
here:

http://cr.openjdk.java.net/~mcimadamore/panama/resourceScope-javadoc_v2/javadoc/jdk/incubator/foreign/package-summary.html

Cheers
Maurizio

[1] - 
https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011700.html
[2] - https://datasketches.apache.org/docs/Memory/MemoryPackage.html
[3] - 
https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011810.html