[foreign-memaccess] RFC: to scope or not to scope?

Sat Jun 1 15:50:11 UTC 2019

Sounds like a good proposal.

One other advantage that this has is that it allows the size of the 
allocations to be controlled by the user, while MemoryScopeImpl just 
allocates a slab right away. I think that for a final version this is a 
very important feature.

Looking at the code, one other question that comes up is: how to handle 
alignment checking? Allowing unaligned access was something that was 
controlled by a scope characteristic. How will we support unaligned 
access with the new model? I think this characteristic could be pushed 
down into the VarHandle impls, either as a field, or by stamping out 
UnAlignedVarhandleMemoryAddressAsXXXs from a template (though the latter 
seems a bit overkill maybe).

About derived segments: one other use case that comes to mind is some 
allocator giving out segments to clients and then wanting to be notified 
when the client calls close() on their segment, so that the segment can 
be added to a free list (or a similar scheme with reference counting). I 
think if we have an API that's open to extension, this and other cases 
like the one you describe could be solved by users themselves. For 
example, the user would provide their own implementation of 
MemorySegment, and decide whether a derived segment should close the 
root segment or not, or notify some other mechanism.
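
A rough sketch of the free-list case, to make it concrete. The names 
(PooledBlock, BlockPool) are hypothetical and plain ByteBuffers stand in 
for segments; none of this is the prototype's API:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a user-provided segment implementation.
final class PooledBlock implements AutoCloseable {
    private final ByteBuffer memory;
    private final BlockPool owner;
    private boolean alive = true;

    PooledBlock(ByteBuffer memory, BlockPool owner) {
        this.memory = memory;
        this.owner = owner;
    }

    ByteBuffer memory() {
        if (!alive) throw new IllegalStateException("block is closed");
        return memory;
    }

    @Override
    public void close() {
        if (alive) {
            alive = false;
            // Notify the allocator instead of freeing the underlying memory.
            owner.recycle(memory);
        }
    }
}

// Hypothetical allocator that hands out blocks and reuses closed ones.
final class BlockPool {
    private final Deque<ByteBuffer> freeList = new ArrayDeque<>();
    private final int blockSize;

    BlockPool(int blockSize) {
        this.blockSize = blockSize;
    }

    PooledBlock acquire() {
        ByteBuffer buf = freeList.poll(); // null when the free list is empty
        if (buf == null) buf = ByteBuffer.allocateDirect(blockSize);
        return new PooledBlock(buf, this);
    }

    void recycle(ByteBuffer buf) {
        buf.clear();
        freeList.push(buf);
    }

    int freeCount() {
        return freeList.size();
    }
}
```

Here close() only flips the block's own flag and hands the memory back, 
so the policy lives entirely in user code.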

I agree that Scope is a higher-level allocator abstraction, which makes 
some strategy decisions that end up being costly in some use cases. 
Better to start with a simpler API, and make sure that it is extensible 
(either by us or by users) for all different use-cases.

---

And on that note: I think the same string can be pulled on the Layout + 
LayoutPath API. While this is useful for creating deref VarHandles, it's 
not exactly a minimal API. I think that the mismatch between what's 
possible to describe with the Layout API, and what's possible to 
actually dereference is a symptom of this, i.e. the Layout API is 
(still) too high-level. The way I imagine a minimal API would look is 
with some MemoryAddressVarHandleBuilder(Class<?> carrier) that has 
fixedOffset(long), variableOffset() and build() methods (and maybe 
allowUnAligned()?), where offset is always given in bytes. We can later 
build the Layout + LayoutPath API on top, but at the same time allow 
users to roll their own by having a low-level API. Or rather, allow 
users to just replace the guts of their current offheap memory 
solutions.
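
To illustrate the shape I have in mind, here is a minimal sketch. It is 
not the proposed API: IntAccessorBuilder and IntAccessor are made-up 
names, the carrier is fixed to int, a ByteBuffer stands in for a memory 
address, and alignment is checked against the buffer-relative offset 
rather than the real address:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical builder; the method names mirror the idea above only.
final class IntAccessorBuilder {
    private long fixedOffset;       // sum of all fixed byte offsets
    private int variableOffsets;    // byte offsets supplied at access time
    private boolean allowUnaligned;

    IntAccessorBuilder fixedOffset(long bytes) { fixedOffset += bytes; return this; }
    IntAccessorBuilder variableOffset() { variableOffsets++; return this; }
    IntAccessorBuilder allowUnaligned() { allowUnaligned = true; return this; }

    IntAccessor build() {
        return new IntAccessor(fixedOffset, variableOffsets, allowUnaligned);
    }
}

final class IntAccessor {
    // Real JDK API: a VarHandle viewing a ByteBuffer as ints, indexed in bytes.
    private static final VarHandle INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    private final long fixedOffset;
    private final int variableOffsets;
    private final boolean allowUnaligned;

    IntAccessor(long fixedOffset, int variableOffsets, boolean allowUnaligned) {
        this.fixedOffset = fixedOffset;
        this.variableOffsets = variableOffsets;
        this.allowUnaligned = allowUnaligned;
    }

    // Adds the variable byte offsets to the fixed one and checks alignment.
    private int resolve(long... offsets) {
        if (offsets.length != variableOffsets) {
            throw new IllegalArgumentException("expected " + variableOffsets + " offsets");
        }
        long total = fixedOffset;
        for (long o : offsets) total += o;
        if (!allowUnaligned && total % Integer.BYTES != 0) {
            throw new IllegalStateException("misaligned offset: " + total);
        }
        return Math.toIntExact(total);
    }

    int get(ByteBuffer base, long... offsets) {
        return (int) INT.get(base, resolve(offsets));
    }

    void set(ByteBuffer base, int value, long... offsets) {
        INT.set(base, resolve(offsets), value);
    }
}
```

With this, new IntAccessorBuilder().fixedOffset(4).variableOffset().build() 
gives an accessor that reads the int at byte offset 4 + i for a 
caller-supplied i, and the alignment policy is just a flag on the 
accessor rather than a property of a scope.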

---

About the current prototype, there seems to be a bug: we cannot just 
duplicate native segments, because we end up with 2 liveness flags for 
the same underlying memory resource, so closing one copy leaves the 
other looking alive. So, I don't think we can have the current asXXX() 
implementations (or probably anything that uses dup). I think for 
asReadOnly and asPinned we could create a 'view' segment, similar to 
what is done for 'resize' (which btw I think should be renamed to 
something that sounds less mutable, e.g. subSegment).
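
A small sketch of the 'view' idea, assuming a made-up SketchSegment 
class that only tracks bounds and flags (no real memory is involved). 
The point is that the root and every view share a single liveness flag:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical segment model: bounds and flags only, no backing memory.
final class SketchSegment implements AutoCloseable {
    private final long offset;
    private final long length;
    private final boolean readOnly;
    private final AtomicBoolean alive; // shared by the root and all its views

    private SketchSegment(long offset, long length, boolean readOnly, AtomicBoolean alive) {
        this.offset = offset;
        this.length = length;
        this.readOnly = readOnly;
        this.alive = alive;
    }

    static SketchSegment root(long length) {
        return new SketchSegment(0, length, false, new AtomicBoolean(true));
    }

    // A view reuses the same flag object instead of copying its value.
    SketchSegment asReadOnly() {
        checkAlive();
        return new SketchSegment(offset, length, true, alive);
    }

    SketchSegment subSegment(long newOffset, long newLength) {
        checkAlive();
        if (newOffset < 0 || newLength < 0 || newOffset > length - newLength) {
            throw new IndexOutOfBoundsException("sub-segment out of bounds");
        }
        return new SketchSegment(offset + newOffset, newLength, readOnly, alive);
    }

    boolean isAlive() { return alive.get(); }
    boolean isReadOnly() { return readOnly; }
    long offset() { return offset; }
    long length() { return length; }

    private void checkAlive() {
        if (!alive.get()) throw new IllegalStateException("segment is closed");
    }

    @Override
    public void close() {
        // Flipping the shared flag invalidates the root and every view at once.
        alive.set(false);
    }
}
```

Because there is only one flag, closing the root can never leave a view 
that still claims to be alive.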

For asConfined(), creating a view segment would not work, since the root 
segment, which holds the liveness flag, would not be confined to a 
single thread. For asConfined to work as expected, I believe we would 
have to kill the root segment, effectively transferring the underlying 
resource to the newly created confined segment. This also brings up an 
idea I had been thinking about: make thread confinement a more local, 
temporary state. E.g. you call asConfined, which creates a segment that 
is confined to the current thread, but then add a release() method that 
invalidates the confined copy and makes the root segment valid again. 
This could be used around loops, e.g.:

     public void doWork(MemorySegment global) {
         // kill 'global'
         ConfinedMemorySegment local = global.acquireConfined();
         for (MyTask task : tasks) {
             task.doWork(local);
         }
         local.release(); // kill 'local', revive 'global'
     }

Where the 'local' object might be scalarized (removed by escape 
analysis). The main purpose of having release() is that this 'local 
acquire' can be done multiple times without permanently killing the 
root segment.
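
The semantics could be sketched roughly as follows. RootSegment and 
ConfinedView are made-up names, and a long[] stands in for the 
underlying memory, so this only models the state transitions:

```java
// Hypothetical root segment that can be temporarily handed to one thread.
final class RootSegment {
    final long[] storage = new long[4]; // stand-in for off-heap memory
    private ConfinedView active;        // non-null while the root is 'killed'

    synchronized ConfinedView acquireConfined() {
        if (active != null) throw new IllegalStateException("already acquired");
        active = new ConfinedView(this, Thread.currentThread());
        return active;
    }

    synchronized boolean isValid() {
        return active == null;
    }

    synchronized void revive(ConfinedView view) {
        if (active != view) throw new IllegalStateException("not the active view");
        active = null;
    }

    long read(int index) {
        if (!isValid()) throw new IllegalStateException("root is acquired");
        return storage[index];
    }
}

// Hypothetical confined copy: usable only by its owner thread until released.
final class ConfinedView {
    private final RootSegment root;
    private final Thread owner;
    private boolean released;

    ConfinedView(RootSegment root, Thread owner) {
        this.root = root;
        this.owner = owner;
    }

    private void check() {
        if (Thread.currentThread() != owner) throw new IllegalStateException("wrong thread");
        if (released) throw new IllegalStateException("view was released");
    }

    long read(int index) { check(); return root.storage[index]; }

    void write(int index, long value) { check(); root.storage[index] = value; }

    // Kills this view and makes the root valid again.
    void release() {
        check();
        released = true;
        root.revive(this);
    }
}
```

In real code the release() call would probably sit in a finally block, 
so that an exception in the loop does not leave the root segment dead.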

The other option is to discard asConfined() and use a specialized 
factory method for confined segments, but that would create some 
asymmetry in the API if we keep asReadOnly/asPinned.

Cheers,
Jorn

Maurizio Cimadamore wrote on 2019-05-31 22:44:
> Hi,
> lately I've been thinking hard about the relationship between scopes
> and memory segments in the foreign-memaccess API. I think some of the
> decisions we made lately - e.g. making scopes either global (pinned) or
> closeable (and confined) - are good ones. But now I wonder, why do we
> need scopes in this API?
> 
> A scope is useful to manage lifecycle of memory resources; you create
> a scope, do some allocation within it, close the scope and the
> associated resources are gone. This is a good programming model for
> high level layers, but is it still a good model for low-level layers?
> Over the last few weeks I noticed a few things:
> 
> * creating a scope is quite expensive, as there are several data
> structures associated with it
> 
> * the ownership/parent mechanism is completely useless in this API,
> given that the scope parent-of relationship is really only useful when
> storing pointers from scope A into scope B, an operation that is not
> available in this API (since we have no pointers here)
> 
> * the API forces you through quite a few hops to get to what you
> want; code like this is pretty idiomatic:
> 
> try (MemoryScope scope = MemoryScope.globalScope().fork()) {
>     MemorySegment segment = scope.allocate(32);
>     MemoryAddress address = segment.baseAddress();
>     ...
> }
> 
> * Several of the new MemorySegment sources just use the 'pinned'
> UNCHECKED scope - de-facto turning scope checks off
> 
> 
> So, we have quite a complex API, which is used (in part) only when
> dealing with native memory; in the remaining case it's just noise,
> mostly.
> 
> This set me off thinking... what if we brought some aspects of the
> memory scope into the memory segment itself (and then got rid of
> memory scopes)? That is, let's see if we can add the following things
> to a segment:
> 
> * make it AutoCloseable - this way you can use a segment in a
> try-with-resources, as you would do with scopes
> 
> * add an 'isAlive' state; a segment starts off in the 'alive' state and
> then, when it's closed, it goes into the 'closed' state - meaning that
> all addresses originating from it (as well as all the segments) will
> become invalid
> 
> * add confinement, so that only the owner thread is allowed to call
> methods in the segment (e.g. to resize or close it)
> 
> * add some methods to obtain a read-only segment, a confined one (one
> which only allows read/writes from owning thread), or a pinned one
> (one that cannot be closed)
> 
> In this new world, the above code can be rewritten as follows:
> 
> try (MemorySegment segment = MemorySegment.ofNative(32)) {
>     MemoryAddress address = segment.baseAddress();
>     ...
> }
> 
> This is more compact and goes straight to the point. I like how
> compact the API is - and I also like that now the API is very
> symmetric with the heap cases as well - for instance:
> 
> try (MemorySegment segment = MemorySegment.ofArray(new byte[32])) {
>     MemoryAddress address = segment.baseAddress();
>     ...
> }
> 
> The only thing that changes is the resource declaration in the try
> with resources!
> 
> 
> It's not all rosy of course; this choice has some consequences:
> 
> * memory segments are no longer immutable; that was possible before,
> since the mutable state was confined into the scope which was then
> attached to the segment. Now it's the segment itself that is mutable
> (in the liveness bit). While this could make transition to value types
> harder, I don't think it's really a blocker - in reality we could
> implement a very similar trick where we push the mutable state
> somewhere else, and then the segment becomes immutable again. Also, in
> real world cases I expect clients will do some kind of pooling,
> allocating big segments and then returning small pinned sub-regions to
> clients (thus avoiding one system call per allocation). In such cases,
> there is only one master mutable segment - and a lot of small
> immutable ones, which is a happy case.
> 
> * Thinking about what happens when you e.g. resize a region is a bit
> harder if you can close a region. Should closing a sub-region also
> close the one it comes from? Or should we throw an exception? Or
> should we do nothing? or reference counting? After reading the very
> good docs on Netty [1], I came to the conclusion that closing a
> derived segment should also result in the closure of the root one.
> This choice would of course not be a very good one for pooled
> sub-regions, but for this we can always use the pinning operation -
> that is: create a resized sub-region, pin it, and return it to the
> client. I think the combination of resizing + pinning gives quite a
> bit of power and I don't see any immediate need for doing something
> with reference counting.
> 
> 
> I then realized that the pointer scopes we have in foreign right now
> can in fact be implemented *cleanly* on top of this lower level memory
> segment mechanism. Here's a snippet of code which demonstrates how one
> would go about writing a PointerScope which allocates a slab of memory
> and returns portions of it to the clients:
> 
> 
> class PointerScopeImpl implements PointerScope {
>     long SEGMENT_SIZE = 64 * 1024;
> 
>     List<MemorySegment> usedSegments;
>     MemorySegment currentSegment;
>     long offsetInSegment;
> 
>     <X> Pointer<X> allocate(LayoutType<X> type) {
>        MemorySegment segment = allocateInternal(
>                type.layout().bytesSize(), type.layout().alignmentBytes());
>        return new BoundedPointer<X>(type, segment.baseAddress());
>     }
> 
>     ...
> 
>     MemorySegment allocateInternal(long bytes, long align) {
>          if (offsetInSegment + bytes > SEGMENT_SIZE) {
>              // current slab cannot fit the request: retire it, start a new one
>              usedSegments.add(currentSegment);
>              currentSegment = MemorySegment.ofNative(
>                      Math.max(SEGMENT_SIZE, bytes), align);
>              offsetInSegment = 0;
>          }
>          MemorySegment segment =
>                  currentSegment.resize(offsetInSegment, bytes);
>          offsetInSegment += bytes;
>          return segment;
>     }
> 
>     void close() {
>         currentSegment.close();
>         usedSegments.forEach(MemorySegment::close);
>     }
> }
> 
> That is, we can get the same functionality we have in Panama,
> essentially using segments as a way to get at the Unsafe allocation
> facilities. This seems pretty cool!
> 
> I've put together a prototype of this approach (should apply cleanly
> on top of foreign-memaccess):
> 
> http://cr.openjdk.java.net/~mcimadamore/panama/scope-removal/
> 
> I was pleased at how the tests could be simplified with this approach.
> I was also pleased to see that the performance numbers took a
> significant jump forward, essentially bringing this within reach of
> raw Unsafe usage (but with the extra safety sprinkled on top).
> 
> Concluding, this seems yet another of those cases where we were trying
> to conflate high-level concerns with lower-level concerns, and once we
> push everything in the right place of the stack, things seem to slot
> into a lower energy state. What do you think?
> 
> Cheers
> Maurizio
> 
> [1] - https://netty.io/wiki/reference-counted-objects.html