[foreign-memaccess] on confinement
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Jun 5 20:30:21 UTC 2019
So... it seems MemorySegment is going to acquire the following
functionalities:
* isShared() - predicate which tells if the segment can be shared across
multiple owners
* handoff(Thread) - kills the segment and clones it into a new one with
different owner, or throws IAE (if isShared() is false)
* isSubsegment() - predicate which tells whether the segment is a
subsegment
* parent() - returns the subsegment's parent, or throws IAE (if
isSubsegment() is false)
* merge() - kills a subsegment (and, if we're in a shared context,
decrements the parent's subsegment counter), or throws IAE (if
isSubsegment() is false)
* I would still retain my 'split' method - but we can write it on top of
resize now. The only thing 'split' adds is the user convenience that all
subsegments are disjoint by construction
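For illustration, the ownership/handoff behaviour described above could be
modelled with a toy class like the one below. To be clear: the names and
semantics here are my own sketch of the idea, not the actual implementation.

```java
// Toy model of the proposed life-cycle operations: an immutable owner
// thread, a liveness flag, and a handoff that kills the old segment and
// clones it with a new owner. Not the real MemorySegment implementation.
final class ToySegment {
    private final Thread owner;   // immutable owner, checked on each access
    private final boolean shared;
    private boolean alive = true;

    ToySegment(Thread owner, boolean shared) {
        this.owner = owner;
        this.shared = shared;
    }

    boolean isShared() { return shared; }

    void checkAccess() {
        if (!alive) throw new IllegalStateException("segment is closed");
        if (Thread.currentThread() != owner)
            throw new IllegalStateException("access outside owner thread");
    }

    // kills this segment and clones it with a different owner,
    // or throws IAE if the segment is not shared
    ToySegment handoff(Thread newOwner) {
        if (!shared) throw new IllegalArgumentException("not a shared segment");
        checkAccess();
        alive = false;
        return new ToySegment(newOwner, shared);
    }
}
```

The point of the clone-on-handoff shape is that the owner field can stay
final (which matters for JIT-friendly confinement checks), while the old
segment is dead after the transfer.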
While we could use different public types for these, I'm a bit skeptical
on the impact on the API - for instance, we could have a type for
subsegments - but then we'd really need two - one for shared subsegments
and one for unshared subsegments. We could factor some of the
differences with some use of generic types, but I think it would lead to
a very unfriendly API.
So, I think it's probably better to lump all the concepts together.
At the same time, I also see that this will add quite a few methods on
MemorySegment that have solely to do with life-cycle management. So the
case for some kind of 'scope' is, I believe, appearing again.
I wonder if maybe the fact that our old scope was doing too much was
also part of the issue - as it was both a lifecycle manager AND an
allocator at the same time, and obviously the allocator capabilities
only really made sense in the case of native memory which then led to an
asymmetric API.
So... what if:
* you always needed a scope to create a segment, regardless of the kind
(e.g. the segment factories will get a scope parameter)
* we have two scopes: shared and unshared - shared scope supports
handoff, the unshared doesn't (and, if we have public types we can make
that explicit)
* handoff/close operations go in the scope
* we have a MemorySegment interface, but also a MemorySubSegment
interface which supports an extra 'merge' operation
I think this could work - and seems more orthogonal than what we were
trying to do before?
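To make the factoring concrete, here is a minimal sketch of what the two
scopes might look like. ConfinedScope/SharedScope and the factory signature
mentioned in the comment are assumed names for the sake of the example, not
a settled API.

```java
// Sketch of the scope factoring: segments are created against a scope,
// the scope owns life-cycle management (close/handoff), and only the
// shared flavour supports handoff. Assumed names, not a real Panama API.
abstract class MemoryScope {
    Thread owner = Thread.currentThread();
    private boolean alive = true;

    final void checkAccess() {
        if (!alive) throw new IllegalStateException("scope is closed");
        if (Thread.currentThread() != owner)
            throw new IllegalStateException("access outside owner thread");
    }

    // close lives in the scope, not in the segment
    final void close() {
        checkAccess();
        alive = false;
    }

    // segment factories would take a scope parameter, e.g. (hypothetical):
    // MemorySegment allocateNative(long size, MemoryScope scope)
}

final class ConfinedScope extends MemoryScope {
    // no handoff: the public type makes confinement explicit
}

final class SharedScope extends MemoryScope {
    // handoff lives in the scope too, and only the shared flavour has it
    void handoff(Thread newOwner) {
        checkAccess();
        owner = newOwner;
    }
}
```

With public types for the two scopes, the presence or absence of handoff is
visible in the API rather than discovered via a runtime IAE.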
Maurizio
On 05/06/2019 18:34, Jorn Vernee wrote:
> Comments inline...
>
> Maurizio Cimadamore wrote on 2019-06-05 19:16:
>> On 05/06/2019 17:48, Jorn Vernee wrote:
>>>> Thoughts?
>>>
>>> I don't think we can safely do our confinement check `scope.owner !=
>>> Thread.currentThread()` if scope.owner is mutable, without some form
>>> of synchronization. I really think the confinement thread should be
>>> determined at (sub)segment creation time, and then be immutable
>>> afterwards.
>>
>> Yes, that's a piece that's missing.
>>
>> But it seems like we landed in pretty similar places indeed, down to
>> your proposed AtomicInteger (I just use an int in my implementation,
>> but I believe that's the same check as the one you were proposing).
>>
>> I like your take on overlapping regions - since they are all de facto
>> pinned in a shared context, there's no safety concern; it's up to the
>> user to 'make it right'.
>>
>> One thing that still stands though: we need some kind of cleaner to go
>> after GCed segments, to keep the reference count in check.
>>
>> This is actually a bigger topic: what to do with things that go out of
>> scope - even a simple segment going out of scope could mean a memory
>> leak (e.g. nobody calling Unsafe::freeMemory).
>>
>> Here we could:
>>
>> 1) do nothing - just let things leak - it's user responsibility to
>> clean things up
>>
>> 2) detect when things are GCed and call 'close' forcibly
>>
>> This is a hard choice; for root segments we'd probably like (1),
>> whereas for subsegments, if you do (2) you end up closing memory for
>> other related segments too (which might still be alive!!) - at least
>> in the case where the segment is not shared. At the same time, since a
>> subsegment can keep hold of the root segment, the root segment can
>> never get GCed if at least one of its subsegment is reachable.
>>
>> At the moment I'm more for (1), since otherwise it would be pretty
>> hard for the user to understand what's going on.
>
> I think (1) is much better because:
>
> * We want to have prompt cleanup of memory.
> * We want to share offheap memory with native code later on, for
> which the GC can't see the references to the resource.
>
> Though, using GC + Cleaner as a fallback for view segments to keep the
> reference count in check seems fine as well, since that should never
> clean up the memory resource.
>
>> (again, this problem is made more acute by segments, but it's there
>> with scopes too)
>>
>> Maurizio
>
> I fiddled a little with your example as well to make the owner thread
> field immutable: http://cr.openjdk.java.net/~jvernee/panama/Test.java
>
> Jorn
>
>>
>>
>>
>>>
>>> Jorn
>>>
>>> Maurizio Cimadamore wrote on 2019-06-05 17:40:
>>>> Thanks Jorn,
>>>> I went pretty much through the same reasoning and realized that:
>>>>
>>>> a) confinement must be the default
>>>>
>>>> b) handing off ownership must be an opt-in
>>>>
>>>> b2) similarly, racy shared segments which synchronize on the liveness
>>>> bit can be an equally appealing opt-in
>>>>
>>>> c) tracking region overlapping is uber-expensive; it's much better to
>>>> define primitives which allow 'splitting a region' into non-overlapping
>>>> segments by construction (e.g. resize is not the way to get what we
>>>> want here); let's call this 'split'
>>>>
>>>> c2) the bits returned by 'split' are _pinned_
>>>>
>>>> d) we need a way to 'merge' the bits back into the parent.
>>>>
>>>> What I came up with is this [1], which I think kind of implements
>>>> your principle for ShareableSegments (note this is an example, not a
>>>> full blown Panama patch).
>>>>
>>>> Thoughts?
>>>>
>>>> Maurizio
>>>>
>>>> [1] -
>>>> http://cr.openjdk.java.net/~mcimadamore/panama/TestScopedSegmentMerge.java
>>>>
>>>> On 05/06/2019 14:59, Jorn Vernee wrote:
>>>>> One other thing I realized; closing the root segment through a
>>>>> view segment (like proposed before) is only possible when the root
>>>>> segment and _all_ view segments are confined to the same thread.
>>>>> At least if we want to avoid synchronization on access when
>>>>> checking liveness. I think this gets us the following set of
>>>>> rules for non-shared segments:
>>>>>
>>>>> 1. Terminal operations are always thread confined (safety feature
>>>>> to prevent VM crashes when resource is freed by another thread).
>>>>> 2. Always confined to the same thread (avoid mutable fields,
>>>>> complexity in implementation).
>>>>> 3. We can close the root segment through a view segment.
>>>>> 4. We can not share a view segment with a different thread (would
>>>>> break rule 1. when combined with 3.).
>>>>> 5. No need for the user to keep a reference to the root segment,
>>>>> since we can close it through a view segment.
>>>>> 6. No need for subsegment tracking.
>>>>>
>>>>> Also, shareability should be an opt-in, but it seems that
>>>>> supporting lazy transition into a shared state (with asConfined())
>>>>> creates too much complexity for the simple single-threaded case,
>>>>> so I think it should be an opt-in at segment creation time. That
>>>>> way we can keep the 'default' single threaded implementation fast
>>>>> and simple.
>>>>>
>>>>> ---
>>>>>
>>>>> We could still go with a separate ShareableSegment type, which
>>>>> does allow sharing of view segments with other threads, but does
>>>>> not allow closing the root segment through a view segment. To
>>>>> avoid mutable confinement thread fields we can require the
>>>>> confinement thread to be specified when creating the view segment.
>>>>> A strawman:
>>>>>
>>>>> interface ShareableSegment extends MemorySegment {
>>>>>     // support 'divide et impera'
>>>>>     MemorySegment resize(Thread confinementThread, long offset, long length);
>>>>>     default MemorySegment resize(long offset, long length) {
>>>>>         return resize(Thread.currentThread(), offset, length);
>>>>>     }
>>>>>
>>>>>     // could be done automatically with GC + Cleaner as well
>>>>>     void merge(MemorySegment subsegment);
>>>>>     // need some synchronization if resize and merge can be called
>>>>>     // by threads other than the root's confinement thread
>>>>>
>>>>>     // ... factory methods
>>>>> }
>>>>>
>>>>> Which gets us the following rules for shareable segments:
>>>>>
>>>>> 1. Terminal operations are always thread confined (safety feature
>>>>> to prevent VM crashes when resource is freed by another thread).
>>>>> 2. Always confined to the same thread (avoid mutable fields,
>>>>> complexity in implementation).
>>>>> 3. View segments can be confined to different threads than the
>>>>> root segment.
>>>>> 4. We can not close the root segment through a view segment (would
>>>>> break rule 1 when combined with 3).
>>>>> 5. The user must keep a reference to the root segment at all times
>>>>> to be able to close it and avoid resource leaks.
>>>>> 6. Need to track subsegments in order to know whether the root
>>>>> segment can be closed safely.
>>>>>
>>>>> ---
>>>>>
>>>>> Also, overlap of subsegments will break confinement in the sense
>>>>> that multiple threads can write/read to/from the same region, but
>>>>> since subsegments owned by multiple threads can not free/release
>>>>> the underlying resource, I don't think overlapping subsegments
>>>>> could crash the VM. So, maybe it's good enough to tell the user to
>>>>> make sure that subsegments owned by different threads don't
>>>>> interfere with each other, but we don't enforce that in the
>>>>> implementation?
>>>>>
>>>>> If we go that route I believe we can make the subsegment tracking
>>>>> for ShareableSegment a simple AtomicLong reference count, where
>>>>> the liveness flag in a subsegment is a reference to the root
>>>>> segment that is nulled out when merging, and is also used to make
>>>>> sure that merge is called with an actual subsegment.
>>>>>
>>>>> Jorn
>>>>>
>>>>> Maurizio Cimadamore wrote on 2019-06-05 02:16:
>>>>>> On 04/06/2019 17:03, Maurizio Cimadamore wrote:
>>>>>>> Note: I'm not saying this will be trivial to implement correctly
>>>>>>> - but what I like about this is that the programming model will
>>>>>>> look relatively clean in comparison to something like (1).
>>>>>>> Essentially you can slice and dice all you want, and, as long as
>>>>>>> you are asking reasonable questions, things will work with
>>>>>>> decent performance.
>>>>>>
>>>>>> Quick update; I've been doing some experiments on this - it doesn't
>>>>>> look pretty for now.
>>>>>>
>>>>>> Some of the issues we have to take into account:
>>>>>>
>>>>>> * as discussed, we want the master region to somehow keep track (via
>>>>>> its mutable 'scope-like' object) of the sub-regions
>>>>>>
>>>>>> * if we share the same scope for all subregions (which we probably
>>>>>> want, to avoid too much allocation on resize) then we need to have a
>>>>>> way for the sub-region to perform an efficient confinement check
>>>>>> - one trick I used was to give each sub-region a unique index, and
>>>>>> then use the index to access a subregion 'ownership' array
>>>>>>
>>>>>> * we need to take into account regions being GCed - otherwise the
>>>>>> lists kept in the master region will (potentially) grow w/o bounds
>>>>>>
>>>>>> * we need to take into account synchronization when adding/removing
>>>>>> sub-regions - this is probably not a big concern given that these
>>>>>> operations occur during a 'resize' or when a region is being GCed, so
>>>>>> the memory access itself can still be fast
>>>>>>
>>>>>> * since we can transfer ownership, the owner thread is not a final
>>>>>> constant anymore... this will probably affect performance
>>>>>> considerably
>>>>>>
>>>>>> * I haven't even started to look at rejecting overlapping
>>>>>> sub-regions
>>>>>> with different owners...
>>>>>>
>>>>>> Needless to say, the resulting implementation is very finicky,
>>>>>> and I'm
>>>>>> worried about the overall performance model of this approach.
>>>>>>
>>>>>> Also, I don't think that what I'm seeing is an artifact of lumping
>>>>>> MemoryScope and MemorySegment together - yes, in principle having a
>>>>>> separate scope (with a notion of confinement in it) helps in the
>>>>>> sense
>>>>>> that resizing a segment becomes an orthogonal concern. But then you
>>>>>> are back in a world where you can't give a different thread owner to
>>>>>> different sub-regions, and the only way around that restriction is to
>>>>>> use memory copy (e.g. create a new segment and copy contents of the
>>>>>> old one to the new).
>>>>>>
>>>>>> If that cross-subregion policy is what we realistically want to
>>>>>> enforce, then I don't think it's worth doing a lot of heroics here -
>>>>>> we can simply say that a segment is confined to a thread, there's no
>>>>>> ownership transfer operation, but the same effects can be achieved
>>>>>> through memory copy. This doesn't seem quite as rich a story as
>>>>>> the one
>>>>>> we were looking at - but if we were ok with Scope being in charge of
>>>>>> thread confinement, this would have been the only story possible.
>>>>>>
>>>>>> So, the question becomes: do we really need a way to transfer
>>>>>> ownership of a segment from thread A to thread B ? And if so, what
>>>>>> granularity should be used? I think these are the possible answers:
>>>>>>
>>>>>> a) ownership transfer not supported - region copy should be used
>>>>>> as a workaround
>>>>>> b) ownership transfer supported; all subregions are constrained
>>>>>> to have the same owner as the root; when ownership changes, all
>>>>>> subregions change ownership too
>>>>>> c) ownership transfer supported; subregion ownership can be set
>>>>>> independently of the root
>>>>>>
>>>>>> I realized that, in the email I sent this morning, I picked the
>>>>>> most difficult point in the design space (c) - that is, support
>>>>>> ownership transfers at the subregion granularity. This seems useful
>>>>>> to implement divide and conquer algorithms, but at the same time, I
>>>>>> realized, this
>>>>>> was simply not possible with the scope-based solution we had before
>>>>>> (since all subregions had the same scope there - hence the same
>>>>>> confinement).
>>>>>>
>>>>>> In other words, all the implementation strategies we've seen so far
>>>>>> are capable of handling either (a) or (b) [as for (b) I'm not sure
>>>>>> about the potential JIT cost in making thread owner non-final]. The
>>>>>> implementation story for (c) is far more convoluted (**), and I'm
>>>>>> very skeptical that, even if we can pull that off, it will perform
>>>>>> in a way that will be deemed acceptable.
>>>>>>
>>>>>> Is (c) simply asking for too much? And, if so, is (b) something that
>>>>>> could be useful still?
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>> (**) Honestly, the overlapping region check seems the straw that
>>>>>> breaks the camel's back - to implement the check it's sadly
>>>>>> unavoidable to keep all subregions which share the same root in the
>>>>>> same place - which then poses the aforementioned problems with
>>>>>> respect to such subregions being GCed, and the need for
>>>>>> synchronization when maintaining all the ancillary lists. And this
>>>>>> overlapping region check is needed in both the approaches (1) and
>>>>>> (2) that I have outlined earlier in [1], I believe.
>>>>>>
>>>>>> [1] -
>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-June/005674.html
>>>>>>