[foreign-memaccess] on confinement

Jorn Vernee jbvernee at xs4all.nl
Wed Jun 5 17:34:30 UTC 2019


Comments inline...

Maurizio Cimadamore schreef op 2019-06-05 19:16:
> On 05/06/2019 17:48, Jorn Vernee wrote:
>>> Thoughts?
>> 
>> I don't think we can safely do our confinement check `scope.owner != 
>> Thread.currentThread()` if scope.owner is mutable, without some form 
>> of synchronization. I really think the confinement thread should be 
>> determined at (sub)segment creation time, and then be immutable 
>> afterwards.
> 
> Yes, that's a piece that's missing.
> 
> But it seems like we landed in pretty similar places indeed, down to
> your proposed AtomicInteger (I just use an int in my implementation,
> but I believe that's the same check as the one you were proposing).
> 
> I like your take on overlapping regions - since they are all de facto
> pinned in a shared context, there's no safety concern; it's up to the
> user to 'make it right'.
> 
> One thing that still stands though: we need some kind of cleaner to go
> after GCed segments, to keep the reference count in check.
> 
> This is actually a bigger topic: what to do with things that go out of
> scope - even a simple segment going out of scope could mean a memory
> leak (e.g. nobody calling Unsafe::freeMemory).
> 
> Here we could:
> 
> 1) do nothing - just let things leak - it's user responsibility to
> clean things up
> 
> 2) detect when things are GCed and call 'close' forcibly
> 
> This is a hard choice; for root segments we'd probably like (1),
> whereas for subsegments if you do (2) you end up closing memory for
> other related segments too (which might still be alive!!) - at least
> in the case where the segment is not shared. At the same time, since a
> subsegment can keep hold of the root segment, the root segment can
> never get GCed if at least one of its subsegments is reachable.
> 
> At the moment I'm more for (1), since otherwise it would be pretty
> hard for the user to understand what's going on.

I think (1) is much better because:

  * We want to have prompt cleanup of memory.
  * We want to share offheap memory with native code later on, in which 
case the GC can't see the references to the resource.

Though, using GC + Cleaner as a fallback for view segments to keep the 
reference count in check seems fine as well, since that should never 
clean up the memory resource.
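To illustrate, here's a rough sketch of that fallback (all names here - RootSegment, ViewSegment, acquireView - are made up for illustration, not actual Panama API): the Cleaner action only decrements the count, it never frees memory.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: view segments register a Cleaner action that only
// decrements the shared reference count; the memory resource itself is
// never freed by the GC, so cleanup of memory stays prompt and explicit.
class RootSegment {
    private static final Cleaner CLEANER = Cleaner.create();
    final AtomicLong viewCount = new AtomicLong();

    ViewSegment acquireView() {
        viewCount.incrementAndGet();
        ViewSegment view = new ViewSegment(this);
        // Capture only the counter, never 'view' itself, or the cleaning
        // action would keep the view reachable forever.
        AtomicLong count = viewCount;
        CLEANER.register(view, count::decrementAndGet);
        return view;
    }

    // closing (and freeing memory) is only safe when no views remain
    boolean canClose() {
        return viewCount.get() == 0;
    }
}

class ViewSegment {
    final RootSegment root; // keeps the root reachable while the view lives
    ViewSegment(RootSegment root) { this.root = root; }
}
```

Note that a view keeping a strong reference to its root means the root can never be GCed while a view is reachable; the Cleaner only guards against views that are dropped without being merged or closed.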

> (again, this problem is made more acute by segments, but it's there
> with scopes too)
> 
> Maurizio

I fiddled a little with your example as well to make the owner thread 
field immutable: http://cr.openjdk.java.net/~jvernee/panama/Test.java
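For reference, the core of such a check, with the owner fixed at creation time, could look roughly like this (a minimal sketch with illustrative names, not the actual code from the example above):

```java
// Sketch: with a final owner field, the confinement check needs no
// synchronization -- 'closed' is only ever touched by the owner thread.
final class ConfinedSegment {
    private final Thread owner; // determined at (sub)segment creation, immutable
    private boolean closed;

    ConfinedSegment(Thread owner) {
        this.owner = owner;
    }

    private void checkAccess() {
        if (Thread.currentThread() != owner)
            throw new IllegalStateException("access from non-owner thread");
        if (closed)
            throw new IllegalStateException("segment is closed");
    }

    byte get(long offset) {
        checkAccess();
        return 0; // actual memory access elided
    }

    void close() {
        checkAccess(); // terminal operations are thread-confined too
        closed = true; // a real implementation would free the resource here
    }
}
```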

Jorn

> 
> 
> 
>> 
>> Jorn
>> 
>> Maurizio Cimadamore schreef op 2019-06-05 17:40:
>>> Thanks Jorn,
>>> I went pretty much through the same reasoning and realized that:
>>> 
>>> a) confinement must be the default
>>> 
>>> b) handing off ownership must be an opt-in
>>> 
>>> b2) similarly, racy shared segments which synchronize on the liveness
>>> bit can be an equally appealing opt-in
>>> 
>>> c) tracking region overlapping is uber-expensive; it's much better to
>>> define primitives which allow 'splitting a region' into non-overlapping
>>> segments by construction (e.g. resize is not the way to get what we
>>> want here); let's call this 'split'
>>> 
>>> c2) the bits returned by 'split' are _pinned_
>>> 
>>> d) we need a way to 'merge' the bits back into the parent.
>>> 
>>> What I came up with is this [1], which I think kind of implements
>>> your principle for ShareableSegments (note this is an example, not a
>>> full blown Panama patch).
>>> 
>>> Thoughts?
>>> 
>>> Maurizio
>>> 
>>> [1] - 
>>> http://cr.openjdk.java.net/~mcimadamore/panama/TestScopedSegmentMerge.java
>>> 
>>> On 05/06/2019 14:59, Jorn Vernee wrote:
>>>> One other thing I realized; closing the root segment through a view 
>>>> segment (like proposed before) is only possible when the root 
>>>> segment and _all_ view segments are confined to the same thread. At 
>>>> least if we want to avoid synchronization on access when checking 
>>>> liveness. I think this gets us the following set of rules for 
>>>> non-shared segments:
>>>> 
>>>> 1. Terminal operations are always thread confined (safety feature to 
>>>> prevent VM crashes when resource is freed by another thread).
>>>> 2. Always confined to the same thread (avoid mutable fields, 
>>>> complexity in implementation).
>>>> 3. We can close the root segment through a view segment.
>>>> 4. We can not share a view segment with a different thread (would 
>>>> break rule 1. when combined with 3.).
>>>> 5. No need for the user to keep a reference to the root segment, 
>>>> since we can close it through a view segment.
>>>> 6. No need for subsegment tracking.
>>>> 
>>>> Also, shareability should be an opt-in, but it seems that supporting 
>>>> lazy transition into a shared state (with asConfined()) creates too 
>>>> much complexity for the simple single-threaded case, so I think it 
>>>> should be an opt-in at segment creation time. That way we can keep 
>>>> the 'default' single threaded implementation fast and simple.
>>>> 
>>>> ---
>>>> 
>>>> We could still go with a separate ShareableSegment type, which does 
>>>> allow sharing of view segments with other threads, but does not 
>>>> allow closing the root segment through a view segment. To avoid 
>>>> mutable confinement thread fields we can require the confinement 
>>>> thread to be specified when creating the view segment. A strawman:
>>>> 
>>>>     interface ShareableSegment extends MemorySegment {
>>>>         // support 'divide et impera'
>>>>         MemorySegment resize(Thread confinementThread, long offset, long length);
>>>>
>>>>         default MemorySegment resize(long offset, long length) {
>>>>             return resize(Thread.currentThread(), offset, length);
>>>>         }
>>>>
>>>>         // could be done automatically with GC + Cleaner as well;
>>>>         // needs some synchronization if resize and merge can be
>>>>         // called by threads other than the root's confinement thread
>>>>         void merge(MemorySegment subsegment);
>>>>
>>>>         // ... factory methods
>>>>     }
>>>> 
>>>> Which gets us the following rules for shareable segments:
>>>> 
>>>> 1. Terminal operations are always thread confined (safety feature to 
>>>> prevent VM crashes when resource is freed by another thread).
>>>> 2. Always confined to the same thread (avoid mutable fields, 
>>>> complexity in implementation).
>>>> 3. View segments can be confined to different threads than the root 
>>>> segment.
>>>> 4. We can not close the root segment through a view segment (would 
>>>> break rule 1 when combined with 3).
>>>> 5. The user must keep a reference to the root segment at all times 
>>>> to be able to close it and avoid resource leaks.
>>>> 6. Need to track subsegments in order to know whether the root 
>>>> segment can be closed safely.
>>>> 
>>>> ---
>>>> 
>>>> Also, overlap of subsegments will break confinement in the sense 
>>>> that multiple threads can write/read to/from the same region, but 
>>>> since subsegments owned by multiple threads cannot free/release the 
>>>> underlying resource, I don't think overlapping subsegments could 
>>>> crash the VM. So, maybe it's good enough to tell the user to make 
>>>> sure that subsegments owned by different threads don't interfere 
>>>> with each other, without enforcing that in the implementation?
>>>> 
>>>> If we go that route I believe we can make the subsegment tracking 
>>>> for ShareableSegment a simple AtomicLong reference count, where the 
>>>> liveness flag in a subsegment is a reference to the root segment 
>>>> that is nulled out when merging, and is also used to make sure that 
>>>> merge is called with an actual subsegment.
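[A minimal sketch of that counting scheme, with illustrative names (not an actual Panama patch): the subsegment's reference to the root doubles as its liveness flag.]

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: the root keeps a plain AtomicLong count of live subsegments;
// a subsegment's reference to its root doubles as its liveness flag,
// and is nulled out on merge.
class Root {
    final AtomicLong subCount = new AtomicLong();

    Sub split() {
        subCount.incrementAndGet();
        return new Sub(this);
    }

    void merge(Sub sub) {
        if (sub.root != this)
            throw new IllegalArgumentException("not a subsegment of this root");
        sub.root = null;            // kills the subsegment...
        subCount.decrementAndGet(); // ...and releases its pin on the root
    }

    boolean canClose() {
        return subCount.get() == 0; // safe to free only when all subsegments merged
    }
}

class Sub {
    volatile Root root; // non-null == alive
    Sub(Root root) { this.root = root; }
    boolean isAlive() { return root != null; }
}
```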
>>>> 
>>>> Jorn
>>>> 
>>>> Maurizio Cimadamore schreef op 2019-06-05 02:16:
>>>>> On 04/06/2019 17:03, Maurizio Cimadamore wrote:
>>>>>> Note: I'm not saying this will be trivial to implement correctly - 
>>>>>> but what I like about this is that the programming model will look 
>>>>>> relatively clean in comparison to something like (1). Essentially 
>>>>>> you can slice and dice all you want, and, as long as you are 
>>>>>> asking reasonable questions, things will work with decent 
>>>>>> performance.
>>>>> 
>>>>> Quick update; I've been doing some experiment on this - it doesn't
>>>>> look pretty for now.
>>>>> 
>>>>> Some of the issues we have to take into account:
>>>>> 
>>>>> * as discussed, we want the master region to somehow keep track 
>>>>> (via
>>>>> its mutable 'scope-like' object) of the sub-regions
>>>>> 
>>>>> * if we share the same scope for all subregions (which we probably
>>>>> want, to avoid too much allocation on resize) then we need to have
>>>>> a way for the sub-region to perform an efficient confinement check
>>>>> - one trick I used was to give each subregion a unique index, and
>>>>> then use the index to access a subregion 'ownership' array
>>>>> 
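[The 'ownership array' trick mentioned here could be sketched roughly as follows; names are illustrative only, and a real implementation would also need safe publication when the array grows.]

```java
import java.util.Arrays;

// Sketch: all sub-regions share one scope; each gets a unique index into
// an array of owner threads, so the hot-path confinement check is just
// an array load plus a compare. Registration only happens on 'resize',
// so it can afford synchronization.
class SharedScope {
    private Thread[] owners = new Thread[4];
    private int next = 0;

    synchronized int register(Thread owner) {
        if (next == owners.length)
            owners = Arrays.copyOf(owners, owners.length * 2);
        owners[next] = owner;
        return next++;
    }

    void checkOwner(int index) { // hot path: no locking
        if (owners[index] != Thread.currentThread())
            throw new IllegalStateException("access from non-owner thread");
    }
}
```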
>>>>> * we need to take into account regions being GCed - otherwise the
>>>>> lists kept into the master region will (potentially) grow w/o 
>>>>> bounds
>>>>> 
>>>>> * we need to take into account synchronization when adding/removing
>>>>> sub-regions - this is probably not a big concern given that these
>>>>> operations occur during a 'resize' or when a region is being GC, so
>>>>> the memory access itself can still be fast
>>>>> 
>>>>> * since we can transfer ownership, the owner thread is not a final
>>>>> constant anymore... this will probably affect performance
>>>>> considerably
>>>>> 
>>>>> * I haven't even started to look at rejecting overlapping sub 
>>>>> regions
>>>>> with different owners...
>>>>> 
>>>>> Needless to say, the resulting implementation is very finicky, and 
>>>>> I'm
>>>>> worried about the overall performance model of this approach.
>>>>> 
>>>>> Also, I don't think that what I'm seeing is an artifact of lumping
>>>>> MemoryScope and MemorySegment together - yes, in principle having a
>>>>> separate scope (with a notion of confinement in it) helps in the 
>>>>> sense
>>>>> that resizing a segment becomes an orthogonal concern. But then you
>>>>> are back in a world where you can't give a different thread owner
>>>>> to different sub-regions, and the only way around that restriction
>>>>> is to
>>>>> use memory copy (e.g. create a new segment and copy contents of the
>>>>> old one to the new).
>>>>> 
>>>>> If that cross-subregion policy is what we realistically want to
>>>>> enforce, then I don't think it's worth doing a lot of heroics here 
>>>>> -
>>>>> we can simply say that a segment is confined to a thread, there's 
>>>>> no
>>>>> ownership transfer operation, but the same effects can be achieved
>>>>> through memory copy. This doesn't seem quite as rich a story as
>>>>> the one
>>>>> we were looking at - but if we were ok with Scope being in charge 
>>>>> of
>>>>> thread confinement, this would have been the only story possible.
>>>>> 
>>>>> So, the question becomes: do we really need a way to transfer
>>>>> ownership of a segment from thread A to thread B ? And if so, what
>>>>> granularity should be used? I think these are the possible answers:
>>>>> 
>>>>> a) ownership transfer not supported - region copy should be used as 
>>>>> a workaround
>>>>> b) ownership transfer supported; all subregions are constrained to
>>>>> have the same owner as the root; when ownership changes, all
>>>>> subregions change ownership too
>>>>> c) ownership transfer supported; subregion ownership can be set
>>>>> independently of the root
>>>>> 
>>>>> I realized that, in the email I've sent this morning I picked the 
>>>>> most
>>>>> difficult point in the design space (c) - that is, support 
>>>>> ownership
>>>>> transfers at the subregion granularity. This seems useful to 
>>>>> implement
>>>>> divide and conquer algorithms, but at the same time, I realized, 
>>>>> this
>>>>> was simply not possible with the scope-based solution we had before
>>>>> (since all subregions had same scope there - hence same 
>>>>> confinement).
>>>>> 
>>>>> In other words, all the implementation strategies we've seen so far
>>>>> are capable of handling either (a) or (b) [as for (b) I'm not sure
>>>>> about the potential JIT cost in making thread owner non-final]. The
>>>>> implementation story for (c) is far more convoluted (**), and I'm 
>>>>> very
>>>>> skeptical that, even if we can pull that off, it will perform in a 
>>>>> way
>>>>> that will be deemed acceptable.
>>>>> 
>>>>> Is (c) simply asking for too much? And, if so, is (b) something 
>>>>> that
>>>>> could be useful still?
>>>>> 
>>>>> Maurizio
>>>>> 
>>>>> (**) Honestly, the overlapping region check seems the straw that
>>>>> breaks the camel's back - to implement the check it's sadly
>>>>> unavoidable to keep all subregions which share the same root in the
>>>>> same place - which then poses aforementioned problems with respect 
>>>>> to
>>>>> such subregions being GCed, and need for synchronization when
>>>>> maintaining all the ancillary lists. And, this overlapping region
>>>>> check is needed in both the approaches (1) and (2) that I have
>>>>> outlined earlier in [1], I believe.
>>>>> 
>>>>> [1] - 
>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-June/005674.html

