[foreign-memaccess] on shared segments

Fri Sep 27 10:46:25 UTC 2019

2) seems fine to me. We can phrase the doc to be vague about whether a 
newly created segment will be returned, and just say we will return 'a 
segment with properties XYZ', then returning the same segment in some 
cases will just be an implementation detail.

1) definitely seems wrong. If two different use sites want to make sure 
the segment is confined to the current thread, and that just so happens 
to be the same thread for both use sites, we get an error. In that case I 
feel like we'd need to add predicates to test whether a segment is 
already confined to a given thread, or shared (we have the latter in the 
prototype), so users can avoid these errors without having to use 
try/catch around different asXYZ calls.
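
To make the predicate idea concrete, here is a minimal sketch on a toy 
class. It is not the prototype API: ToySegment, isConfinedTo(Thread), 
isShared() and confineToCurrentThread() are hypothetical names, and 
asConfined() is modelled with the option 1 behaviour (a redundant call is 
an error) purely for illustration.

```java
// Toy model only: this is NOT the real MemorySegment API.
final class ToySegment {
    private final Thread owner; // null models a shared segment

    ToySegment(Thread owner) {
        this.owner = owner;
    }

    /** Hypothetical predicate: is this segment confined to the given thread? */
    boolean isConfinedTo(Thread thread) {
        return owner != null && owner == thread;
    }

    /** Hypothetical predicate: is this segment shared? */
    boolean isShared() {
        return owner == null;
    }

    /** Models option 1: asking for the owner we already have is an error. */
    ToySegment asConfined(Thread newOwner) {
        if (newOwner == null) {
            throw new NullPointerException("newOwner");
        }
        if (owner == newOwner) {
            throw new IllegalStateException("already confined to " + newOwner.getName());
        }
        return new ToySegment(newOwner);
    }

    /** Use-site helper: the predicate replaces a try/catch around asConfined(). */
    static ToySegment confineToCurrentThread(ToySegment segment) {
        Thread current = Thread.currentThread();
        return segment.isConfinedTo(current) ? segment : segment.asConfined(current);
    }
}
```

With something like this, two use sites on the same thread can both call 
confineToCurrentThread() without either of them hitting the error.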

3) in that same vein, also seems harsh, since a user might have to 
redundantly call asConfined() or asShared() to make sure they can use 
the segment. It seems reasonable to expect that such a call does not 
incur any unneeded overhead. What are the motivations for doing 3 over 2, 
other than living up to an expectation that a new segment would be 
returned (which I believe is a question of not making that promise in 
the spec/doc)?
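
As a rough illustration of the difference between 2 and 3, here is a toy 
sketch. The class and the method names asConfinedReuse() and 
asConfinedAlwaysNew() are made up for the comparison and do not exist in 
the patch; the 'alive' flag only stands in for whatever the real scope 
does when a segment is killed.

```java
// Toy model only: contrasts option 2 and option 3 from the thread.
final class PolicySegment {
    private final Thread owner;
    private boolean alive = true;

    PolicySegment(Thread owner) {
        this.owner = owner;
    }

    boolean isAlive() {
        return alive;
    }

    private void checkAlive() {
        if (!alive) {
            throw new IllegalStateException("segment is no longer alive");
        }
    }

    /** Option 2: if no change is needed, hand back the same segment untouched. */
    PolicySegment asConfinedReuse(Thread newOwner) {
        checkAlive();
        if (owner == newOwner) {
            return this; // redundant call: no kill, no allocation
        }
        alive = false;
        return new PolicySegment(newOwner);
    }

    /** Option 3: always kill this segment and return a fresh one. */
    PolicySegment asConfinedAlwaysNew(Thread newOwner) {
        checkAlive();
        alive = false;
        return new PolicySegment(newOwner);
    }
}
```

Under 2 a redundant call leaves every existing reference usable; under 3 
the caller has to replace all of its references even though nothing 
really changed.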

Reading through the code, there is one caveat that doesn't seem to be 
addressed yet: only the owning thread can call asShared() or 
asConfined() (this is not checked currently), and, at least for 
asConfined(), I think the owning thread has to issue a full fence 
before transferring the segment to make sure no accesses 'spill over' 
into the new state.
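
Roughly what I have in mind, as a sketch on a toy class (HandoffSegment 
is a made-up name, not the prototype's implementation). It only shows 
where the owner check and the fence would sit; whether a full fence is 
actually required, or sufficient, is an assumption on my part. 
VarHandle.fullFence() is the standard JDK method.

```java
import java.lang.invoke.VarHandle;

// Toy model only: shows an owner check plus a fence before hand-off.
final class HandoffSegment {
    private final Thread owner;
    private boolean alive = true;

    HandoffSegment(Thread owner) {
        this.owner = owner;
    }

    boolean isAlive() {
        return alive;
    }

    /** Transfers ownership; only the current owner may call this. */
    HandoffSegment asConfined(Thread newOwner) {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("only the owning thread may transfer the segment");
        }
        if (!alive) {
            throw new IllegalStateException("segment is no longer alive");
        }
        alive = false;
        // Assumed requirement: order all of the old owner's earlier accesses
        // before the new segment becomes visible to the new owner.
        VarHandle.fullFence();
        return new HandoffSegment(newOwner);
    }
}
```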

Jorn

On 26/09/2019 22:07, Maurizio Cimadamore wrote:
>
> On 26/09/2019 19:20, Brian Goetz wrote:
>> I think this approach balances the requirements cleanly.  It puts the 
>> cost of concurrent access on the current use cases -- by requiring an 
>> extra step to set up the shared buffer -- without perturbing the rest 
>> of the API or the performance of the confined use cases.
>
> Thanks.
>
> Btw, forgot to mention - this opens up another round of "how should 
> the asXYZ methods be called". Before we had:
>
> slice(long, long) -> segment with smaller bounds
>
> asReadOnly() -> read only segment
> asPinned() -> non-closeable segment
>
> Now we also have
>
> asConfined(Thread) -> make a new confined segment with different owner
> asShared() -> make a new shared segment
>
>
> I think slice, asReadOnly and asPinned can be seen as 'views' - that 
> is, their temporal scope is the same as the segment they come from. 
> asConfined and asShared are different beasts.
>
> For now I used the asXYZ for everything, but I'm conscious that maybe 
> a better naming scheme exists.
>
> Also, there is a question of what asConfined and asShared should do in 
> the case where no change is needed - that is, if you call 
> asConfined(A) on a segment already owned by A - what happens? 
> Similarly, if you call asShared() on an already shared segment, what 
> happens? I think we have three choices:
>
> 1) give an error - seems harsh
> 2) return same segment - less harsh, but seems irregular - sometimes 
> the segment is killed, sometimes it is not
> 3) kill current segment and return new segment every time - seems also 
> a bit harsh
>
> (the patch I shared is a bit inconsistent in that it does (2) for 
> asShared, but (3) for asConfined)
>
> I don't have any particular strong preference for any of the choices, 
> other than I kind of dislike (3). (2) seems reasonable overall. Opinions?
>
> Maurizio
>
>>
>> On 9/26/2019 2:10 PM, Maurizio Cimadamore wrote:
>>> Hi,
>>> in a previous document [1] I explored the problem of allowing 
>>> concurrent access to a memory segment in a safe fashion. From that 
>>> exploration, it emerged that there was one type of race that was 
>>> particularly nasty: that is, a race between a thread A attempting to 
>>> close a segment S while a thread B is attempting to access (read or 
>>> write) S.
>>>
>>> The presence of this race makes it really hard to generalize the 
>>> existing memory access API to cases where concurrent/shared access 
>>> is needed. Of course one naive solution would be to synchronize 
>>> every access on the liveness check, but that makes performance 
>>> really poor - which would defeat the point of having such an API in 
>>> the first place.
>>>
>>> Instead, to solve that problem, in the document I proposed a 
>>> solution which uses an explicit acquire/release mechanism - that is 
>>> clients of a shared segment will need to explicitly acquire the 
>>> segment in order to be able to operate on it, and release it when 
>>> done. A shared segment can only be closed when all clients are done 
>>> with the segment - this is what ensures temporal safety. Moreover, 
>>> since each client works on its own 'acquired' copy of the shared 
>>> segment, everything is a constant and the JIT can see through the 
>>> code and optimize it in the same way as it does for confined access. 
>>> That said, we never fully committed to that solution, since the 
>>> resulting API was very complex: for things to work, part of the 
>>> MemorySegment API has to be moved under a new abstraction (in the 
>>> document called MemoryHandle) - more specifically the bits that are 
>>> responsible for creating addresses. While it's possible to devise a 
>>> confined segment that is both a MemorySegment and a MemoryHandle 
>>> (thus giving us back the old API), the general feedback I've 
>>> received is that this solution seems a bit too convoluted.
>>>
>>> When discussing this problem with Jim, he pointed out a useful 
>>> connection and a possible way out: after all, all these 
>>> acquire/release and reference counting schemes are there to perform 
>>> a job that a JVM knows exactly how to do at speed: determining 
>>> whether an object is still used or not. So, instead of inventing new 
>>> machinery, we could simply piggy back on the mechanisms we already 
>>> have - that is GC and Cleaners.
>>>
>>> The key realization, in the shared case, can be summarized as: 
>>> performance, safety, deterministic deallocation, pick two! Since 
>>> we're not willing to compromise on safety, or on performance, 
>>> letting go of the deterministic de-allocation goal (only for shared 
>>> segments) seems a reasonable conclusion.
>>>
>>> In other words, there are now two kinds of segments: /confined/ 
>>> segments and /shared/ segments. A segment always starts off as 
>>> confined, and has an owning thread. You can update the owning thread 
>>> - effectively nuking the existing segment and obtaining a new 
>>> segment that is confined on a new thread. This allows clients to 
>>> achieve serialized thread-confinement use cases - where multiple 
>>> threads operate on a piece of memory one at a time. Confined 
>>> segments are operated upon as usual: you allocate a segment, you use 
>>> it, you close it (or you use a try with resources to do it all 
>>> automagically).
>>>
>>> If clients want more - e.g. full concurrent access, an API point is 
>>> provided to turn a confined segment into a shared one. Again, what 
>>> happens here is that the existing segment will be nuked, and a new 
>>> shared segment will be created. But, this shared segment _cannot be 
>>> closed_ (e.g. it is pinned, using the existing API terminology). So, 
>>> how are off-heap resources released if we can't close the segment? 
>>> Well, we let the GC take care of it - by registering the segment on 
>>> a Cleaner, and have the cleaner call some cleanup code once the 
>>> segment is no longer referenced (in reality, things are a bit 
>>> different, in the sense that what we really key on is the _scope_ 
>>> of a segment, which might be shared across multiple views, but the 
>>> essence is the same). In other words, deallocation for shared 
>>> segments works pretty much the same way deallocation of direct 
>>> buffers works.
>>>
>>> With this move, we are able to retain the simplicity of the existing 
>>> API, while also being able to support efficient and safe concurrent 
>>> access.
>>>
>>> A webrev implementing this change is available here:
>>>
>>> http://cr.openjdk.java.net/~mcimadamore/panama/shared-segments_v2/
>>>
>>> Implementation-wise things are, I think, quite straightforward. I 
>>> took some time to refactor the code, to make the various scope 
>>> subclasses disappear. We now have a single memory segment 
>>> implementation and two scopes: shared and confined. The confined 
>>> scope takes a 'Runnable' cleanup action which is used (i) when 
>>> closing the confined segment or (ii) passed onto the Cleaner by the 
>>> shared scope if the segment is upgraded to 'shared' state. Also, 
>>> since shared segments can now be picked up by the Cleaner when no 
>>> longer referenced, it is crucial that we add in reachability fences 
>>> around Unsafe operations (the same way direct buffers do, really). 
>>> This is because sometimes the GC can aggressively collect unused 
>>> objects stored in local variables during method execution. Adding 
>>> these fences doesn't negatively impact performance (in fact, I'm 
>>> told these fences are a no-op in Hotspot).
>>>
>>> I also took some effort to update some of the javadoc which was 
>>> rendered invalid by this change.
>>>
>>> Comments welcome
>>>
>>> Maurizio
>>>
>>> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/confinement.html
>>>
>>