[foreign-memaccess] on shared segments

Thu Sep 26 20:07:09 UTC 2019

On 26/09/2019 19:20, Brian Goetz wrote:
> I think this approach balances the requirements cleanly.  It puts the 
> cost of concurrent access on the concurrent use cases -- by requiring an
> extra step to set up the shared buffer -- without perturbing the rest 
> of the API or the performance of the confined use cases.

Thanks.

Btw, I forgot to mention - this opens up another round of "how should the
asXYZ methods be called". Before, we had:

slice(long, long) -> segment with smaller bounds

asReadOnly() -> read only segment
asPinned() -> non-closeable segment

Now we also have:

asConfined(Thread) -> make a new confined segment with different owner
asShared() -> make a new shared segment

I think slice, asReadOnly and asPinned can be seen as 'views' - that is, 
their temporal scope is the same as the segment they come from. 
asConfined and asShared are different beasts.
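
To make the distinction concrete, here is a minimal sketch. It is a toy
model only: ToyView, ToyScope and handOff are made-up names (handOff stands
in for asConfined/asShared), no memory is involved, and none of this is
taken from the patch. It assumes that a view shares the scope of the segment
it was derived from, while a hand-off kills the old scope and returns a
segment with a fresh one.

```java
/** Toy segment: just bounds, a read-only flag and a scope. Hypothetical, not the real API. */
final class ToyView {
    final long offset;
    final long length;
    final boolean readOnly;
    private final ToyScope scope;

    private ToyView(long offset, long length, boolean readOnly, ToyScope scope) {
        this.offset = offset;
        this.length = length;
        this.readOnly = readOnly;
        this.scope = scope;
    }

    static ToyView allocate(long length) {
        return new ToyView(0, length, false, new ToyScope());
    }

    boolean isAlive() { return scope.isAlive(); }

    private void checkAlive() {
        if (!scope.isAlive()) throw new IllegalStateException("segment is not alive");
    }

    /** A view: smaller bounds, same scope as the original. */
    ToyView slice(long newOffset, long newLength) {
        checkAlive();
        if (newOffset < 0 || newLength < 0 || newOffset > length - newLength) {
            throw new IndexOutOfBoundsException();
        }
        return new ToyView(offset + newOffset, newLength, readOnly, scope);
    }

    /** A view: same bounds, same scope, flagged read-only. */
    ToyView asReadOnly() {
        checkAlive();
        return new ToyView(offset, length, true, scope);
    }

    /** Not a view: the old scope dies and the result gets a fresh one. */
    ToyView handOff() {
        checkAlive();
        scope.kill();
        return new ToyView(offset, length, readOnly, new ToyScope());
    }

    public static void main(String[] args) {
        ToyView base = ToyView.allocate(100);
        ToyView slice = base.slice(10, 20);
        ToyView moved = base.handOff();
        System.out.println("slice alive: " + slice.isAlive());
        System.out.println("moved alive: " + moved.isAlive());
    }
}

/** Toy lifetime token shared by a segment and all of its views. */
final class ToyScope {
    private boolean alive = true;
    boolean isAlive() { return alive; }
    void kill() { alive = false; }
}
```

In this model a slice taken before the hand-off stops being alive together
with the segment it came from, which is the sense in which it is "just a
view".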

For now I used the asXYZ for everything, but I'm conscious that maybe a 
better naming scheme exists.

Also, there is a question of what asConfined and asShared should do in 
the case where no change is needed - that is, if you call asConfined(A) 
on a segment already owned by A - what happens? Similarly, if you call 
asShared() on an already shared segment, what happens? I think we have 
three choices:

1) give an error - seems harsh
2) return same segment - less harsh, but seems irregular - sometimes the
segment is killed, sometimes it is not
3) kill current segment and return new segment every time - also seems a
bit harsh

(the patch I shared is a bit inconsistent in that it does (2) for 
asShared, but (3) for asConfined)

I don't have a particularly strong preference for any of the choices,
other than that I kind of dislike (3). (2) seems reasonable overall. Opinions?
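
For reference, the three options can be sketched side by side. Again this is
a toy model, not the patch: ToySegment and NoChangePolicy are hypothetical
names, the policy parameter exists only so that the options can be compared,
and the model tracks ownership and liveness without checking which thread is
making the call.

```java
import java.util.Objects;

/** Toy stand-in for a memory segment; it models ownership and liveness only. */
final class ToySegment {
    /** Options 1, 2 and 3 from the list above, for the "no change needed" case. */
    enum NoChangePolicy { ERROR, RETURN_SAME, ALWAYS_NEW }

    private final Thread owner;      // null means "shared"
    private boolean alive = true;

    private ToySegment(Thread owner) { this.owner = owner; }

    /** New segments start off confined to the allocating thread. */
    static ToySegment allocate() { return new ToySegment(Thread.currentThread()); }

    boolean isAlive() { return alive; }
    boolean isShared() { return owner == null; }

    ToySegment asConfined(Thread newOwner, NoChangePolicy policy) {
        Objects.requireNonNull(newOwner);
        return transfer(newOwner, newOwner == owner, policy);
    }

    ToySegment asShared(NoChangePolicy policy) {
        return transfer(null, owner == null, policy);
    }

    private ToySegment transfer(Thread newOwner, boolean noChange, NoChangePolicy policy) {
        if (!alive) throw new IllegalStateException("segment is not alive");
        if (noChange) {
            switch (policy) {
                case ERROR:
                    throw new IllegalStateException("segment is already in the requested state");
                case RETURN_SAME:
                    return this;     // option 2: nothing is killed
                default:
                    break;           // option 3: continue to the kill below
            }
        }
        alive = false;               // the old segment is killed
        return new ToySegment(newOwner);
    }

    public static void main(String[] args) {
        Thread me = Thread.currentThread();
        ToySegment a = ToySegment.allocate();
        ToySegment b = a.asConfined(me, NoChangePolicy.RETURN_SAME);
        System.out.println("option 2: same object = " + (a == b) + ", old alive = " + a.isAlive());
        ToySegment c = a.asConfined(me, NoChangePolicy.ALWAYS_NEW);
        System.out.println("option 3: same object = " + (a == c) + ", old alive = " + a.isAlive());
    }
}
```

The irregularity of (2) shows up in the caller: after asConfined or asShared
returns, whether the old reference is still usable depends on the state the
segment happened to be in.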

Maurizio

>
> On 9/26/2019 2:10 PM, Maurizio Cimadamore wrote:
>> Hi,
>> in a previous document [1] I explored the problem of allowing 
>> concurrent access to a memory segment in a safe fashion. From that 
>> exploration, it emerged that there was one type of race that was 
>> particularly nasty: that is, a race between a thread A attempting to 
>> close a segment S while a thread B is attempting to access (read or 
>> write) S.
>>
>> The presence of this race makes it really hard to generalize the 
>> existing memory access API to cases where concurrent/shared access is 
>> needed. Of course one naive solution would be to synchronize every 
>> access on the liveness check, but that makes performance really poor 
>> - which would defeat the point of having such an API in the first place.
>>
>> Instead, to solve that problem, in the document I proposed a
>> solution which uses an explicit acquire/release mechanism - that is,
>> clients of a shared segment will need to explicitly acquire the
>> segment in order to be able to operate on it, and release it when 
>> done. A shared segment can only be closed when all clients are done 
>> with the segment - this is what ensures temporal safety. Moreover, 
>> since each client works on its own 'acquired' copy of the shared 
>> segment, everything is a constant and the JIT can see through the 
>> code and optimize it in the same way as it does for confined access. 
>> That said, we never fully committed to that solution, since the 
>> resulting API was very complex: for things to work, part of the 
>> MemorySegment API has to be moved under a new abstraction (in the 
>> document called MemoryHandle) - more specifically the bits that are 
>> responsible for creating addresses. While it's possible to devise a 
>> confined segment that is both a MemorySegment and a MemoryHandle 
>> (thus giving us back the old API), the general feedback I've received 
>> is that this solution seems a bit too convoluted.
>>
>> When discussing this problem with Jim, he pointed out a useful
>> connection and a possible way out: after all, all these 
>> acquire/release and reference counting schemes are there to perform a 
>> job that a JVM knows exactly how to do at speed: determining whether 
>> an object is still used or not. So, instead of inventing new 
>> machinery, we could simply piggy back on the mechanisms we already 
>> have - that is GC and Cleaners.
>>
>> The key realization, in the shared case, can be summarized as: 
>> performance, safety, deterministic deallocation, pick two! Since 
>> we're not willing to compromise on safety, or on performance, letting 
>> go of the deterministic de-allocation goal (only for shared segments) 
>> seems a reasonable conclusion.
>>
>> In other words, there are now two kinds of segments: /confined/
>> segments and /shared/ segments. A segment always starts off as
>> confined, and has an owning thread. You can update the owning thread 
>> - effectively nuking the existing segment and obtaining a new segment 
>> that is confined on a new thread. This allows clients to achieve 
>> serialized thread-confinement use cases - where multiple threads 
>> operate on a piece of memory one at a time. Confined segments are 
>> operated upon as usual: you allocate a segment, you use it, you close 
>> it (or you use a try-with-resources to do it all automagically).
>>
>> If clients want more - e.g. full concurrent access - an API point is
>> provided to turn a confined segment into a shared one. Again, what 
>> happens here is that the existing segment will be nuked, and a new 
>> shared segment will be created. But, this shared segment _cannot be 
>> closed_ (e.g. it is pinned, using the existing API terminology). So, 
>> how are off-heap resources released if we can't close the segment? 
>> Well, we let the GC take care of it - by registering the segment on a 
>> Cleaner, and have the cleaner call some cleanup code once the segment 
>> is no longer referenced (in reality, things are a bit different, in
>> the sense that what we really key on is the _scope_ of a segment,
>> which might be shared across multiple views, but the essence is the
>> same). In other words, deallocation for shared segments works pretty
>> much the same way deallocation of direct buffers works.
>>
>> With this move, we are able to retain the simplicity of the existing 
>> API, while also being able to support efficient and safe concurrent 
>> access.
>>
>> A webrev implementing this change is available here:
>>
>> http://cr.openjdk.java.net/~mcimadamore/panama/shared-segments_v2/
>>
>> Implementation-wise things are, I think, quite straightforward. I
>> took some time to refactor the code, to make the various scope
>> subclasses disappear. We now have a single memory segment
>> implementation and two scopes: shared and confined. The confined 
>> scope takes a 'Runnable' cleanup action which is used (i) when 
>> closing the confined segment or (ii) passed onto the Cleaner by the 
>> shared scope if the segment is upgraded to 'shared' state. Also,
>> since shared segments can now be picked up by the Cleaner when no
>> longer referenced, it is crucial that we add reachability fences
>> around Unsafe operations (the same way direct buffers do, really).
>> This is because the GC can sometimes aggressively collect unused
>> objects stored in local variables during method execution. Adding
>> these fences doesn't negatively impact performance (in fact, I'm
>> told these fences are a no-op in Hotspot).
>>
>> I also took some effort to update some of the javadoc which was
>> rendered invalid by this change.
>>
>> Comments welcome
>>
>> Maurizio
>>
>> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/confinement.html
>>
>
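
As a rough illustration of the Cleaner-plus-fences idea described in the
quoted message, here is a minimal sketch. It is not the code in the webrev:
ToySharedSegment is a hypothetical class, and a heap array stands in for
off-heap memory, so the fences are illustrative rather than strictly
needed. Only java.lang.ref.Cleaner and Reference.reachabilityFence are real
JDK APIs (Java 9 and later). The try/finally shape follows the usage shown in
the reachabilityFence javadoc.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;

/** Toy shared segment: it has no close(); cleanup is left to a Cleaner. */
final class ToySharedSegment {
    private static final Cleaner CLEANER = Cleaner.create();

    // Stand-in for off-heap memory; real code would hold a raw address instead.
    private final long[] memory;

    ToySharedSegment(int size, Runnable cleanup) {
        this.memory = new long[size];
        // The cleanup action must not reference 'this', otherwise the object
        // would stay reachable forever and the action would never run.
        CLEANER.register(this, cleanup);
    }

    long get(int index) {
        try {
            return memory[index];    // imagine an Unsafe read on a raw address here
        } finally {
            // Keeps 'this' reachable until the access above has completed.
            Reference.reachabilityFence(this);
        }
    }

    void set(int index, long value) {
        try {
            memory[index] = value;   // imagine an Unsafe write on a raw address here
        } finally {
            Reference.reachabilityFence(this);
        }
    }

    public static void main(String[] args) {
        ToySharedSegment segment =
                new ToySharedSegment(4, () -> System.out.println("cleanup ran"));
        segment.set(1, 42L);
        System.out.println(segment.get(1));
        // There is nothing to close: the cleanup action runs at some point after
        // 'segment' becomes unreachable, if the GC notices before the JVM exits.
    }
}
```

The trade-off is visible in main: the client cannot say when the cleanup
action runs, which is the deterministic deallocation that the shared case
gives up.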