[foreign-memaccess] on shared segments
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Sep 26 20:07:09 UTC 2019
On 26/09/2019 19:20, Brian Goetz wrote:
> I think this approach balances the requirements cleanly. It puts the
> cost of concurrent access on the current use cases -- by requiring an
> extra step to set up the shared buffer -- without perturbing the rest
> of the API or the performance of the confined use cases.
Thanks.
Btw, I forgot to mention - this opens up another round of "how should the
asXYZ methods be called". Before we had:
slice(long, long) -> segment with smaller bounds
asReadOnly() -> read only segment
asPinned() -> non-closeable segment
Now we also have:
asConfined(Thread) -> make a new confined segment with different owner
asShared() -> make a new shared segment
I think slice, asReadOnly and asPinned can be seen as 'views' - that is,
their temporal scope is the same as the segment they come from.
asConfined and asShared are different beasts.
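To make the distinction concrete, here is a minimal sketch of the two kinds of operation. ToyScope, ToyView and transfer() are made-up names for illustration only (transfer() stands in for asConfined/asShared); none of this is the actual MemorySegment code. Views share the scope of the segment they come from, so they die with it, while a transfer kills the old scope and hands back a segment with a fresh one.

```java
// Toy model only: ToyScope and ToyView are hypothetical names used for
// illustration; they are not part of the memory access API.
final class ToyScope {
    // Liveness shared by every view derived from the same segment.
    boolean alive = true;
}

final class ToyView {
    final ToyScope scope;
    final long offset;
    final long length;
    final boolean readOnly;

    private ToyView(ToyScope scope, long offset, long length, boolean readOnly) {
        this.scope = scope;
        this.offset = offset;
        this.length = length;
        this.readOnly = readOnly;
    }

    static ToyView allocate(long length) {
        return new ToyView(new ToyScope(), 0, length, false);
    }

    boolean isAlive() {
        return scope.alive;
    }

    // A view: narrower bounds, same scope (same temporal bounds).
    ToyView slice(long newOffset, long newLength) {
        if (newOffset < 0 || newLength < 0 || newOffset > length - newLength) {
            throw new IndexOutOfBoundsException("slice out of bounds");
        }
        return new ToyView(scope, offset + newOffset, newLength, readOnly);
    }

    // A view: fewer access rights, same scope.
    ToyView asReadOnly() {
        return new ToyView(scope, offset, length, true);
    }

    // Not a view: stands in for asConfined/asShared. The old scope is
    // killed (taking all of its views with it) and a new scope is created.
    ToyView transfer() {
        if (!scope.alive) {
            throw new IllegalStateException("segment is not alive");
        }
        scope.alive = false;
        return new ToyView(new ToyScope(), offset, length, readOnly);
    }
}
```

In this model, calling transfer() on a segment invalidates every slice previously taken from it, which is the behavioural difference a naming scheme might want to signal.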
For now I used the asXYZ for everything, but I'm conscious that maybe a
better naming scheme exists.
Also, there is a question of what asConfined and asShared should do in
the case where no change is needed - that is, if you call asConfined(A)
on a segment already owned by A - what happens? Similarly, if you call
asShared() on an already shared segment, what happens? I think we have
three choices:
1) give an error - seems harsh
2) return same segment - less harsh, but seems irregular - sometimes the
segment is killed, sometimes it is not
3) kill current segment and return new segment every time - seems also a
bit harsh
(the patch I shared is a bit inconsistent in that it does (2) for
asShared, but (3) for asConfined)
I don't have any particular strong preference for any of the choices,
other than I kind of dislike (3). (2) seems reasonable overall. Opinions?
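For reference, the three choices could be sketched like this for asConfined. ToySegment and the three method names are hypothetical and exist only to put the options side by side; this is not what the patch does.

```java
// Toy model only: ToySegment and its method names are hypothetical and
// are not part of the memory access API or of the patch.
final class ToySegment {
    private final Thread owner;
    private boolean alive = true;

    ToySegment(Thread owner) {
        this.owner = owner;
    }

    boolean isAlive() {
        return alive;
    }

    Thread owner() {
        return owner;
    }

    private void checkAlive() {
        if (!alive) {
            throw new IllegalStateException("segment is not alive");
        }
    }

    // Choice (1): a no-op request is an error.
    ToySegment asConfinedStrict(Thread newOwner) {
        checkAlive();
        if (newOwner == owner) {
            throw new IllegalArgumentException("segment already owned by " + newOwner);
        }
        alive = false;
        return new ToySegment(newOwner);
    }

    // Choice (2): a no-op request returns the same, still alive, segment.
    ToySegment asConfinedReturnSame(Thread newOwner) {
        checkAlive();
        if (newOwner == owner) {
            return this;
        }
        alive = false;
        return new ToySegment(newOwner);
    }

    // Choice (3): the current segment is always killed and replaced.
    ToySegment asConfinedAlwaysNew(Thread newOwner) {
        checkAlive();
        alive = false;
        return new ToySegment(newOwner);
    }
}
```

With (2) a caller cannot tell from the call alone whether the receiver is still usable afterwards; with (3) the receiver is always dead, even when nothing changed.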
Maurizio
>
> On 9/26/2019 2:10 PM, Maurizio Cimadamore wrote:
>> Hi,
>> in a previous document [1] I explored the problem of allowing
>> concurrent access to a memory segment in a safe fashion. From that
>> exploration, it emerged that there was one type of race that was
>> particularly nasty: that is, a race between a thread A attempting to
>> close a segment S while a thread B is attempting to access (read or
>> write) S.
>>
>> The presence of this race makes it really hard to generalize the
>> existing memory access API to cases where concurrent/shared access is
>> needed. Of course one naive solution would be to synchronize every
>> access on the liveness check, but that makes performance really poor
>> - which would defeat the point of having such an API in the first place.
>>
>> Instead, to solve that problem, in the document I propose a
>> solution which uses an explicit acquire/release mechanism - that is,
>> clients of a shared segment will need to explicitly acquire the
>> segment in order to be able to operate on it, and release it when
>> done. A shared segment can only be closed when all clients are done
>> with the segment - this is what ensures temporal safety. Moreover,
>> since each client works on its own 'acquired' copy of the shared
>> segment, everything is a constant and the JIT can see through the
>> code and optimize it in the same way as it does for confined access.
>> That said, we never fully committed to that solution, since the
>> resulting API was very complex: for things to work, part of the
>> MemorySegment API has to be moved under a new abstraction (in the
>> document called MemoryHandle) - more specifically the bits that are
>> responsible for creating addresses. While it's possible to devise a
>> confined segment that is both a MemorySegment and a MemoryHandle
>> (thus giving us back the old API), the general feedback I've received
>> is that this solution seems a bit too convoluted.
>>
>> When discussing this problem with Jim, he pointed out a useful
>> connection and a possible way out: after all, all these
>> acquire/release and reference counting schemes are there to perform a
>> job that a JVM knows exactly how to do at speed: determining whether
>> an object is still used or not. So, instead of inventing new
>> machinery, we could simply piggy back on the mechanisms we already
>> have - that is GC and Cleaners.
>>
>> The key realization, in the shared case, can be summarized as:
>> performance, safety, deterministic deallocation, pick two! Since
>> we're not willing to compromise on safety, or on performance, letting
>> go of the deterministic de-allocation goal (only for shared segments)
>> seems a reasonable conclusion.
>>
>> In other words, there are now two kinds of segments: /confined/
>> segments and /shared/ segments. A segment always starts off as
>> confined, and has an owning thread. You can update the owning thread
>> - effectively nuking the existing segment and obtaining a new segment
>> that is confined on a new thread. This allows clients to achieve
>> serialized thread-confinement use cases - where multiple threads
>> operate on a piece of memory one at a time. Confined segments are
>> operated upon as usual: you allocate a segment, you use it, you close
>> it (or you use a try-with-resources to do it all automagically).
>>
>> If clients want more - e.g. full concurrent access, an API point is
>> provided to turn a confined segment into a shared one. Again, what
>> happens here is that the existing segment will be nuked, and a new
>> shared segment will be created. But, this shared segment _cannot be
>> closed_ (e.g. it is pinned, using the existing API terminology). So,
>> how are off-heap resources released if we can't close the segment?
>> Well, we let the GC take care of it - by registering the segment on a
>> Cleaner, and have the cleaner call some cleanup code once the segment
>> is no longer referenced (in reality, things are a bit different, in
>> the sense that what we really key on is the _scope_ of a segment,
>> which might be shared across multiple views, but the essence is the
>> same). In other words, deallocation for shared segments works pretty
>> much the same way as deallocation of direct buffers works.
>>
>> With this move, we are able to retain the simplicity of the existing
>> API, while also being able to support efficient and safe concurrent
>> access.
>>
>> A webrev implementing this change is available here:
>>
>> http://cr.openjdk.java.net/~mcimadamore/panama/shared-segments_v2/
>>
>> Implementation-wise things are, I think, quite straightforward. I
>> took some time to refactor the code, to make the various scope
>> subclasses disappear. We now have a single memory segment
>> implementation and two scopes: shared and confined. The confined
>> scope takes a 'Runnable' cleanup action which is used (i) when
>> closing the confined segment or (ii) passed onto the Cleaner by the
>> shared scope if the segment is upgraded to 'shared' state. Also,
>> since a shared segment can now be picked up by the Cleaner when no
>> longer referenced, it is crucial that we add in reachability fences
>> around Unsafe operations (the same way direct buffers do, really).
>> This is because sometimes the GC can aggressively collect unused
>> objects stored in local variables during method execution. Adding
>> these fences doesn't negatively impact performance (in fact, I'm
>> told these fences are a no-op in Hotspot).
>>
>> I also took some effort to update some of the javadoc which was
>> rendered invalid by this change.
>>
>> Comments welcome
>>
>> Maurizio
>>
>> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/confinement.html
>>
>
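The scope arrangement described in the quoted message (a confined scope holding a Runnable cleanup, which is either run on close or handed to a Cleaner on upgrade) could be sketched roughly as follows. ConfinedScopeSketch, SharedScopeSketch, toShared() and access() are hypothetical names, not the code in the webrev; only java.lang.ref.Cleaner and Reference.reachabilityFence are real JDK APIs. The Supplier passed to access() stands in for an Unsafe-backed memory access.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.function.Supplier;

// Sketch only: both classes are hypothetical stand-ins, not the webrev code.
final class ConfinedScopeSketch implements AutoCloseable {
    private final Thread owner = Thread.currentThread();
    private final Runnable cleanup;   // frees the underlying memory
    private boolean alive = true;

    ConfinedScopeSketch(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    private void checkAccess() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("accessed outside owner thread");
        }
        if (!alive) {
            throw new IllegalStateException("scope is not alive");
        }
    }

    // (i) Deterministic deallocation: run the cleanup on close.
    @Override
    public void close() {
        checkAccess();
        alive = false;
        cleanup.run();
    }

    // (ii) Upgrade: kill this scope and pass the cleanup on to the Cleaner.
    SharedScopeSketch toShared() {
        checkAccess();
        alive = false;
        return new SharedScopeSketch(cleanup);
    }
}

final class SharedScopeSketch {
    private static final Cleaner CLEANER = Cleaner.create();

    SharedScopeSketch(Runnable cleanup) {
        // The cleanup must not reference this scope, otherwise the scope
        // would never become unreachable and the cleanup would never run.
        CLEANER.register(this, cleanup);
    }

    // No close(): the shared scope is released only once it is unreachable.
    // The fence keeps the scope reachable until the access has completed.
    <T> T access(Supplier<T> memoryAccess) {
        try {
            return memoryAccess.get();
        } finally {
            Reference.reachabilityFence(this);
        }
    }
}
```

When the cleanup for a shared scope actually runs is up to the GC, which is the loss of deterministic deallocation that the quoted message accepts for shared segments.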