[foreign-memaccess] on shared segments
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Sep 26 18:10:22 UTC 2019
Hi,
in a previous document [1] I explored the problem of allowing concurrent
access to a memory segment in a safe fashion. From that exploration, it
emerged that there was one type of race that was particularly nasty:
that is, a race between a thread A attempting to close a segment S while
a thread B is attempting to access (read or write) S.
The presence of this race makes it really hard to generalize the
existing memory access API to cases where concurrent/shared access is
needed. Of course one naive solution would be to synchronize every
access on the liveness check, but that makes performance really poor -
which would defeat the point of having such an API in the first place.
Instead, to solve that problem, the document proposes a solution
which uses an explicit acquire/release mechanism - that is, clients of a
shared segment will need to explicitly acquire the segment in order to
be able to operate on it, and release it when done. A shared segment can
only be closed when all clients are done with the segment - this is what
ensures temporal safety. Moreover, since each client works on its own
'acquired' copy of the shared segment, everything is a constant and the
JIT can see through the code and optimize it in the same way as it does
for confined access. That said, we never fully committed to that
solution, since the resulting API was very complex: for things to work,
part of the MemorySegment API has to be moved under a new abstraction
(in the document called MemoryHandle) - more specifically the bits that
are responsible for creating addresses. While it's possible to devise a
confined segment that is both a MemorySegment and a MemoryHandle (thus
giving us back the old API), the general feedback I've received is that
this solution seems a bit too convoluted.
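To make the acquire/release idea concrete, here is a minimal sketch of the
reference-counting core that such a scheme relies on. The class name
RefCountedResource and all of its methods are hypothetical: they are not
taken from the document [1] or from any webrev, and the sketch leaves out
the MemoryHandle abstraction entirely. It only illustrates why a close can
succeed solely when no client currently holds the resource.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the API described in the document.
final class RefCountedResource {
    private static final int CLOSED = -1;

    // >= 0: number of outstanding acquires; CLOSED: resource has been released.
    private final AtomicInteger state = new AtomicInteger(0);

    /** Registers one more client. Returns false if the resource is already closed. */
    boolean tryAcquire() {
        while (true) {
            int s = state.get();
            if (s == CLOSED) {
                return false;
            }
            if (state.compareAndSet(s, s + 1)) {
                return true;
            }
        }
    }

    /** Unregisters one client; must be paired with a successful tryAcquire. */
    void release() {
        while (true) {
            int s = state.get();
            if (s <= 0) {
                throw new IllegalStateException("release without matching acquire");
            }
            if (state.compareAndSet(s, s - 1)) {
                return;
            }
        }
    }

    /** Closes the resource only if no client holds it; returns whether it closed. */
    boolean tryClose() {
        return state.compareAndSet(0, CLOSED);
    }

    boolean isAlive() {
        return state.get() != CLOSED;
    }
}
```

Note that the liveness check is paid once per acquire rather than once per
access, which is what keeps the per-access path free of synchronization.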
When discussing this problem with Jim, he pointed out a useful
connection and a possible way out: after all, all these acquire/release
and reference counting schemes are there to perform a job that a JVM
knows exactly how to do at speed: determining whether an object is still
used or not. So, instead of inventing new machinery, we could simply
piggy back on the mechanisms we already have - that is GC and Cleaners.
The key realization, in the shared case, can be summarized as:
performance, safety, deterministic deallocation, pick two! Since we're
not willing to compromise on safety, or on performance, letting go of
the deterministic de-allocation goal (only for shared segments) seems a
reasonable conclusion.
In other words, there are now two kinds of segments: /confined/ segments
and /shared/ segments. A segment always starts off as confined, and has
an owning thread. You can update the owning thread - effectively nuking
the existing segment and obtaining a new segment that is confined on a
new thread. This allows clients to achieve serialized thread-confinement
use cases - where multiple threads operate on a piece of memory one at a
time. Confined segments are operated upon as usual: you allocate a
segment, you use it, you close it (or you use a try with resources to do
it all automagically).
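The confinement rules above can be sketched as follows. This is only an
illustration under assumed names: ConfinedResource, use() and withOwner()
are hypothetical and do not correspond to the actual MemorySegment API.
The sketch shows the owner-thread check, the ownership transfer that kills
the old instance, and close() running the cleanup action.

```java
// Hypothetical illustration only; names do not match the real API.
final class ConfinedResource implements AutoCloseable {
    private final Thread owner;
    private final Runnable cleanup;
    // Plain field: only the owner thread ever reads or writes it.
    private boolean alive = true;

    ConfinedResource(Thread owner, Runnable cleanup) {
        this.owner = owner;
        this.cleanup = cleanup;
    }

    private void checkAccess() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("accessed outside owning thread");
        }
        if (!alive) {
            throw new IllegalStateException("already closed or transferred");
        }
    }

    /** Stands in for any read or write on the underlying memory. */
    void use() {
        checkAccess();
    }

    /**
     * Kills this instance and returns a new one confined to newOwner.
     * The caller is assumed to hand the result over to newOwner through a
     * safe publication mechanism (e.g. a concurrent queue).
     */
    ConfinedResource withOwner(Thread newOwner) {
        checkAccess();
        alive = false;
        return new ConfinedResource(newOwner, cleanup);
    }

    @Override
    public void close() {
        checkAccess();
        alive = false;
        cleanup.run();
    }
}
```

Because the class implements AutoCloseable, it can be used in a
try-with-resources statement, which calls close() on the owning thread when
the block exits.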
If clients want more (e.g. full concurrent access), an API point is
provided to turn a confined segment into a shared one. Again, what
happens here is that the existing segment will be nuked, and a new
shared segment will be created. But, this shared segment _cannot be
closed_ (i.e. it is pinned, using the existing API terminology). So, how
are off-heap resources released if we can't close the segment? Well, we
let the GC take care of it - by registering the segment on a Cleaner,
and have the cleaner call some cleanup code once the segment is no
longer referenced (in reality, things are a bit different, in the sense
that what we really key on is the _scope_ of a segment, which might be
shared across multiple views, but the essence is the same). In other
words, deallocation for shared segments works pretty much the same way
deallocation of direct buffers works.
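As a rough illustration of the Cleaner-based approach, the sketch below
registers a cleanup action with java.lang.ref.Cleaner (a standard JDK class
since Java 9). The SharedResource class is hypothetical and is not the code
in the webrev; in particular, the Cleanable is kept in a field here only so
that the sketch can be exercised deterministically, which a real shared
segment would presumably not allow.

```java
import java.lang.ref.Cleaner;

// Hypothetical illustration only; not the implementation in the webrev.
final class SharedResource {
    private static final Cleaner CLEANER = Cleaner.create();

    // Held only for demonstration, so the cleanup can be triggered by hand.
    final Cleaner.Cleanable cleanable;

    /**
     * The cleanup action must not capture a reference to this object,
     * otherwise the object would never become unreachable.
     */
    SharedResource(Runnable cleanup) {
        this.cleanable = CLEANER.register(this, cleanup);
    }

    /** A shared resource cannot be closed explicitly; the GC decides. */
    void close() {
        throw new UnsupportedOperationException(
                "shared resources are released once unreachable");
    }
}
```

In normal operation the cleanup runs on the Cleaner's thread at some point
after the SharedResource becomes unreachable, so the time of deallocation
is not deterministic.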
With this move, we are able to retain the simplicity of the existing
API, while also being able to support efficient and safe concurrent access.
A webrev implementing this change is available here:
http://cr.openjdk.java.net/~mcimadamore/panama/shared-segments_v2/
Implementation-wise things are, I think, quite straightforward. I took
some time to refactor the code, to make the various scope subclasses
disappear. We now have a single memory segment implementation and two
scopes: shared and confined. The confined scope takes a 'Runnable'
cleanup action which is either (i) run when closing the confined segment
or (ii) passed on to the Cleaner by the shared scope if the segment is
upgraded to 'shared' state. Also, since shared segments can now be
picked up by the Cleaner when no longer referenced, it is crucial that we
add in reachability fences around Unsafe operations (the same way direct
buffers do). This is because sometimes the GC can aggressively
collect unused objects stored in local variables during method
execution. Adding these fences doesn't negatively impact performance
(in fact, I'm told these fences are a no-op in Hotspot).
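The fence pattern can be sketched as below, using the standard
java.lang.ref.Reference.reachabilityFence method (available since Java 9).
FencedAccessor is a hypothetical class, and the long[] only stands in for
off-heap memory accessed through Unsafe; with a plain heap array the fence
is not actually needed, so this shows the shape of the code rather than a
real use.

```java
import java.lang.ref.Reference;

// Hypothetical illustration only; the array stands in for off-heap memory.
final class FencedAccessor {
    private final long[] backing;

    FencedAccessor(int size) {
        this.backing = new long[size];
    }

    long get(int index) {
        try {
            return backing[index];
        } finally {
            // Keeps 'this' strongly reachable until the access has completed,
            // so a Cleaner registered on it cannot run in the middle of it.
            Reference.reachabilityFence(this);
        }
    }

    void set(int index, long value) {
        try {
            backing[index] = value;
        } finally {
            Reference.reachabilityFence(this);
        }
    }
}
```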
I also took some effort to update the parts of the javadoc that were
rendered invalid by this change.
Comments welcome
Maurizio
[1] - http://cr.openjdk.java.net/~mcimadamore/panama/confinement.html