[foreign-memaccess] on shared segments

Thu Sep 26 18:10:22 UTC 2019

Hi,
in a previous document [1] I explored the problem of allowing concurrent 
access to a memory segment in a safe fashion. From that exploration, it 
emerged that there was one type of race that was particularly nasty: 
that is, a race between a thread A attempting to close a segment S while 
a thread B is attempting to access (read or write) S.

The presence of this race makes it really hard to generalize the 
existing memory access API to cases where concurrent/shared access is 
needed. Of course one naive solution would be to synchronize every 
access on the liveness check, but that makes performance really poor - 
which would defeat the point of having such an API in the first place.
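
To make the naive approach concrete, here is a minimal sketch. The class `LockedSegment` is hypothetical (it is not part of the memory access API), and a heap array stands in for off-heap memory:

```java
/** Hypothetical illustration only; not part of the memory access API. */
final class LockedSegment {
    private final long[] memory;   // heap array standing in for off-heap memory
    private boolean alive = true;  // guarded by the monitor of 'this'

    LockedSegment(int length) {
        this.memory = new long[length];
    }

    // Every access takes the lock, so a concurrent close() cannot slip in
    // between the liveness check and the actual read.
    synchronized long get(int index) {
        if (!alive) throw new IllegalStateException("segment is closed");
        return memory[index];
    }

    synchronized void set(int index, long value) {
        if (!alive) throw new IllegalStateException("segment is closed");
        memory[index] = value;
    }

    synchronized void close() {
        alive = false;             // a real segment would free its memory here
    }
}
```

This is safe, but every single read and write pays for a monitor enter/exit, which is the cost referred to above.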

Instead, to solve that problem, in the document I proposed a solution 
which uses an explicit acquire/release mechanism - that is, clients of a 
shared segment will need to explicitly acquire the segment in order to 
be able to operate on it, and release it when done. A shared segment can 
only be closed when all clients are done with the segment - this is what 
ensures temporal safety. Moreover, since each client works on its own 
'acquired' copy of the shared segment, everything is a constant and the 
JIT can see through the code and optimize it in the same way as it does 
for confined access. That said, we never fully committed to that 
solution, since the resulting API was very complex: for things to work, 
part of the MemorySegment API has to be moved under a new abstraction 
(in the document called MemoryHandle) - more specifically the bits that 
are responsible for creating addresses. While it's possible to devise a 
confined segment that is both a MemorySegment and a MemoryHandle (thus 
giving us back the old API), the general feedback I've received is that 
this solution seems a bit too convoluted.
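
For readers who have not seen the document, the counting part of such a scheme can be sketched roughly as follows. This is a simplification under my own assumptions: `CountedSegment` and its methods are hypothetical names, and the sketch leaves out the 'acquired' copy (the MemoryHandle) that each client would actually work on:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of acquire/release counting; not the API from [1]. */
final class CountedSegment {
    private static final int CLOSED = -1;

    // >= 0: number of clients currently holding the segment; CLOSED: released
    private final AtomicInteger state = new AtomicInteger(0);

    /** Registers a client; fails if the segment has already been closed. */
    void acquire() {
        int s;
        do {
            s = state.get();
            if (s == CLOSED) throw new IllegalStateException("already closed");
        } while (!state.compareAndSet(s, s + 1));
    }

    /** Unregisters a client that previously called acquire(). */
    void release() {
        int s;
        do {
            s = state.get();
            if (s <= 0) throw new IllegalStateException("not acquired");
        } while (!state.compareAndSet(s, s - 1));
    }

    /** Closes the segment only if no client currently holds it. */
    boolean tryClose() {
        return state.compareAndSet(0, CLOSED);
    }

    boolean isClosed() {
        return state.get() == CLOSED;
    }
}
```

The synchronization cost is paid once per acquire/release pair rather than once per access, which is what leaves the accesses themselves free to be optimized.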

When discussing this problem with Jim, he pointed out a useful 
connection and a possible way out: after all, all these acquire/release 
and reference counting schemes are there to perform a job that a JVM 
knows exactly how to do at speed: determining whether an object is still 
used or not. So, instead of inventing new machinery, we could simply 
piggyback on the mechanisms we already have - that is, GC and Cleaners.

The key realization, in the shared case, can be summarized as: 
performance, safety, deterministic deallocation, pick two! Since we're 
not willing to compromise on safety, or on performance, letting go of 
the deterministic de-allocation goal (only for shared segments) seems a 
reasonable conclusion.

In other words, there are now two kinds of segments: /confined/ segments 
and /shared/ segments. A segment always starts off as confined, and has 
an owning thread. You can update the owning thread - effectively nuking 
the existing segment and obtaining a new segment that is confined on a 
new thread. This allows clients to achieve serialized thread-confinement 
use cases - where multiple threads operate on a piece of memory one at a 
time. Confined segments are operated upon as usual: you allocate a 
segment, you use it, you close it (or you use try-with-resources to do 
it all automagically).
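
The confinement and ownership hand-off described above can be sketched as follows. `ConfinedResource`, `checkAccess` and `withOwner` are hypothetical names chosen for this illustration, not the names used in the webrev:

```java
/** Hypothetical sketch of a thread-confined resource with ownership hand-off. */
final class ConfinedResource implements AutoCloseable {
    private final Thread owner;
    private final Runnable cleanup;  // e.g. frees the underlying memory
    private boolean alive = true;    // only ever touched by the owner thread

    ConfinedResource(Thread owner, Runnable cleanup) {
        this.owner = owner;
        this.cleanup = cleanup;
    }

    /** Called before every access: right thread, and not yet closed or handed off. */
    void checkAccess() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("accessed outside owning thread");
        }
        if (!alive) {
            throw new IllegalStateException("no longer alive");
        }
    }

    /** Kills this instance and returns a new one confined to newOwner. */
    ConfinedResource withOwner(Thread newOwner) {
        checkAccess();
        alive = false;               // the memory itself is NOT released here
        return new ConfinedResource(newOwner, cleanup);
    }

    @Override
    public void close() {
        checkAccess();
        alive = false;
        cleanup.run();
    }
}
```

Since the liveness flag is only read and written by the owning thread, no synchronization is needed on the access path.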

If clients want more - e.g. full concurrent access - an API point is 
provided to turn a confined segment into a shared one. Again, what 
happens here is that the existing segment will be nuked, and a new 
shared segment will be created. But, this shared segment _cannot be 
closed_ (i.e. it is pinned, using the existing API terminology). So, how 
are off-heap resources released if we can't close the segment? Well, we 
let the GC take care of it - by registering the segment on a Cleaner, 
and have the cleaner call some cleanup code once the segment is no 
longer referenced (in reality, things are a bit different, in the sense 
that what we really key on is the _scope_ of a segment, which might be 
shared across multiple views, but the essence is the same). In other 
words, deallocation for shared segments works pretty much the same way 
as deallocation of direct buffers.
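
As a rough illustration of the Cleaner-based approach, here is a minimal sketch using java.lang.ref.Cleaner. `SharedView` and its methods are hypothetical names, and a plain Object stands in for the scope:

```java
import java.lang.ref.Cleaner;

/** Hypothetical sketch: views share a scope, and the scope is what gets cleaned. */
final class SharedView {
    private static final Cleaner CLEANER = Cleaner.create();

    private final Object scope;  // shared by all views of the same memory

    private SharedView(Object scope) {
        this.scope = scope;
    }

    /**
     * Creates a view whose cleanup runs some time after the scope becomes
     * unreachable. The cleanup action must not reference the scope (or any
     * view), otherwise the scope would never become unreachable.
     */
    static SharedView create(Runnable cleanup) {
        Object scope = new Object();
        CLEANER.register(scope, cleanup);
        return new SharedView(scope);
    }

    /** A second view over the same memory; it keeps the same scope alive. */
    SharedView slice() {
        return new SharedView(scope);
    }
}
```

Note that there is no close() method: as long as any view is reachable, the scope is reachable too, and the cleanup cannot run.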

With this move, we are able to retain the simplicity of the existing 
API, while also being able to support efficient and safe concurrent access.

A webrev implementing this change is available here:

http://cr.openjdk.java.net/~mcimadamore/panama/shared-segments_v2/

Implementation-wise things are, I think, quite straightforward. I took 
some time to refactor the code, to make the various scope subclasses 
disappear. We now have a single memory segment implementation and two 
scopes: shared and confined. The confined scope takes a 'Runnable' 
cleanup action which is either (i) run when closing the confined segment 
or (ii) passed on to the Cleaner by the shared scope if the segment is 
upgraded to the 'shared' state. Also, since shared segments can now be 
picked up by the Cleaner when no longer referenced, it is crucial that we 
add reachability fences around Unsafe operations (the same way direct 
buffers do). This is because the GC can sometimes aggressively collect 
objects which are still held in local variables but are no longer used, 
even while a method is executing. Adding these fences doesn't negatively 
impact performance (in fact, I'm told these fences are a no-op in HotSpot).
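
The fence pattern looks roughly like this. `FencedAccess` is a hypothetical class, and a heap array stands in for the Unsafe access; the array itself would not need a fence, it is there only to keep the sketch runnable:

```java
import java.lang.ref.Reference;

/** Hypothetical sketch of the reachability-fence pattern around a raw access. */
final class FencedAccess {
    private final long[] memory;  // stand-in for memory that a Cleaner would free

    FencedAccess(int length) {
        this.memory = new long[length];
    }

    long get(int index) {
        try {
            return memory[index];  // stand-in for an Unsafe read at a raw address
        } finally {
            // Keeps 'this' strongly reachable until the access has completed,
            // so a Cleaner cannot free the memory in the middle of the read.
            Reference.reachabilityFence(this);
        }
    }

    void set(int index, long value) {
        try {
            memory[index] = value; // stand-in for an Unsafe write
        } finally {
            Reference.reachabilityFence(this);
        }
    }
}
```

In the real code the fenced object would presumably be whatever the Cleaner is registered on (the scope), since that is the object whose reachability decides when the memory is freed.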

I also took some effort to update the parts of the javadoc which are 
rendered invalid by this change.

Comments welcome

Maurizio

[1] - http://cr.openjdk.java.net/~mcimadamore/panama/confinement.html