segments and confinement
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed May 13 11:51:35 UTC 2020
Hi,
this is an attempt to address some of the questions raised here [1], in
a dedicated thread. None of the info here is new and some of these
things have already been discussed, but it might be good to recap
where we are when it comes to memory segments and confinement.
The foreign memory access API has three goals:
* efficiency: access should be as fast as possible (hopefully close to
unsafe access)
* deterministic deallocation: the programmer has a say as to *when*
things should be deallocated
* safety: no memory access should ever cause a hard VM crash
(e.g. by accessing memory out of bounds, or by accessing memory that
has already been deallocated)
Now, as long as memory segments are used by _one thread at a time_ (this
pattern is also known as serial confinement), everything works out
nicely. In such a scenario, it is not possible for memory to be accessed
_while_ it is being deallocated. Memory segment spatial bounds ensure
that out-of-bound access is not possible, and the memory segment
liveness check ensures that memory cannot be accessed _after_ it has
been deallocated. All good.
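To make the three checks concrete, here is a minimal sketch of a
confined segment. It is only an illustration: the ConfinedSegment class
and its methods are hypothetical names, and a heap ByteBuffer stands in
for off-heap memory. It is not how the API is actually implemented.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the jdk.incubator.foreign implementation.
final class ConfinedSegment {
    private final Thread owner = Thread.currentThread(); // the creating thread owns the segment
    private final ByteBuffer memory;                      // heap stand-in for off-heap memory
    private boolean alive = true;                         // plain field: only the owner touches it

    ConfinedSegment(int byteSize) {
        this.memory = ByteBuffer.allocate(byteSize);
    }

    private void checkAccess(int offset, int length) {
        if (Thread.currentThread() != owner) {            // confinement check
            throw new IllegalStateException("accessed outside owner thread");
        }
        if (!alive) {                                     // liveness (temporal) check
            throw new IllegalStateException("segment already closed");
        }
        if (offset < 0 || offset > memory.capacity() - length) { // spatial check
            throw new IndexOutOfBoundsException("offset " + offset);
        }
    }

    int getInt(int offset) {
        checkAccess(offset, Integer.BYTES);
        return memory.getInt(offset);
    }

    void setInt(int offset, int value) {
        checkAccess(offset, Integer.BYTES);
        memory.putInt(offset, value);
    }

    void close() {
        checkAccess(0, 0);   // only the owner may close, and only once
        alive = false;       // real code would free the memory here
    }
}
```

Because only the owner thread can ever reach the `alive` flag, no
fence or atomic is needed on it; that is what keeps the liveness check
cheap under confinement.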
When we start considering situations where multiple threads want to
access the same segment at the same time, one of the pillars on which
safety relied goes away: namely, we can have races between a thread
accessing memory and a thread deallocating the same memory (e.g. by closing
the segment it is associated with). In other words, safety, one of the
three pillars of the API, is undermined. What are the solutions?
*Locking*
The first, obvious solution, would be to use some kind of locking scheme
so that, while memory is accessed, it cannot be closed. Unfortunately,
memory access is such a short-lived operation that the cost of putting a
lock acquire/release around it vastly exceeds the cost of the memory
access itself. Furthermore, optimistic locking strategies, while
possible when reading, are not possible when writing (e.g. you can still
write to memory you are not supposed to). So, unless we want memory
access to be super slow (some benchmarks revealed that, even with the
best strategies, we are looking at a cost of at least 100x over plain
access),
this is not a feasible solution.
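The read/write asymmetry of optimistic locking can be sketched with
java.util.concurrent.locks.StampedLock. The LockedCell class below is a
hypothetical example, with a plain field standing in for the memory
being accessed; it is not one of the strategies we benchmarked.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical sketch: a single "memory cell" guarded by a StampedLock.
final class LockedCell {
    private final StampedLock lock = new StampedLock();
    private long value;       // stand-in for the memory being accessed
    private boolean closed;   // set once the "memory" has been released

    long read() {
        // Optimistic path: read first, then validate that no writer intervened.
        long stamp = lock.tryOptimisticRead();
        long v = value;
        boolean c = closed;
        if (!lock.validate(stamp)) {
            // A write or close raced with us: discard the result and retry under a read lock.
            stamp = lock.readLock();
            try {
                v = value;
                c = closed;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        if (c) throw new IllegalStateException("closed");
        return v;
    }

    void write(long v) {
        // No optimistic variant: a speculative write cannot be undone after the fact,
        // so the full lock must be taken before touching the memory.
        long stamp = lock.writeLock();
        try {
            if (closed) throw new IllegalStateException("closed");
            value = v;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    void close() {
        long stamp = lock.writeLock();
        try {
            closed = true;
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
```

A read that loses the race can simply be thrown away, whereas a write
that loses the race has already landed in memory it should not have
touched.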
*Atomic reference counting*
The solution implemented in Java SE 14 was based on atomic reference
counting - a MemorySegment can be "acquired" by another thread. Closing
the acquired view decrements the count. Safety is achieved by enforcing
an additional constraint: a segment cannot be closed if it has pending
acquired views. This scheme is relatively flexible, allows for efficient,
lock-free access, and it is still deterministic. But the feedback we
received was somewhat underwhelming - while access was allowed to
multiple threads, the close() operation was still only allowed to the
original segment owner. This restriction seemed to defeat the purpose of
the acquire scheme, at least in some cases.
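The general shape of such a scheme can be sketched as follows. This is
a simplified illustration, not the Java SE 14 code: CountedSegment and
View are hypothetical names, and no actual memory is attached.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of acquire/close with an atomic counter.
final class CountedSegment {
    private static final int CLOSED = -1;
    private final Thread owner = Thread.currentThread();
    private final AtomicInteger views = new AtomicInteger(0); // pending acquired views

    // May be called from any thread; fails once the segment is closed.
    View acquire() {
        int n;
        do {
            n = views.get();
            if (n == CLOSED) throw new IllegalStateException("already closed");
        } while (!views.compareAndSet(n, n + 1));
        return new View();
    }

    // Only the owner may close, and only when no acquired view is pending.
    void close() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("only the owner can close");
        }
        if (!views.compareAndSet(0, CLOSED)) {
            throw new IllegalStateException(
                views.get() == CLOSED ? "already closed" : "pending acquired views");
        }
        // real code would free the memory here
    }

    boolean isAlive() {
        return views.get() != CLOSED;
    }

    final class View {
        private final AtomicBoolean released = new AtomicBoolean(false);

        // Closing a view decrements the count exactly once.
        void close() {
            if (released.compareAndSet(false, true)) {
                views.decrementAndGet();
            }
        }
    }
}
```

The single compare-and-set from 0 to CLOSED is what rules out the race:
either a view gets acquired first and close() fails, or close() wins
and every later acquire() is rejected.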
*Divide and conquer*
In the API revamp which we hope to deliver for Java 15, the general
acquire mechanism will be replaced by a more targeted capability: the
ability to divide a segment into multiple chunks (using a spliterator)
and have multiple threads work on the non-overlapping slices. This gives a
somewhat simpler API, since now all segments are similarly confined -
and the fact that access to the slices occurs through the spliterator API
makes the API somewhat more accessible, removing the distinction between
acquired segments and non-acquired ones. This is also a more honest
approach: indeed the acquire scheme was really most useful to process
the contents of a segment in parallel - and this is something that the
Spliterator API allows you to do relatively well (plus, we gained
automatic synergy with parallel streams).
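As a rough illustration of the idea (not of the actual segment API),
the sketch below splits a ByteBuffer into non-overlapping slices with a
custom java.util.Spliterator and sums them with a parallel stream. The
SliceSpliterator class and the parallelSum helper are hypothetical
names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

// Hypothetical sketch: hands out non-overlapping, fixed-size slices of a buffer.
final class SliceSpliterator implements Spliterator<ByteBuffer> {
    private final ByteBuffer memory;
    private final int elemSize; // slice size in bytes
    private int start;          // first remaining element (inclusive)
    private final int end;      // last remaining element (exclusive)

    SliceSpliterator(ByteBuffer memory, int elemSize) {
        this(memory, elemSize, 0, memory.capacity() / elemSize);
    }

    private SliceSpliterator(ByteBuffer memory, int elemSize, int start, int end) {
        this.memory = memory;
        this.elemSize = elemSize;
        this.start = start;
        this.end = end;
    }

    @Override
    public boolean tryAdvance(Consumer<? super ByteBuffer> action) {
        if (start >= end) return false;
        ByteBuffer view = memory.duplicate();  // independent position/limit, shared content
        view.limit((start + 1) * elemSize);
        view.position(start * elemSize);
        start++;
        action.accept(view.slice());           // zero-based window over one element
        return true;
    }

    @Override
    public Spliterator<ByteBuffer> trySplit() {
        int mid = (start + end) >>> 1;
        if (mid == start) return null;         // too small to split further
        SliceSpliterator prefix = new SliceSpliterator(memory, elemSize, start, mid);
        start = mid;                           // this spliterator keeps the second half
        return prefix;
    }

    @Override
    public long estimateSize() {
        return end - start;
    }

    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED | NONNULL;
    }

    // Sums the buffer as a sequence of ints, letting the stream framework split the work.
    static long parallelSum(ByteBuffer memory) {
        return StreamSupport.stream(new SliceSpliterator(memory, Integer.BYTES), true)
                .mapToLong(slice -> slice.getInt(0))
                .sum();
    }
}
```

Each worker only ever sees the slices produced by its own half of a
split, so no two threads touch the same bytes.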
*Unsafe hatch*
The new MemorySegment::ofNativeRestricted factory allows creation of a
memory segment without an explicit thread owner. Now, this factory is
meant to be used for unsafe use cases (e.g. those originating from
native interop), and clients of this API will have to provide explicit
opt-in (e.g. a command line flag) in order to use it --- since improper
uses of the segments derived from it can lead to hard VM crashes. So,
while this option is certainly powerful, it cannot be considered a
_safe_ option to deal with shared memory segments and, at best, it
merely provides a workaround for clients using other existing unsafe API
points (such as Unsafe::invokeCleaner).
*GC to the rescue*
What if we wanted a truly shared segment which could be accessed by any
thread w/o restrictions? Currently, the only way to do that is to let
the segment be GC-managed (as already happens with byte buffers); this
gives up one of the principles of the foreign memory access API:
deterministic deallocation. While this is a fine fallback solution, it
also inherits all the problems that are present in the ByteBuffer
implementation: we will have to deal with cases where the Cleaner
doesn't deallocate segments fast enough (to partially counter that,
ByteBuffer implements a very complex scheme, which makes
ByteBuffer::allocateDirect very expensive); furthermore, all memory
accesses will need to be wrapped in reachability fences, since we
don't want the cleaner to kick in in the middle of a memory access. If
all else fails (see below), this is of course something we'll consider
nevertheless.
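For reference, the pattern looks roughly like this when written
against java.lang.ref.Cleaner and Reference.reachabilityFence. The
GcManagedSegment class is a hypothetical example: a long[] stands in
for native memory, and forceClean() exists only so that the cleanup
action can be observed without waiting for a GC.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a GC-managed segment.
final class GcManagedSegment {
    private static final Cleaner CLEANER = Cleaner.create();

    // Cleanup state lives in a separate object that must not reference the segment,
    // otherwise the segment would never become unreachable.
    private static final class Resource implements Runnable {
        final long[] memory;                                 // stand-in for a native allocation
        final AtomicBoolean freed = new AtomicBoolean(false);

        Resource(int words) {
            this.memory = new long[words];
        }

        @Override
        public void run() {
            freed.set(true);                                 // real code would free native memory here
        }
    }

    private final Resource resource;
    private final Cleaner.Cleanable cleanable;

    GcManagedSegment(int words) {
        this.resource = new Resource(words);
        this.cleanable = CLEANER.register(this, resource);
    }

    long get(int index) {
        try {
            return resource.memory[index];
        } finally {
            Reference.reachabilityFence(this);               // keep 'this' reachable until the access is done
        }
    }

    void set(int index, long value) {
        try {
            resource.memory[index] = value;
        } finally {
            Reference.reachabilityFence(this);
        }
    }

    boolean isFreed() {
        return resource.freed.get();
    }

    // Demonstration only: runs the cleanup action eagerly (it runs at most once).
    void forceClean() {
        cleanable.clean();
    }
}
```

Without the fence, the segment object could be considered unreachable
while get() or set() is still running, letting the cleaner free the
memory underneath the access.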
*Other (experimental) solutions*
Another approach we're considering is a variation of a scheme proposed
originally by Andrew Haley [2] which uses GC safepoints as a way to
prove that no thread is accessing memory when the close operation
happens. What we are investigating is whether the cost of this
solution (which would require a stop-the-world pause) can be ameliorated
by using thread-local GC handshakes ([3]). If this could be pulled off,
that would of course provide the most natural extension for the memory
access API in the multi-threaded case: safety and efficiency would be
preserved, and a small price would be paid in terms of the performance
of the close() operation (which is something we can live with).
Another experimental solution we're considering is to relax the
confinement constraint so that more coarse-grained confinement units can
also be associated with segments. For instance, Loom is considering the
inclusion of an unbounded executor service [4], which can be used to
schedule fibers. What if we could create a memory segment that is
confined to one such executor service? This way, we could achieve safety
by having the close() operation wait until all the threads (or fibers!)
in the service have completed.
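A very rough sketch of what executor-level confinement could look like
is shown below, using a plain java.util.concurrent.ExecutorService
rather than any Loom API. ExecutorConfinedSegment and its methods are
hypothetical, and an AtomicLongArray stands in for the shared memory.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.Consumer;

// Hypothetical sketch: a segment whose confinement unit is an executor service.
final class ExecutorConfinedSegment {
    private final ExecutorService executor;
    private final AtomicLongArray memory;     // stand-in for shared off-heap memory
    private volatile boolean alive = true;

    ExecutorConfinedSegment(ExecutorService executor, int words) {
        this.executor = executor;
        this.memory = new AtomicLongArray(words);
    }

    // All access happens inside tasks run by the owning executor.
    // Once the executor is shut down, execute() throws RejectedExecutionException.
    void submit(Consumer<AtomicLongArray> task) {
        executor.execute(() -> task.accept(memory));
    }

    // Stops accepting tasks, waits for in-flight ones, then marks the segment dead.
    // Returns false if the tasks did not finish in time (the segment then stays alive).
    boolean close(long timeoutMillis) {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        alive = false;                        // real code would free the memory here
        return true;
    }

    boolean isAlive() {
        return alive;
    }
}
```

Here close() is safe because it completes only after every task that
could touch the memory has terminated; the price is that close() may
block for as long as the slowest task.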
This should summarize where we're at pretty exhaustively. In other
words, no, we did not give up on multi-threaded access, but we need to
investigate more to understand what possibilities are available to us,
especially if we're willing to go lower level.
Cheers
Maurizio
[1] -
https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/008989.html
[2] - https://mail.openjdk.java.net/pipermail/jmm-dev/2017-January.txt
[3] - https://openjdk.java.net/jeps/312
[4] - https://github.com/openjdk/loom/commit/f21d6924