segments and confinement
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri May 15 09:21:44 UTC 2020
On 15/05/2020 04:10, Samuel Audet wrote:
> Thanks for the summary!
>
> I was about to say that we can probably do funky stuff with
> thread-local storage, and not only with GC, but for example to prevent
> threads from trying to access addresses they must not access, but I
> see you've already started looking at that, at least for GC, so keep
> going. :)
For the record - one of the experiments I tried (but did not list
here) used ThreadLocal storage specifically (to emulate some kind of
thread-group concept) - but that also gave pretty poor results
performance-wise (not too far from locking). This seems to suggest
that, if a solution exists (and that is not _that_ obvious - after
all, the ByteBuffer API has been struggling with this problem for many
years), it exists at a lower level.
>
> In any case, if the final solution could be applied to something else
> than memory segments that have to be allocated by the VM, then it
> would have great value for native interop. I hope it goes there.
The more general and shareable across threads we can make segment
lifetimes, the more likely that becomes. Currently, segments have
fairly restricted lifetime handling (because of confinement, which in
turn exists for safety) - and the same guarantees don't seem useful
(and can even be harmful) when thinking about native libraries and
other resources (I don't find the concept of a confined native
library very appealing).
So, IMHO, it all hinges on whether, and how, we can make segments more
general and useful.
Maurizio
>
> Samuel
>
> On 5/13/20 8:51 PM, Maurizio Cimadamore wrote:
>> Hi,
>> this is an attempt to address some of the questions raised here [1],
>> in a dedicated thread. None of the info here is new and some of these
>> things have already been discussed, but it might be good to recap
>> where we are when it comes to memory segments and confinement.
>>
>> The foreign memory access API has three goals:
>>
>> * efficiency: access should be as fast as possible (hopefully close to
>> unsafe access)
>> * deterministic deallocation: the programmer has a say as to *when*
>> things should be deallocated
>> * safety: memory access should never cause a hard VM crash (e.g. by
>> accessing memory out of bounds, or by accessing memory that has
>> already been deallocated)
>>
>> Now, as long as memory segments are used by _one thread at a time_
>> (this pattern is also known as serial confinement), everything works
>> out nicely. In such a scenario, it is not possible for memory to be
>> accessed _while_ it is being deallocated. Memory segment spatial
>> bounds ensure that out-of-bound access is not possible, and the
>> memory segment liveness check ensures that memory cannot be accessed
>> _after_ it has been deallocated. All good.
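>>
>> To make this concrete, here is a minimal sketch against the Java 14
>> incubator API (jdk.incubator.foreign; exact names may differ in
>> later drops):
>>
>> import jdk.incubator.foreign.*;
>> import java.lang.invoke.VarHandle;
>> import java.nio.ByteOrder;
>>
>> class Confined {
>>     static final VarHandle intHandle =
>>         MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());
>>
>>     public static void main(String[] args) {
>>         try (MemorySegment segment = MemorySegment.allocateNative(100)) {
>>             // spatial bounds: only offsets [0, 100) are addressable
>>             intHandle.set(segment.baseAddress(), 42);
>>             int v = (int) intHandle.get(segment.baseAddress());
>>         } // close() runs here: deterministic deallocation; accessing
>>           // the segment after this point fails the liveness check
>>     }
>> }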
>>
>> When we start considering situations where multiple threads want to
>> access the same segment at the same time, one of the pillars on which
>> safety relied goes away: namely, we can have races between a thread
>> accessing memory and a thread deallocating that same memory (e.g. by
>> closing the segment it is associated with). In other words, safety,
>> one of the three pillars of the API, is undermined. What are the
>> solutions?
>>
>> *Locking*
>>
>> The first, obvious solution would be to use some kind of locking
>> scheme so that, while memory is being accessed, it cannot be closed.
>> Unfortunately, memory access is such a short-lived operation that the
>> cost of putting a lock acquire/release around it vastly exceeds the
>> cost of the memory access itself. Furthermore, optimistic locking
>> strategies, while possible when reading, are not possible when
>> writing (e.g. you can still write to memory you are not supposed to
>> touch). So, unless we want memory access to be super slow (some
>> benchmarks revealed that, even with the best strategies, we are
>> looking at least at a 100x cost over plain access), this is not a
>> feasible solution.
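>>
>> To see why optimistic techniques only help on the read side, here is
>> a sketch using java.util.concurrent.locks.StampedLock (closeLock is
>> a hypothetical lock guarding close(); intHandle as in the earlier
>> sketch):
>>
>> static final StampedLock closeLock = new StampedLock();
>>
>> static int optimisticGet(MemorySegment segment) {
>>     long stamp = closeLock.tryOptimisticRead();
>>     int v = (int) intHandle.get(segment.baseAddress());
>>     if (closeLock.validate(stamp)) {
>>         return v;                  // no close() raced with us
>>     }
>>     stamp = closeLock.readLock();  // fall back to a pessimistic read
>>     try {
>>         return (int) intHandle.get(segment.baseAddress());
>>     } finally {
>>         closeLock.unlockRead(stamp);
>>     }
>> }
>>
>> A racy read can be detected and its result discarded after the fact;
>> a racy write has already mutated memory by the time validation fails,
>> so the same trick does not extend to writes.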
>>
>> *Atomic reference counting*
>>
>> The solution implemented in Java SE 14 was based on atomic reference
>> counting - a MemorySegment can be "acquired" by another thread.
>> Closing the acquired view decrements the count. Safety is achieved by
>> enforcing an additional constraint: a segment cannot be closed if it
>> has pending acquired views. This scheme is relatively flexible, allows
>> for efficient, lock-free access, and is still deterministic. But
>> the feedback we received was somewhat underwhelming - while access
>> was allowed to multiple threads, the close() operation was still only
>> allowed to the original segment owner. This restriction seemed to
>> defeat the purpose of the acquire scheme, at least in some cases.
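>>
>> In code, the Java 14 scheme looked roughly like this (same imports
>> as the earlier sketch):
>>
>> static void acquireDemo() throws InterruptedException {
>>     MemorySegment segment = MemorySegment.allocateNative(100);
>>     Thread worker = new Thread(() -> {
>>         // a non-owner thread must acquire its own view first
>>         try (MemorySegment view = segment.acquire()) {
>>             int v = (int) intHandle.get(view.baseAddress());
>>         } // closing the view decrements the count
>>     });
>>     worker.start();
>>     worker.join();
>>     segment.close(); // owner-only; fails if acquired views are
>>                      // still pending
>> }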
>>
>> *Divide and conquer*
>>
>> In the API revamp which we hope to deliver for Java 15, the general
>> acquire mechanism will be replaced by a more targeted capability -
>> that of dividing a segment into multiple chunks (using a spliterator)
>> and letting multiple threads have a go at the non-overlapping slices.
>> This gives a somewhat simpler API, since now all segments are
>> similarly confined - and the fact that access to the slices occurs
>> through the spliterator API makes the API somewhat more accessible,
>> removing the distinction between acquired segments and non-acquired
>> ones. This is also a more honest approach: the acquire scheme was
>> really most useful for processing the contents of a segment in
>> parallel - and that is something the Spliterator API allows you
>> to do relatively well (plus, we gain automatic synergy with
>> parallel streams).
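>>
>> For instance (a sketch against the current API draft, so names may
>> still change; also needs java.util.stream.StreamSupport):
>>
>> SequenceLayout seq =
>>     MemoryLayout.ofSequence(1024, MemoryLayouts.JAVA_INT);
>> MemorySegment segment = MemorySegment.allocateNative(seq);
>> int sum = StreamSupport.stream(
>>         MemorySegment.spliterator(segment, seq), true) // parallel
>>     .mapToInt(slice -> (int) intHandle.get(slice.baseAddress()))
>>     .sum();
>>
>> Each slice is a non-overlapping segment confined to whichever worker
>> thread the parallel stream hands it to, so no data races on the
>> underlying memory are possible.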
>>
>> *Unsafe hatch*
>>
>> The new MemorySegment::ofNativeRestricted factory allows creation of
>> a memory segment without an explicit thread owner. Now, this factory is
>> meant to be used for unsafe use cases (e.g. those originating from
>> native interop), and clients of this API will have to provide
>> explicit opt-in (e.g. a command line flag) in order to use it ---
>> since improper uses of the segments derived from it can lead to hard
>> VM crashes. So, while this option is certainly powerful, it cannot be
>> considered a _safe_ option to deal with shared memory segments and,
>> at best, it merely provides a workaround for clients using other
>> existing unsafe API points (such as Unsafe::invokeCleaner).
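>>
>> For reference, here's roughly what that looks like under the current
>> draft (the opt-in flag and the exact signature may still change;
>> 'addr' is a raw address obtained from some native library):
>>
>> // run with e.g. -Dforeign.restricted=permit
>> MemorySegment segment = MemorySegment.ofNativeRestricted(
>>         MemoryAddress.ofLong(addr), 100L,
>>         null,   // no owner thread: any thread may access it
>>         null,   // no cleanup action
>>         null);  // no attachment
>> // out-of-bounds or use-after-free is now entirely the caller's
>> // problem - this can crash the VM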
>>
>> *GC to the rescue*
>>
>> What if we wanted a truly shared segment which could be accessed by
>> any thread w/o restrictions? Currently, the only way to do that is to
>> let the segment be GC-managed (as already happens with byte buffers);
>> this gives up one of the principles of the foreign memory access API:
>> deterministic deallocation. While this is a fine fallback solution,
>> it also inherits all the problems that are present in the
>> ByteBuffer implementation: we will have to deal with cases where the
>> Cleaner doesn't deallocate segments fast enough (to partially counter
>> that, ByteBuffer implements a very complex scheme, which makes
>> ByteBuffer::allocateDirect very expensive); furthermore, all memory
>> accesses will need to be wrapped in reachability fences, since we
>> don't want the cleaner to kick in in the middle of a memory access. If
>> all else fails (see below), this is of course something we'll consider
>> nevertheless.
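>>
>> The fence pattern would look something like this (sketch;
>> Reference.reachabilityFence has been in java.lang.ref since Java 9):
>>
>> static int gcManagedGet(MemorySegment segment) {
>>     try {
>>         return (int) intHandle.get(segment.baseAddress());
>>     } finally {
>>         // keeps 'segment' strongly reachable across the access, so
>>         // the Cleaner cannot free the memory underneath us
>>         Reference.reachabilityFence(segment);
>>     }
>> }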
>>
>> *Other (experimental) solutions*
>>
>> Other approaches we're considering are variations of a scheme
>> proposed originally by Andrew Haley [2], which uses GC safepoints as a
>> way to prove that no thread is accessing memory when the close
>> operation happens. What we are investigating is whether the
>> cost of this solution (which would require a stop-the-world pause)
>> can be ameliorated by using thread-local GC handshakes ([3]). If this
>> could be pulled off, it would of course provide the most natural
>> extension of the memory access API to the multi-threaded case:
>> safety and efficiency would be preserved, and a small price would be
>> paid in terms of the performance of the close() operation (which is
>> something we can live with).
>>
>> Another experimental solution we're considering is to relax the
>> confinement constraint so that more coarse-grained confinement units
>> can also be associated with segments. For instance, Loom is
>> considering the inclusion of an unbounded executor service [4], which
>> can be used to schedule fibers. What if we could create a memory
>> segment that is confined to one such executor service? This way, we
>> could achieve safety by having the close() operation wait until all
>> the threads (or fibers!) in the service have completed.
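>>
>> In code, that might look something like this (purely hypothetical -
>> neither the withOwner method nor this close() behavior exists
>> today):
>>
>> ExecutorService pool = ...; // e.g. Loom's unbounded executor [4]
>> MemorySegment segment =
>>     MemorySegment.allocateNative(100).withOwner(pool); // hypothetical
>> pool.submit(() -> {
>>     // any thread (or fiber) scheduled by 'pool' may access it
>>     intHandle.set(segment.baseAddress(), 42);
>> });
>> segment.close(); // would wait until all tasks submitted to 'pool'
>>                  // have completed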
>>
>>
>> This should summarize pretty exhaustively where we are. In other
>> words, no, we did not give up on multi-threaded access; but we need
>> to investigate more to understand what possibilities are available to
>> us, especially if we're willing to go lower level.
>>
>> Cheers
>> Maurizio
>>
>> [1] -
>> https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/008989.html
>> [2] - https://mail.openjdk.java.net/pipermail/jmm-dev/2017-January.txt
>> [3] - https://openjdk.java.net/jeps/312
>> [4] - https://github.com/openjdk/loom/commit/f21d6924
>>