[foreign-memaccess] scopes and thread confinement

Mon May 20 22:45:13 UTC 2019

Hi,
the MemoryScope abstraction is used to carry out 'liveness' checks - 
that is, to make sure that dereference operations on a MemoryAddress occur 
while the address still points to a valid region.

To be able to get to the same level of performance as Unsafe::get/put, 
we need to be able to hoist the liveness check away in the JIT; but 
there's a problem here: even if the JIT can see through a bunch of code 
using addresses allocated by a given scope, it has to conservatively 
assume that another thread might chime in, and close the scope behind 
our back.
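
To make the problem concrete, here is a minimal sketch of an access loop 
guarded by a shared liveness flag. SharedScope, checkAlive and sum are 
hypothetical names, and a plain long[] stands in for an off-heap region; 
none of this is the actual Panama code. Since any thread may call close(), 
the flag has to be volatile, and the check has to be repeated on every 
access instead of being done once before the loop.

```java
public class LivenessCheckSketch {

    // Hypothetical scope whose liveness can be changed by any thread.
    static final class SharedScope {
        private volatile boolean alive = true;

        void close() {
            alive = false;
        }

        void checkAlive() {
            if (!alive) {
                throw new IllegalStateException("scope is closed");
            }
        }
    }

    // Sums a region (a long[] stands in for off-heap memory), checking
    // liveness before each access.
    static long sum(SharedScope scope, long[] region) {
        long total = 0;
        for (int i = 0; i < region.length; i++) {
            scope.checkAlive(); // volatile read, repeated on every iteration
            total += region[i];
        }
        return total;
    }

    public static void main(String[] args) {
        SharedScope scope = new SharedScope();
        System.out.println(sum(scope, new long[] {1, 2, 3}));
    }
}
```

If close() could only ever be called by the thread running the loop, the 
check before the first access would be enough, which is what a confined 
scope is meant to enable.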

This makes it impossible for the JIT to completely optimize away the 
check. For this reason, in my document [1] I posited the existence of a 
'confined' scope, whose existence is bound to a given thread (and maybe, 
in the future, a fiber). I've been thinking a lot lately about how we'd 
like to expose this confinement model - I have arrived at some 
conclusions, but I still have some question marks, which I'll try to 
explain here.

First, I think it's important to realize that there are two aspects to 
confinement:

1) access to critical scope operations, such as fork and allocate, and 
terminal operations such as close/merge
2) read/write access to the underlying memory allocated using a scope

While both aspects are important, they are not _equally so_. So much so 
that I'd like here to entertain the possibility that we assign each 
scope a thread (or fiber) owner, which is then used to confine access to 
critical scope operations, as in (1). That is - let's treat the critical 
scope operations as confined - to make sure that a scope can never be 
closed behind our back - but let's leave the memory region open so that 
many threads can access it concurrently. After all, the VarHandle API 
offers very good atomic read/write CAS-like operations, which can be 
used to implement any kind of synchronization on top. (Of course we could 
also expose an opt-in scope characteristic which, additionally, confines 
reads and writes to the given owner thread, as in (2), but that aspect 
seems less important in the scope of this discussion.)
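
Here is a rough sketch of what this split could look like. OwnedScope and 
its methods are hypothetical names, and a direct ByteBuffer stands in for 
a memory region (the sketch never frees it); the VarHandle calls are the 
standard java.lang.invoke ones. The critical operations (allocate, close) 
check the owner thread, while the region itself is updated concurrently 
by two threads using a CAS loop.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OwnedScopeSketch {

    // View a ByteBuffer as ints; the index coordinate is a byte offset.
    static final VarHandle INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Hypothetical scope: critical operations are confined to the owner.
    static final class OwnedScope {
        private final Thread owner = Thread.currentThread();
        private boolean closed; // only ever touched by the owner thread

        private void checkOwner() {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("not the owner thread");
            }
        }

        ByteBuffer allocate(int bytes) {
            checkOwner();
            if (closed) {
                throw new IllegalStateException("scope is closed");
            }
            return ByteBuffer.allocateDirect(bytes); // zero-initialized
        }

        void close() {
            checkOwner();
            closed = true;
        }
    }

    // Two non-owner threads increment the same int 1000 times each via CAS.
    static int run() {
        OwnedScope scope = new OwnedScope();
        ByteBuffer region = scope.allocate(4);
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) {
                int old;
                do {
                    old = (int) INT.getVolatile(region, 0);
                } while (!INT.compareAndSet(region, 0, old, old + 1));
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        int result = (int) INT.getVolatile(region, 0);
        scope.close(); // allowed: we are the owner
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that the worker threads never touch the scope itself, only the 
region; a close() attempted from either of them would be rejected by 
checkOwner().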

If we pull the string on this model, we soon encounter a road-block: 
what is a global scope? While it seems reasonable for an explicitly 
forked scope to have an owner thread, how can a global scope have an 
owner, since it's created with the VM and dies with it? And, if the 
global scope had a fictional owner (the thread which created it) how 
could other threads do anything with the global scope?

Not surprisingly, this duality between confined and global scopes is 
also present in the FiberScope API (in project Loom); there, we have two 
kinds of FiberScope implementations: one is called FiberScopeImpl [2] and 
is the default, effectively confined one (so, you will get an exception 
when performing operations from another strand). The other is called 
DetachedFiberScope and is shared and can be used by all threads - it can 
be thought of as the root in the fiber scope ownership model.

This is all quite similar to what we're aiming to get at - but with one 
complication; in the case of FiberScope, having a shared scope 
implementation is not really that problematic, as a detached scope 
implementation doesn't have any mutable state and doesn't require any 
kind of synchronization. But in the case of Panama scopes, there are 
many things that can go wrong:

* a scope keeps a list of all the allocated memory blocks, for reuse - 
if accessed concurrently the scope can get corrupted
* a scope keeps a list of all the descendants - again, if a scope is 
forked concurrently bad things will happen here

The good thing is that a global memory scope cannot be closed (like a 
detached scope in Loom), so we don't have to worry about threads 
concurrently calling close().

Now, can we imagine a global scope implementation that requires no 
synchronization? I think that could be doable:

* instead of going for a sophisticated allocation scheme which minimizes 
the calls to Unsafe, we could just call Unsafe once for every call to 
MemoryScope::allocate, so that no shared state is used
* we can easily remove the descendant list, and have all children check 
liveness of their parent, recursively (rather than having the parent 
closing sub-scopes recursively)
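
The second point could be sketched as follows. Scope, fork, close and 
isAlive are hypothetical names used only for illustration, and 
allocation is left out. The global scope is modelled as a scope with no 
parent: it keeps no descendant list and has no mutable state that fork 
needs to touch, so concurrent forks do not need synchronization.

```java
public class ParentLivenessSketch {

    // Hypothetical scope that tracks only its parent, not its children.
    static final class Scope {
        private final Scope parent; // null for the global scope
        private boolean closed;     // written only by close()

        Scope(Scope parent) {
            this.parent = parent;
        }

        // Forking records nothing in the parent, so no shared state changes.
        Scope fork() {
            return new Scope(this);
        }

        void close() {
            if (parent == null) {
                throw new UnsupportedOperationException("global scope cannot be closed");
            }
            closed = true;
        }

        // A scope is alive only if it and all of its ancestors are alive.
        boolean isAlive() {
            return !closed && (parent == null || parent.isAlive());
        }
    }

    public static void main(String[] args) {
        Scope global = new Scope(null);
        Scope child = global.fork();
        Scope grandchild = child.fork();
        child.close();
        System.out.println(grandchild.isAlive()); // closing the child kills its descendants
        System.out.println(global.isAlive());
    }
}
```

The price is that a liveness check now walks the parent chain instead of 
reading a single flag. The sketch also ignores cross-thread visibility of 
the closed flag, which is fine only if close() and the checks happen on 
the owner thread.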

Then the question is - what happens to resources allocated inside a 
global scope (or resources merged _into_ it)? I see two options here:

1) we could install an automagic memory collector - which frees memory 
as soon as the allocated region goes out of scope
2) we do nothing - if you allocate on the global scope no deallocation 
occurs - the region stays alive until the VM exits

I was initially leaning towards (1); I liked the idea of having _some_ 
deallocation strategy for globally allocated resources; but then I 
started to realize that this semantic difference between global scopes 
and regular scopes was not without its own issues:

* what happens when a memory region is 'resized'? In that case we create 
a new region - but if the old one is then collected, the new region will 
just contain garbage!

* what happens when we merge into a global scope from a child scope? We 
go from a deterministic deallocation behavior to a reachability-based 
one; this could be very confusing for users!

* when working with scopes we can assume the resources of a parent will 
outlive those of the children - so that it is safe to copy pointers from 
the parent to the children (i.e. such pointers will be valid for as long 
as the child scopes are alive). But if the parent is a global scope 
featuring the reachability-based deallocation described in (1), this 
assumption is no longer valid - if the region allocated in the parent is 
deemed unreachable, it can be collected, which means the parent region 
can become 'not alive' *before* the child scope is closed!

So I'm starting to see the appeal of (2): a global scope is a scope that 
is always alive; if it's always alive that must mean that memory 
allocated inside it is never freed, and will outlive the memory regions 
allocated by any other scopes. This means that developers should use 
global scopes with care - knowing that stuff there will never really be 
deallocated (so it only really makes sense for 'global' memory regions). 
Same applies for merging a child scope into the global scope - the 
associated resources will stay alive forever.

All this seems to point in the following direction:

1) All forked scopes have an owner thread - fork/close/merge/allocate 
can only be called within that thread
2) Global scopes have no owner - all threads can call fork/allocate - 
close/merge are forbidden here
3) Underlying memory access is not restricted to the owner thread - multiple 
threads can synchronize (e.g. with CAS)
3b) If we want to, we can implement fully confined memory access - i.e. 
allow memory access only from the owner thread

Comments?

Maurizio

[1] - http://cr.openjdk.java.net/~mcimadamore/panama/memaccess.html
[2] - https://hg.openjdk.java.net/loom/loom/file/cc783ba01af5/src/java.base/share/classes/java/lang/FiberScope.java#l636
[3] - https://hg.openjdk.java.net/loom/loom/file/cc783ba01af5/src/java.base/share/classes/java/lang/FiberScope.java#l617