[foreign] RFR 8209497: Polish Resource API

Tue Aug 14 20:51:38 UTC 2018

On Aug 14, 2018, at 7:22 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> 
> Callbacks are NOT resources (since you can have a callback backed by a Java lambda), but they can _optionally_ be linked to one.

Callbacks are not unique in this respect.  Any entity that can be copied
or mirrored between the Java heap and native storage works this way.

For example, a by-value struct return value (or a vector return value)
might be allocated in native memory so that it can be addressed by
native code, or it might be allocated on the Java heap (wrapped in
a long[] array or heap byte buffer).  If its identity is unimportant because
it is passed by-value, then the runtime can validly copy it back and
forth between the Java heap and native memory, as needed, and
even keep two copies (one for each location).  This usage pattern
is quite similar to callbacks.
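
A minimal sketch of this copy-back-and-forth pattern, using plain
java.nio.ByteBuffer as a stand-in for whatever carrier the runtime
would actually use.  The PointValue class, its two-int layout, and the
mirror helper are assumptions made up for illustration; none of them
are part of the API under review.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical by-value struct { int x; int y; }; the 8-byte layout is assumed.
final class PointValue {
    static final int SIZE = 2 * Integer.BYTES;

    // Create the value in Java heap storage.
    static ByteBuffer onHeap(int x, int y) {
        ByteBuffer b = ByteBuffer.allocate(SIZE).order(ByteOrder.nativeOrder());
        b.putInt(0, x).putInt(Integer.BYTES, y);
        return b;
    }

    // Copy the bits into fresh storage on the other side of the boundary.
    static ByteBuffer mirror(ByteBuffer src, boolean toNative) {
        ByteBuffer dst = toNative ? ByteBuffer.allocateDirect(SIZE)
                                  : ByteBuffer.allocate(SIZE);
        dst.order(ByteOrder.nativeOrder());
        ByteBuffer view = src.duplicate(); // leaves src's position and limit untouched
        view.clear();                      // cover the whole buffer; does not erase data
        dst.put(view);
        dst.clear();                       // reset position so the copy reads from the start
        return dst;
    }
}
```

Because the value has no identity, both copies are equally valid, and
either one can be dropped once it is no longer needed.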

Even the basic Pointer type is designed to allow pointers into
the Java heap as well as native memory (and other kinds of
memory too).  A Pointer backed by pure Java heap storage
will sometimes be useful as a cursor into Java-readable native
data structures, even though native code cannot directly
address those structures (unless they are mirrored in native
memory or the heap block is somehow temporarily pinned).

The underlying Unsafe API is designed to allow dynamically
mixed access to on-heap and off-heap storage, and the same
flexible allocation patterns can be supported by Pointer and
ByteBuffer.
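
To illustrate, here is a rough sketch of a cursor that does not care
where its bytes live.  HeapOrNativeCursor is a hypothetical name and
is not the Pointer type itself; it simply wraps a ByteBuffer, which
may be heap-backed or direct.

```java
import java.nio.ByteBuffer;

// Hypothetical cursor type; a stand-in for illustration, not the Panama Pointer API.
final class HeapOrNativeCursor {
    private final ByteBuffer mem; // heap-backed or direct (native) storage
    private int offset;

    HeapOrNativeCursor(ByteBuffer mem) { this.mem = mem; }

    // True only when native code could address the backing storage directly.
    boolean isNativeAddressable() { return mem.isDirect(); }

    HeapOrNativeCursor putInt(int v) {
        mem.putInt(offset, v);
        offset += Integer.BYTES;
        return this;
    }

    int getInt() {
        int v = mem.getInt(offset);
        offset += Integer.BYTES;
        return v;
    }

    HeapOrNativeCursor rewind() { offset = 0; return this; }

    // The same structure-building code runs against either kind of storage.
    static int buildAndSum(ByteBuffer storage) {
        HeapOrNativeCursor c = new HeapOrNativeCursor(storage);
        c.putInt(1).putInt(2).putInt(3).rewind();
        return c.getInt() + c.getInt() + c.getInt();
    }
}
```

Calling buildAndSum with ByteBuffer.allocate(12) or with
ByteBuffer.allocateDirect(12) exercises the same code path; only
isNativeAddressable() differs.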

My point here is that, on the Java/native boundary, Callback
is not unique, and we will have a variety of structures which
dynamically choose whether they are backed by the Java heap,
native memory, both, or (with value types) neither.

The corollary is, I think, that the Resource API should somehow
provide for the heap-only state, rather than try to statically
exclude types from Resources that might have a heap-only
state.

The easiest way I can think of to encode such a state is to
create a distinguished Scope object which represents it.
The heap-scope would nominally own objects which can
"take care of themselves" (with help from the GC of course).
It wouldn't be able to enumerate them.  Handing off an object
to the heap-scope would require that object to mirror itself
on heap and discard any other scoped resources.  Natural
members of the heap-scope would include Callbacks wrapping
user lambdas, bit-images of structures stored only on the heap,
and cursors into heap-backed native data structures such as
packet images under construction.
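
One possible shape for such a heap-scope is sketched below.  The
Scope and Resource interfaces, the Blob class, and the adopt and
moveToHeap methods are all hypothetical names invented for this
sketch; the real Resource API may look quite different.

```java
import java.nio.ByteBuffer;

final class HeapScopeSketch {
    /** Hypothetical owner of resources. */
    interface Scope {
        <R extends Resource> R adopt(R r);
    }

    /** Hypothetical resource whose payload can be mirrored onto the Java heap. */
    interface Resource {
        void moveToHeap();
        boolean isHeapOnly();
    }

    /** A by-value blob that starts out in native (direct) memory. */
    static final class Blob implements Resource {
        private ByteBuffer bits;

        Blob(int size) { bits = ByteBuffer.allocateDirect(size); }

        ByteBuffer bits() { return bits; }

        @Override public void moveToHeap() {
            if (bits.isDirect()) {
                ByteBuffer heap = ByteBuffer.allocate(bits.capacity());
                ByteBuffer src = bits.duplicate();
                src.clear();   // cover the whole buffer; does not erase data
                heap.put(src);
                heap.clear();
                bits = heap;   // drop the native copy; only the heap image remains
            }
        }

        @Override public boolean isHeapOnly() { return !bits.isDirect(); }
    }

    /** The distinguished heap-scope: members take care of themselves via the GC. */
    static final Scope HEAP = new Scope() {
        @Override public <R extends Resource> R adopt(R r) {
            r.moveToHeap();
            return r; // no reference is kept, so the scope cannot enumerate its members
        }
    };
}
```

The key property is that adopt keeps no list of members, which is what
lets the GC reclaim them without any help from the scope.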

Such a heap-scope corresponds approximately to another
kind of scope we've already talked about, a scope which holds
per-library resources, which is emptied only when the library
is unloaded.

A civilized API might also feature temporary scopes for sessions
of API use.  These would be shorter-lived than the library as a
whole, but not tied to any particular stack frame (try/finally block).

Any discussion of special-purpose scopes is incomplete unless
we also point out the null-scope, which absorbs resources and
never gives them back.  Once a resource is safely owned by the
null-scope, its backing storage can be freed at any point, either
immediately or (if we want a measure of debugging help) after
dangling pointers no longer need to be detected.
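
As a rough sketch of the two freeing policies, assuming a hypothetical
Block type with an explicit free() and a hypothetical NullScope with a
debug flag (neither is an existing API):

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class NullScopeSketch {
    /** Hypothetical resource with explicitly freeable backing storage. */
    static final class Block {
        private boolean freed;
        void free() { freed = true; }
        boolean isFreed() { return freed; }
    }

    /** Absorbs resources and never gives them back. */
    static final class NullScope {
        private final boolean debug;
        private final Deque<Block> quarantine = new ArrayDeque<>();

        NullScope(boolean debug) { this.debug = debug; }

        void absorb(Block b) {
            if (debug) {
                quarantine.add(b); // keep storage so stale accesses can still be detected
            } else {
                b.free();          // free immediately
            }
        }

        /** Release quarantined storage once dangling-pointer detection is no longer needed. */
        void drain() {
            Block b;
            while ((b = quarantine.poll()) != null) {
                b.free();
            }
        }
    }
}
```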

All of the above scopes can be characterized as non-stack
(non-auto or static) scopes.  Most of our demo examples involve
stack scopes which must be used with try/finally.  Those scopes
are thread-confined and therefore are much safer to use, as
long as dangling pointers are prevented.  As you have pointed
out, Maurizio, those stack scopes also have a natural inclusion
relation (older includes younger), allowing a safe handoff
within the thread from blocks used inside a method to an
enclosing scope that the method returns to.
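
The handoff from a younger scope to an older one might look roughly
like this.  StackScope, Block, handOff, and produce are hypothetical
names for this sketch, and the thread check is only one assumed way of
enforcing confinement.

```java
import java.util.ArrayList;
import java.util.List;

final class StackScopeSketch {
    /** Hypothetical resource; 'live' stands in for valid backing storage. */
    static final class Block {
        private boolean live = true;
        boolean isLive() { return live; }
        void free() { live = false; }
    }

    /** Hypothetical thread-confined scope, meant for try-with-resources. */
    static final class StackScope implements AutoCloseable {
        private final StackScope parent; // older enclosing scope; null for the outermost
        private final Thread owner = Thread.currentThread();
        private final List<Block> owned = new ArrayList<>();

        StackScope(StackScope parent) { this.parent = parent; }

        private void checkThread() {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("scope used outside its owner thread");
            }
        }

        Block allocate() {
            checkThread();
            Block b = new Block();
            owned.add(b);
            return b;
        }

        /** Hand a block to the enclosing scope so it survives this scope's close(). */
        Block handOff(Block b) {
            checkThread();
            if (parent == null) throw new IllegalStateException("no enclosing scope");
            if (!owned.remove(b)) throw new IllegalArgumentException("not owned by this scope");
            parent.owned.add(b);
            return b;
        }

        @Override public void close() {
            checkThread();
            for (Block b : owned) b.free();
            owned.clear();
        }
    }

    /** A callee works in its own scope and returns one block via the caller's scope. */
    static Block produce(StackScope caller) {
        try (StackScope inner = new StackScope(caller)) {
            Block scratch = inner.allocate();  // freed when 'inner' closes
            Block result = inner.allocate();
            return inner.handOff(result);      // now owned by 'caller'
        }
    }
}
```

The returned block stays live until the caller's scope is closed,
while scratch storage is released as soon as the callee returns.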

A final point:  I mentioned mirroring and copying above as
important moves for by-value data.  C doesn't natively support
this (except for struct passing), preferring pointer-based
protocols which reduce the number of copies.  But I think
it's important to identify native data (in C or another schema)
which is inherently by-value, and to more aggressively
copy and mirror such values when there is an advantage.
Moving to an enclosing scope is sometimes more efficient if
we copy the payload bits from inside a larger block in the
child scope into a larger block in the parent scope.  Even
better, if the payload bits have an unknown lifetime, we
can copy them to the heap-scope and let the GC worry
about them.  These optimizations are only unlocked when
we can mark data elements as being by-value (and when
we have an accurate layout).  I think this is a goal to keep
an eye on.
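
A small sketch of the copy-out optimization, assuming the layout tells
us the payload's offset and size inside a larger block.  PayloadCopy
and copyToHeap are hypothetical names, and a byte[] stands in for
whatever heap-scope carrier would really be used.

```java
import java.nio.ByteBuffer;

final class PayloadCopy {
    /** Copy 'size' payload bytes at 'offset' of a larger block into a fresh heap array. */
    static byte[] copyToHeap(ByteBuffer block, int offset, int size) {
        byte[] out = new byte[size];
        ByteBuffer view = block.duplicate(); // independent position/limit, shared bytes
        view.clear();                        // cover the whole block; does not erase data
        view.position(offset);
        view.get(out, 0, size);
        return out;                          // the GC now owns the payload's lifetime
    }
}
```

After the copy, the larger block can be released with its scope
without invalidating the payload.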

— John