taming resource scopes
Chris Vest
mr.chrisvest at gmail.com
Wed Jun 2 08:16:52 UTC 2021
You can build an Arc-like (Atomic Reference Counted) thing as a library,
but without deep language integration it will be easy for the contained
reference to escape the Arc.
Thankfully MemorySegments will still prevent you from accessing freed
memory.
An alternative to building a generic Arc is to build specialised,
atomically reference-counted wrappers for each thing you wish to have
reference counted, and then delegate calls instead of exposing the inner
reference.
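Something like this, roughly (all names here are made up for illustration,
this is not code from any real library):

```
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for some manually freed resource.
interface NativeBuffer {
    byte get(long index);
    void free();
}

// Specialised reference-counted wrapper: the inner reference is never
// exposed, every operation is delegated.
final class CountedBuffer {
    private final NativeBuffer inner;
    private final AtomicInteger refs = new AtomicInteger(1);

    CountedBuffer(NativeBuffer inner) { this.inner = inner; }

    CountedBuffer retain() {
        int count;
        do {
            count = refs.get();
            if (count <= 0) {
                throw new IllegalStateException("already freed");
            }
        } while (!refs.compareAndSet(count, count + 1));
        return this;
    }

    void release() {
        if (refs.decrementAndGet() == 0) {
            inner.free();               // deterministic deallocation
        }
    }

    byte get(long index) {
        if (refs.get() <= 0) {          // note: races with a concurrent release()
            throw new IllegalStateException("already freed");
        }
        return inner.get(index);
    }
}
```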
You still run into the problem that, without language integration, object
references can be shared without any enforcement of counter adjustments.
This makes usage awkward and introduces a new class of bugs.
You might also run into the issue that your object can now be in one of
three states: owned, shared, and closed/freed/released; the distinction
between owned and shared is the new thing.
Objects that are in a shared state might not support all of the operations
of an owned object.
Rust tracks this information in the type system as part of the language.
I built a prototype buffer implementation where I tracked this information
at runtime, but discarded the idea because it is, again, awkward and error
prone.
There were many places where the code had to check and branch on the
owned/shared state, and failure to do so would cause exceptions at runtime.
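To give an idea of the kind of branching I mean, here is a condensed,
purely hypothetical illustration (not the actual prototype code, and
synchronisation is omitted for brevity):

```
// Hypothetical buffer that tracks owned/shared/closed state at runtime.
// Every operation has to branch on the state, and a missed check only
// shows up as an exception when the program runs.
final class Buf {
    enum State { OWNED, SHARED, CLOSED }

    private State state = State.OWNED;
    private final Thread owner = Thread.currentThread();

    Buf share() {
        checkOpen();
        state = State.SHARED;          // some operations now become illegal
        return this;
    }

    void writeByte(long index, byte value) {
        checkOpen();
        if (state == State.SHARED) {
            throw new IllegalStateException("shared buffers are read-only here");
        }
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("not the owning thread");
        }
        // ... actual write elided ...
    }

    byte readByte(long index) {
        checkOpen();                   // reads allowed when OWNED or SHARED
        // ... actual read elided ...
        return 0;
    }

    void close() {
        checkOpen();
        if (state == State.SHARED) {
            throw new IllegalStateException("cannot close a shared buffer");
        }
        state = State.CLOSED;
    }

    private void checkOpen() {
        if (state == State.CLOSED) {
            throw new IllegalStateException("buffer is closed");
        }
    }
}
```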
So my point is that Rust and Swift have good reference counting stories
because of specific language features, and without similar language
features I don't think having a generic or general Arc class in Java will
be very helpful.
Cheers,
Chris
On Wed, 2 Jun 2021 at 03:04, Samuel Audet <samuel.audet at gmail.com> wrote:
> Let's see, I guess what I wanted to say with "concurrent" is "efficient
> lock-free" and "GC" is "tracing GC". I don't think Rust Arc provides the
> kind of general mechanism I have in mind. For example, here is an
> attempt at handling lazily initialized data, which models well
> resources external to the CPU, such as GPUs, FPGAs, etc., that need
> deterministic deallocation:
>
> "On my 2012 desktop workstation with 3GHz Xeon W3550, the Java benchmark
> reports an average of 7.3 ns per getTransformed invocation. The Rust
> benchmark reports 128 ns in get_transformed, a whopping 17 times slower
> execution."
> https://morestina.net/blog/784/exploring-lock-free-rust-3-crossbeam
>
> Unless things have changed dramatically in the last couple of years, I
> don't think Rust Arc offers an efficient and safe mechanism that can be
> used in all cases.
>
> I'm sorry to hear Java isn't going to try to provide something
> approaching Rust's Arc, but oh well, something is better than nothing.
> :) Keep up the good work
>
> Samuel
>
> On 5/31/21 8:57 PM, Maurizio Cimadamore wrote:
> >
> > On 30/05/2021 00:18, Samuel Audet wrote:
> >> Hi, Maurizio,
> >>
> >> Thanks for taking the time to write this down! It's all very
> >> interesting to have a sense of how various resources could be managed.
> >>
> >> This is starting to sound a lot like reference counting, and there are
> >> many implementations out there that use reference counting
> >> "automatically", most notably Swift. I'm not aware of any concurrent
> >> thread-safe implementation though, which would be nice if it can be
> >> achieved in general, but even if that works out, I'm assuming we could
> >> still end up with reference cycles. What are your thoughts on that
> >> subject? CPython deals with those with GC...
> > Hi Samuel,
> >
> > I believe Swift's Automatic Reference Counting (not to be confused with
> > Rust's Atomic Reference Counting - the two tragically share the same
> > acronym :-)) is a form of garbage collection where, rather than having a
> > separate process grovelling through memory (like the JVM's GC does),
> > increments and decrements are generated (presumably by the compiler) in
> > the user code directly, thus achieving a lower footprint solution, which
> > might be good in certain situations. Of course we know the issues with
> > reference counting when used as a _general_ mechanism for garbage
> > collection - the main ones being the inability to deal with cycles, and
> > the cost of atomicity, as you say. By the
> > way, the latter _can_ be addressed: in fact, the other ARC, Rust's one,
> > does exactly that [1], and, in a way, what we do for (shared) resource
> > scopes is inspired by that work. It is just generally less efficient, as
> > it involves atomic operations.
> >
> > Now, when it comes to resource scopes, it is not our goal to come up
> > with a perfect and general garbage collection mechanism. If the users
> > wanted that, well, they could just use the GC itself (and use an
> > implicit, GC-backed scope). What we're after here is a mechanism which
> > provides a _reliable_ programming model in the face of deterministic
> > deallocation. The NIO async [2] use case shows the problem pretty
> > clearly:
> >
> > 1. thread A initiates an async operation on a resource R
> > 2. at some point later, thread B picks up resource R and starts working
> >
> > In this case, you need to define a "bubble" which starts when thread A
> > submits the async operation, and ends when thread B has finished
> > executing that operation. If R is released between (1) and (2), several
> > errors of varying gravity can occur - from an exception to a VM crash,
> > if the resource is released when the IO operation has already been
> > submitted to the OS.
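> >
> > A sketch of what such a bubble looks like in code - all the names here
> > (scopeOf, acquire, release, submitAsync, consume) are made up, not the
> > actual API:
> >
> > ```
> > var handle = scopeOf(R).acquire();    // thread A: R can no longer be closed
> > submitAsync(R, result -> {
> >     try {
> >         consume(R, result);           // thread B: R is guaranteed alive here
> >     } finally {
> >         scopeOf(R).release(handle);   // bubble ends; close() becomes possible
> >     }
> > });
> > ```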
> >
> > In the document, I note that native calls are not too different from the
> > async use case. Ideally, you'd like for all resources used by a native
> > call to remain alive until the native code completes. These kinds of
> > invariants have to be built _on top_ - the JVM's classic garbage
> > collection cannot help when deterministic deallocation is in the picture.
> > And, while we can use GC-related techniques to speed up access to shared
> > segments w/o compromising safety, we can only do that if (a) access to a
> > resource is lexically enclosed (e.g. if you could write a try/finally
> > block around it - which e.g. you can't do in the async case, as it spans
> > across multiple threads) and (b) if we can make sure that the number of
> > the stack frames involved in the resource access is bounded (which is
> > not the case with native calls, as, with upcalls, the stack during a
> > native call can grow w/o bounds).
> >
> > I think it's also very interesting to notice that, even when working
> > with a _confined_ segment, you need some way to block deterministic
> > closure, otherwise you end up with issues in the following case:
> >
> > * thread A creates segment S
> > * thread A passes pointer to S to native code
> > * native code upcalls to some Java code
> > * Java code (again, in thread A) closes the scope to which S belongs
> > * when the upcall completes, control returns to the native call, which
> > attempts to dereference S
> > * crash
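> >
> > The same scenario, sketched with made-up names (none of this is the
> > actual API):
> >
> > ```
> > var scope = newConfinedScope();        // thread A
> > var s = allocateSegment(scope, 100);
> > passToNative(s, /* upcall */ () -> {
> >     scope.close();                     // still thread A, inside the upcall
> > });
> > // when the upcall returns, the native code dereferences the pointer to s
> > // -> crash
> > ```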
> >
> > Here we only have access from one thread - and even that is not enough
> > to guarantee safety, as some accesses (those in native code) are
> > blissfully unaware of the liveness checks occurring in the Java code.
> >
> > For these reasons we need some way to define a "bubble" where close
> > operations are restricted. This is not a new concept, in fact the API
> > proposed for Java 17 already had a concept of acquire/release; the
> > document just describes a possible restacking where, instead of dealing
> > with acquire/release calls directly, clients set up temporal
> > dependencies between scopes (but under the hood the acquire/release
> > remains). The only way (that I know of) to avoid reference counting and
> > still get the benefits of deterministic deallocation would be to track
> > resource usage at compile time (e.g. memory ownership) - but, when calls
> > to foreign functions are involved, not even these more advanced systems
> > would be enough.
> >
> > One last note: what we do is not, strictly speaking, reference counting
> > either :-) Reference counting is symmetric, at least in its classic
> > definition. The following works:
> >
> > ```
> > resource.inc();
> > // use resource
> > resource.dec();
> > ```
> >
> > But so does this:
> >
> > ```
> > resource.inc();
> > // use resource
> > resource.dec().dec().dec();
> > ```
> >
> > There is something wrong with the latter example, as a client is
> > attempting to decrement a counter which was incremented by some other
> > use of the resource. In our API, releasing a scope (or decrementing the
> > scope counter, if you will) can only be done by the very client that
> > did the acquire (or increment). This is what makes the API safe - a
> > plain reference counting mechanism (even if atomic) would have done
> > nothing for the NIO use case, for instance, as a client could still have
> > decremented the counter enough times so that a call to close() was
> > possible, thus defeating the very purpose of the reference counting.
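> >
> > Sketched with made-up names - the point is just that release requires
> > the token returned by the matching acquire:
> >
> > ```
> > var handle = scope.acquire();   // token private to this client
> > // use resource
> > scope.release(handle);          // only the holder of 'handle' can undo
> >                                 // this particular acquire
> > ```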
> >
> > Maurizio
> >
> > [1] - https://doc.rust-lang.org/std/sync/struct.Arc.html
> > [2] - https://inside.java/2021/04/21/fma-and-nio-channels/
> >
>