taming resource scopes

Samuel Audet samuel.audet at gmail.com
Wed Jun 2 01:03:53 UTC 2021


Let's see, I guess what I wanted to say with "concurrent" was "efficient 
lock-free", and with "GC", "tracing GC". I don't think Rust's Arc provides 
the kind of general mechanism I have in mind. For example, here is a 
benchmark of lazily initialized data, which models well resources 
external to the CPU, such as GPUs and FPGAs, that need deterministic 
deallocation:

"On my 2012 desktop workstation with 3GHz Xeon W3550, the Java benchmark 
reports an average of 7.3 ns per getTransformed invocation. The Rust 
benchmark reports 128 ns in get_transformed, a whopping 17 times slower 
execution."
https://morestina.net/blog/784/exploring-lock-free-rust-3-crossbeam

Unless things have changed dramatically in the last couple of years, I 
don't think Rust's Arc offers an efficient and safe mechanism that can be 
used in all cases.

I'm sorry to hear Java isn't going to try to provide something 
approaching Rust's Arc, but oh well, something is better than nothing. 
:) Keep up the good work

Samuel

On 5/31/21 8:57 PM, Maurizio Cimadamore wrote:
> 
> On 30/05/2021 00:18, Samuel Audet wrote:
>> Hi, Maurizio,
>>
>> Thanks for taking the time to write this down! It's all very 
>> interesting to have a sense of how various resources could be managed.
>>
>> This is starting to sound a lot like reference counting, and there are 
>> many implementations out there that use reference counting 
>> "automatically", most notably Swift. I'm not aware of any concurrent 
>> thread-safe implementation though, which would be nice if it can be 
>> achieved in general, but even if that works out, I'm assuming we could 
>> still end up with reference cycles. What are your thoughts on that 
>> subject? CPython deals with those with its GC...
> Hi Samuel,
> 
> I believe Swift's Automatic Reference Counting (not to be confused with 
> Rust's Atomic Reference Counting - the two tragically share the same 
> acronym :-)) is a form of garbage collection where, rather than having a 
> separate process grovelling through memory (like the JVM's GC does), 
> increments and decrements are generated (presumably by the compiler) 
> directly in the user code, thus achieving a lower-footprint solution, 
> which might be good in certain situations. Of course we know the issues 
> with reference counting when used as a _general_ mechanism for garbage 
> collection - the main ones being the inability to deal with cycles, and 
> the cost of making the counter updates atomic, as you say. By the way, 
> the latter _can_ be addressed: in fact, the other ARC, Rust's, does 
> exactly that [1], and, in a way, what we do for (shared) resource scopes 
> is inspired by that work. It is just generally less efficient, as it 
> involves atomic operations.
> 
> Now, when it comes to resource scopes, it is not our goal to come up 
> with a perfect and general garbage collection mechanism. If the users 
> wanted that, well, they could just use the GC itself (and use an 
> implicit, GC-backed scope). What we're after here is a mechanism which 
> provides a _reliable_ programming model in the face of deterministic 
> deallocation. The NIO async [2] use case shows the problem pretty clearly:
> 
> * thread A initiates an async operation on a resource R
> * at some point later, thread B picks up resource R and starts working
> 
> In this case, you need to define a "bubble" which starts when thread A 
> submits the async operation and finishes when thread B has finished 
> executing that operation. If R is released between these two points, 
> errors of varying severity can occur - from an exception all the way to 
> a VM crash, if the resource is released after the IO operation has 
> already been submitted to the OS.
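> To make the bubble concrete, here is a toy sketch (all names here are 
> made up for illustration, not the proposed API): close() is blocked 
> while at least one acquire is pending, so R cannot be freed while the 
> async operation is in flight.
> 
> ```java
> import java.util.concurrent.atomic.AtomicInteger;
> 
> // Toy sketch (hypothetical names): a scope whose close() fails while
> // an async "bubble" is open. A real implementation needs an atomic
> // state machine (the closed check and the increment below race);
> // this only illustrates the shape of the invariant.
> final class ToyScope {
>     private final AtomicInteger acquires = new AtomicInteger();
>     private volatile boolean closed;
> 
>     void acquire() {                  // thread A: open the bubble
>         if (closed) throw new IllegalStateException("already closed");
>         acquires.incrementAndGet();
>     }
>     void release() {                  // thread B: close the bubble
>         acquires.decrementAndGet();
>     }
>     void close() {                    // deterministic deallocation
>         if (acquires.get() > 0)
>             throw new IllegalStateException("pending acquires");
>         closed = true;
>     }
> }
> ```
> 
> Thread A would call acquire() before submitting the operation; thread B 
> calls release() in a finally block when the IO completes; any close() 
> attempted in between fails, instead of pulling R out from under the 
> pending operation.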
> 
> In the document, I note that native calls are not too different from the 
> async use case. Ideally, you'd like for all resources used by a native 
> call to remain alive until the native code completes. These kinds of 
> invariants have to be built _on top_ - classic JVM garbage collection 
> cannot help when deterministic deallocation is in the picture. And, 
> while we can use GC-related techniques to speed up access to shared 
> segments w/o compromising safety, we can only do that if (a) access to 
> a resource is lexically enclosed (e.g. if you could write a try/finally 
> block around it - which you can't do in the async case, as it spans 
> multiple threads) and (b) we can make sure that the number of stack 
> frames involved in the resource access is bounded (which is not the 
> case with native calls, as, with upcalls, the stack during a native 
> call can grow w/o bounds).
> 
> I think it's also very interesting to notice that, even when working 
> with a _confined_ segment, you need some way to block deterministic 
> closure, otherwise you end up with issues in the following case:
> 
> * thread A creates segment S
> * thread A passes pointer to S to native code
> * native code upcalls to some Java code
> * Java code (again, in thread A) closes the scope to which S belongs
> * when the upcall completes, control returns to the native call, which attempts to dereference S
> * crash
> 
> Here we only have access from one thread - and even that is not enough 
> to guarantee safety, as some accesses (those in native code) are 
> blissfully unaware of the liveness checks occurring in the Java code.
> 
> For these reasons we need some way to define a "bubble" where close 
> operations are restricted. This is not a new concept, in fact the API 
> proposed for Java 17 already had a concept of acquire/release; the 
> document just describes a possible restacking where, instead of dealing 
> with acquire/release calls directly, clients set up temporal 
> dependencies between scopes (but under the hood the acquire/release 
> remains). The only way (I know of) to avoid reference counting and 
> still get the benefits of deterministic deallocation would be to track 
> resource usage at compile-time (e.g. memory ownership) - but, when 
> calls to foreign functions are involved, not even these more advanced 
> systems would be enough.
> 
> One last note: what we do is not, strictly speaking, reference counting 
> either :-) Reference counting is symmetric, at least in its classic 
> definition. The following works:
> 
> ```
> resource.inc();
> // use resource
> resource.dec();
> ```
> 
> But so does this:
> 
> ```
> resource.inc();
> // use resource
> resource.dec().dec().dec();
> ```
> 
> There is something wrong with the latter example, as a client is 
> attempting to decrement a counter which was incremented by some other 
> use of the resource. In our API, releasing a scope (or decrementing 
> the scope counter, if you will) can only be done by the very client 
> that did the acquire (or increment). This is what makes the API safe - 
> a plain reference counting mechanism (even if atomic) would have done 
> nothing for the NIO use case, for instance, as a client could still 
> have decremented the counter enough times to make a call to close() 
> possible, thus defeating the very purpose of the reference counting.
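> The asymmetry can be sketched like this (again, hypothetical names, 
> not the actual API): each acquire hands back a one-shot handle, so a 
> client can only release what it acquired, and only once.
> 
> ```java
> import java.util.concurrent.atomic.AtomicBoolean;
> import java.util.concurrent.atomic.AtomicInteger;
> 
> // Hypothetical sketch: acquire() returns a one-shot handle. Unlike a
> // bare counter, no client can decrement on behalf of another, so a
> // rogue dec().dec().dec() is impossible by construction.
> final class HandleScope {
>     private final AtomicInteger acquires = new AtomicInteger();
> 
>     final class Handle {
>         private final AtomicBoolean released = new AtomicBoolean();
>         void release() {
>             if (!released.compareAndSet(false, true))
>                 throw new IllegalStateException("handle already released");
>             acquires.decrementAndGet();
>         }
>     }
> 
>     Handle acquire() {
>         acquires.incrementAndGet();
>         return new Handle();
>     }
> 
>     boolean closable() { return acquires.get() == 0; }
> }
> ```
> 
> With two clients holding handles, neither can drive the count to zero 
> on its own: releasing the same handle twice throws, rather than 
> silently enabling close().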
> 
> Maurizio
> 
> [1] - https://doc.rust-lang.org/std/sync/struct.Arc.html
> [2] - https://inside.java/2021/04/21/fma-and-nio-channels/
> 


More information about the panama-dev mailing list