resource scopes and close actions

Sat Feb 5 09:16:33 UTC 2022

On 3/2/22 22:01, Maurizio Cimadamore wrote:
>
>> So it's only a minor inconvenience - it needs to keep another copy of 
>> the MemoryAddress (ps) around, and it needs to have a static 'close' 
>> function that takes a MemoryAddress rather than a static or 
>> objectfunction that uses 'segment'.  The former is just a wasteful 
>> copy and the latter pollutes the api a bit.  ffmpeg 
>> avformat_close_input actually takes an AVFormatContext ** and so can 
>> be invoked multiple times safely, so another explicit close function 
>> would be useful as well but that's another matter.
>>
> So, this is the main factory for AVFormatContext, and it takes a scope 
> - that seems to make sense to me. When the user closes the scope, some 
> native function will be called to release the memory that has been 
> allocated so far. I'm not sure I see the issues you mention though: 
> with respect to duplication, you refer to the fact that you have to 
> keep a MemoryAddress (local!) variable around? Not to dismiss the 
> concern, but that doesn't seem _that_ terrible.
>
Well it's only semantically a local, physically it's copied to the 
lambda handle.  Like i said it's only a minor thing.

> As for the close(MemoryAddress), yes, something like that would be 
> needed, but why is it a public method? Couldn't it just be some 
> private method somewhere? I guess I'm a bit fuzzy as to where your API 
> boundaries are. It seems to me that you want your clients to deal in 
> AVFormatContext, AVRational and friends, and you don't really want 
> them to think (too much) about scope, segments and addresses - which I 
> think is a fine move. But if that's the case, why having a 
> close(MemoryAddress) in the API?
>

Well the situation is a bit messier than i thought, as below.

>> Anyway i'm still trying to work out how to fit this all in, i'm 
>> passing around scopes because MemorySegment.ofAddress() needs it and 
>> now i'm trying to work out how to fit them to the lifecycle of 
>> objects.   This was the first attempt at utilising the facilities for 
>> some benefit.
> I think having at least one scope does make sense. After all it is 
> possible for a client to create an AVFormatContext, get an AVStream 
> out of it and keep hold of it for too long (e.g. after the 
> AVFormatContext has been released). In which case, if the client tries 
> to access that AVStream you will have a JVM crash. If, on the other 
> hand, the AVStream is backed by the same scope as the AVFormatContext, 
> clients can't cause crashes in this particular way: once the 
> AVFormatContext has been closed, nothing associated with its scope 
> will be accessible. Which seems like an important invariant in your 
> API? Also, I'm not 100% clear on how you handle writes; if the user 
> sets a custom AVStream on your array, clearly you can have a problem 
> if the lifecycle of the custom AVStream doesn't match that of the 
> enclosing AVFormatContext (which seems to be another reason to keep 
> track of lifecycles?)

Hmm yes I see the utility there, without a scope-thing you lose that 
downstream tracking although it also means you have to be all in on it 
for it to be effective.  In this specific case possibly AVFormatContext 
should have it's own shared scope and an explicit close function so it 
could close that scope at the right time, rather than going the other 
way around.

Sorry I got that part about writing streams completely wrong as it's 
been a long time since i looked at it.  It's a bit more complicated as 
the close function  you use depends on the constructor and potentially 
later calls.

reading streams:
   avformat_open_input() allocates/initialises struct AVFormatContext
   you can read and modify some of the structures it creates
   avformat_close_input() closes it.

or also reading streams (e.g. to provide custom i/o):
   avformat_alloc_context()
   modify some fields
   avformat_open_input()
   read/modify things
   avformat_close_input()

writing:
   avformat_alloc_contex()
   modify some fields
   avformat_new_stream()
   lookup/modify some fields
   avformat_write_header()
   avformat_free()

So 'auto-closing' isn't quite straightforward anyway, not without some 
higher level calls or state tracking.  In the JNI version I used the 
'opaque' user-data field in AVFormatContext to track which close 
function should be used.  Same approach could be done here I guess, 
although it would need a copy of the MemoryAddress to be saved in any 
constructors so the close function could access it in an onClose handler.

>>
>> But it seems very difficult to get right and somewhat clumsy to use 
>> and only provides a very marginal increase in live-ness safety 
>> because you can always turn a MemoryAddress into a MemorySegment in 
>> another scope.  And ideally it's not something you'd expose to the 
>> api user.  With JNI there were other more straightforward mechanisms 
>> to manage the java<>c lifecycle matching (e.g. nulling out the 
>> pointer), ones that could be hidden from the library user.
> It is true that if you have a MemoryAddress you can always turn it 
> into something - but here you are writing an high-level API - again, 
> I'm not 100% sure as to what the boundaries of your API are. If your 
> API wraps segments and addresses, then a user, ideally, will not be 
> able to circumvent scopes by creating a segment backed by same address 
> but a different scope. If, on the other hand, your struct abstractions 
> also expose segments and addresses, there is a chance for users to 
> break away from scope restrictions, but they will always need a 
> restricted operation to do so. So, assuming your API is the only 
> "privileged" part running in your system, the user can't mess with 
> scopes, even if it can sees the addresses.

Oh right, well I didn't really think about hiding them completely from 
the api, I suppose that makes sense and maybe in some cases it will be 
possible.  In other cases they are just handles, so even if they are 
hidden they are still represented by a MemoryAddress which are all in 
the global scope so they can't go out of scope even if they are closed - 
which will still potentially crash the jvm if they are passed to 
function calls.  See a couple of paragraphs below about those.

I don't really see what practical difference the restricted functions 
make for a library - it's up to the application author and ultimately 
the user as to which modules have native access and not the library.

MemorySegment and maybe MemoryAddress will have to be exposed for some 
of the data structures for use and interoperation with other libraries.  
Specifically for ffmpeg there's at least AVFrame { uint8_t *data[8]; }, 
which are memory buffers for pixels and samples (and could be any 
primitive type not just ubyte) and the main input/output one is 
interested in.  While the jni version has some helpers to handle 
conversion to/from java2d/javafx/java audio, it exposes a jni-allocated 
ByteBuffer for other cases like opencl where you want to avoid big 
copies.  It's also got ref/unref of the data fields and some functions 
will implicitly unref, so that might be tricky to get to work within the 
confines of resource scopes. Possibly a case where it just doesn't 
really fit the mechanism.

OpenCL (& Vulkan) does everything via handles (typed anonymous 
pointers), so presumably represented by MemoryAddress.  What approach to 
use here to make them scope-safe?  Have explicit scope tests before 
using them in calls?

If one wanted a scope-protected MemoryAddress this seems to be the only 
solution in the present api:

CLCommandQueue {
   ResourceScope scope;
   MemoryAddress handle;

   void enqueueNDRangeKernel(...) {
     try (ResourceScope scope = ResourceScope.newConfinedScope()) {
        scope.keepAlive(this.scope);
          enqueueNDRangeKernel$MH.invokeExact((Addressable)handle, ...);
     } catch (IllegalStateException ex) {
     }
   }
}

Actually it's a bit worse because there will be handles (possibly in 
different scopes) in the argument list and they would all need to be 
checked in the same way.

So for practicality they would have to be MemorySegment (ideally 
length=0) since they are always checked?

OpenCL adds another wrinkle here, it's possible to use generic getters 
to retrieve handles to other objects, like retrieving the 
cl_command_queue a given cl_event was written to.  In the jni bindings I 
did for opencl I used a global hash table to resolve handles to unique 
java objects, that seems the only approach but it seems heavy.  Side 
note: MemoryAddress.hashCode() is very poor for hash tables since most 
addresses have at least the 3 least significant bits set to 0 on a 
64-bit machine.

  !Z