[code-reflection] RFR: Model lifetimes of onnx session-related objects more explicitly

Adam Pocock duke at openjdk.org
Mon Mar 3 14:05:08 UTC 2025


On Mon, 3 Mar 2025 11:21:38 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> > I'm a bit confused. Are we discussing to hide Onnx API from the Onnx API and expose the whole stuff as an extension of FFM API / Arena ?
> 
> I'm not sure either. But I think following the abstractions of the C API too literally might be a siren song. If we go down that path there are multiple layers of stuff to create, each with different lifetimes:
> 
>     * the onnx runtime, ortAPI -- which has a global lifetime
> 
>     * an onnx env which can be created/released
> 
>     * an onnx session, which can be created (inside an env) and released
> 
>     * tensors, which are OrtValues and can be created with any lifetime (but must be released with `ReleaseValue`)
> 
> 
> If we expose all this to developers mistakes will be likely (e.g. forgetting to close/release memory). The Python API went a completely different route -- and only seems to expose a session API:
> 
> https://onnxruntime.ai/docs/api/python/api_summary.html#
> 
> My general feeling is that 90% of what the user wants is some autocloseable concept in which temporary resources (e.g. tensors) can be allocated -- and in which loading/executing a model is possible (and when you close, then everything is deallocated). I believe for tensors you probably want two options:
> 
>     * a way to create "global" tensors -- associated with their own automatic arena (so that they are freed based on GC)
> 
>     * a way to create "local" tensors -- tied to a particular session -- this is useful e.g. to model results of model execution
> 
> 
> I think our suggestions are in reality quite similar -- replace OnnxEnvironment with OnnxArena and you get the same thing. I'm not too biased on how we want to call these things -- as long as we don't introduce too much confusion with names in the ONNX API. IMHO what you call "OnnxEnvironment" is not really an OnnxEnv (e.g. an env cannot really execute anything) -- it's more a Java view of how interacting with the ONNX API should be -- so I'd prefer to call it with a separate name -- and then define how it maps on the Onnx API concepts.
> 
> If we go for a separate `OnnxEnvironment`-like class (e.g. something that is not an arena) then that class should also expose an arena, in case developers want to use the lifetime of the environment to perform memory segment allocation, memory map files, etc. (but maybe this is something we can add later, depending on use cases -- if you are confident that developers might never need to spell "MemorySegment" -- then it might not be necessary)

The ONNX Runtime Java API does explicitly model the lifetimes of those things, aside from the OrtApi struct, which is static, and the environment, which has a shutdown hook and a no-op close method. We needed to do this because users want to create environments that use global thread pools, so they need to control environment creation, yet the environment must be created at most once per process. The Python API is currently trying to figure out how to expose that, and it'll be a bit annoying for them as their environment is implicit. There is an allocator concept too, which I've ignored in the Java API but have had several requests to expose. It would allow things like allocating directly onto the GPU, which is more useful when you can express GPU memory segments (I'm not sure how to do that in the existing JNI-based API). That concept maps more directly onto arenas, and I'd base the API around it if I were writing it today.
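
To make the lifetimes above concrete, here is a rough sketch in plain Java. It is not the ONNX Runtime Java API and not the Babylon prototype: `LifetimeSketch`, `Environment`, `Session`, `Tensor`, `getOrCreate`, `newSession` and `localTensor` are all hypothetical names, and no native calls are made. It only shows the shape: an environment that is created at most once per process and whose `close()` is a no-op, and a session scope that releases its "local" tensors when it closes.

```java
import java.util.ArrayList;
import java.util.List;

public class LifetimeSketch {

    /** Hypothetical process-wide environment: created at most once, close() does nothing. */
    static final class Environment implements AutoCloseable {
        private static Environment instance;

        private Environment() {}

        /** Returns the single environment, creating it on first use. */
        static synchronized Environment getOrCreate() {
            if (instance == null) {
                // Assumption: a real binding would create the native env here
                // and register a shutdown hook to release it at JVM exit.
                instance = new Environment();
            }
            return instance;
        }

        Session newSession() {
            return new Session();
        }

        @Override
        public void close() {
            // Deliberately a no-op: the environment lives until the process exits.
        }
    }

    /** Hypothetical tensor handle; release() stands in for the native ReleaseValue call. */
    static final class Tensor {
        private final float[] data;
        private boolean released;

        Tensor(float[] data) {
            this.data = data.clone();
        }

        float[] data() {
            if (released) {
                throw new IllegalStateException("tensor already released");
            }
            return data.clone();
        }

        boolean isReleased() {
            return released;
        }

        void release() {
            released = true;
        }
    }

    /** Hypothetical session scope: every tensor it creates is released when it closes. */
    static final class Session implements AutoCloseable {
        private final List<Tensor> owned = new ArrayList<>();
        private boolean closed;

        Tensor localTensor(float... values) {
            if (closed) {
                throw new IllegalStateException("session closed");
            }
            Tensor t = new Tensor(values);
            owned.add(t);
            return t;
        }

        @Override
        public void close() {
            for (Tensor t : owned) {
                t.release();
            }
            owned.clear();
            closed = true;
        }
    }

    public static void main(String[] args) {
        Environment env = Environment.getOrCreate();
        Tensor escaped;
        try (Session session = env.newSession()) {
            escaped = session.localTensor(1f, 2f, 3f);
            System.out.println(escaped.data().length); // usable inside the scope
        }
        System.out.println(escaped.isReleased());             // released with the session
        System.out.println(env == Environment.getOrCreate()); // same environment every time
    }
}
```

In an FFM-based binding the `owned` list would presumably be replaced by an arena owned by the session (for example one from `Arena.ofConfined()`), with "global" tensors allocated from `Arena.ofAuto()` instead, but that mapping is my assumption rather than something settled in this thread.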

-------------

PR Comment: https://git.openjdk.org/babylon/pull/332#issuecomment-2694475543

