[code-reflection] RFR: Model lifetimes of onnx session-related objects more explicitly

Sun Mar 2 15:04:02 UTC 2025

On Fri, 28 Feb 2025 12:42:24 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> The class representing an onnx session is auto closeable. But, in the current code, a session is closed immediately after its `run` method is called. This is problematic because a session returns some ORTValues (tensors) which also need to be freed, but that cannot be freed immediately after calling `run` (as they need to be used by clients).
> 
> To address this problem, I tweaked the session code to accept an external arena. All the allocation of session-related data structures now happens using that external arena. This means that the client can now be in charge of managing the lifetime of a session (see changes to MNIST demo).
> 
> To test, I tweaked the MNIST code to do 10K iterations on each button pressed. Predictably, a single button pressed resulted in over 3g of memory being leaked. With these changes the memory arrives at ~400K (there is still some minor leak, but not sure worth pushing more).
> 
> If the changes to the demo are not deemed good, I can withdraw this PR -- I mostly wanted to capture the result of my exploration somewhere.

> > I think you are on to something here. I am wondering if we can refame it like this:
> > ```
> > try (OnnxRuntime or = OnnxRuntime.of(MethodHandles.lookup(), Area.ofConfined()) {
> >   or.execute(() -> cnn(imageTensor))
> > }
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > ?
> > I realize there can be only one runtime instance of a native ONNX runtime but i don't think we should be beholden to that constraint regarding the scoping of executing one or more models and with tensor memory under same lookup and arena (and even execution mode). Maybe there is a better name than `OnnxRuntime` to avoid any confusion.
> 
> I think you are correct in that the high-level API we want doesn't really need to expose all this ONNX "cruft" (e.g. it seems neither high-level nor low-level enough). From the perspective of the Java client, there should be some kind of ONNX "session" (which might, or might not be related to the ONNX session struct in the C API) -- which has a user-defined lifetime. The only interesting method in this session object is "run" and when you "close", the lifetime ends, and all the resources are garbage collected.
> 
> Maybe we can even do something like:
> 
> ```java
> try (OnnxSession onnx = OnnxSession.of()) {
>   or.execute(lookup, () -> cnn(imageTensor))
> }
> ```
> 
> And hide the confined arena creation inside the API alltogether. (I'm not sure where `lookup` goes).

`() -> cnn(imageTensor)` produces a model, which is a session.

I would not recommend to call it a session nor a runtime if we want to create a layer between the mandatory singleton runtime and per-model sessions. 
If we want to match the Onnx API, it should be called OnnxEnvironment, it is pre-configured to run multiple sessions and it is not necessary to create a tensor.

-------------

PR Comment: https://git.openjdk.org/babylon/pull/332#issuecomment-2692770903