[code-reflection] RFR: Model lifetimes of onnx session-related objects more explicitly

Adam Pocock duke at openjdk.org
Fri Feb 28 22:34:04 UTC 2025


On Fri, 28 Feb 2025 12:42:24 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> The class representing an onnx session is auto closeable. But, in the current code, a session is closed immediately after its `run` method is called. This is problematic because a session returns some ORTValues (tensors) which also need to be freed, but these cannot be freed immediately after calling `run` (as they need to be used by clients).
> 
> To address this problem, I tweaked the session code to accept an external arena. All the allocation of session-related data structures now happens using that external arena. This means that the client can now be in charge of managing the lifetime of a session (see changes to MNIST demo).
> 
> To test, I tweaked the MNIST code to do 10K iterations on each button press. Predictably, a single button press resulted in over 3 GB of memory being leaked. With these changes the memory settles at ~400K (there is still some minor leak, but I'm not sure it's worth pushing further).
> 
> If the changes to the demo are not deemed good, I can withdraw this PR -- I mostly wanted to capture the result of my exploration somewhere.

The session in the C API also exposes other useful things: it lets you introspect the size and shape of the inputs and outputs, along with any metadata the user put into the ONNX graph. For the Babylon use case only the former seems relevant, as there won't be any metadata from the protobuf.
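
To make the introspection point concrete, here is a minimal sketch in plain Java. It is not the Babylon or ONNX Runtime API: `TensorInfo`, `mnistInputs`, the input name `image` and the shape are all hypothetical, and a dimension of -1 is used here to mark a dynamic axis such as the batch size. It only shows how a client could check an input against the shape a session reports before calling `run`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class SessionInfoSketch {
    // Hypothetical description of one model input or output.
    // In this sketch a dimension of -1 marks a dynamic axis (e.g. batch size).
    public record TensorInfo(String elementType, long[] shape) {
        // True if 'actual' has the same rank and matches every fixed dimension.
        public boolean accepts(long[] actual) {
            if (actual.length != shape.length) {
                return false;
            }
            for (int i = 0; i < shape.length; i++) {
                if (shape[i] != -1 && shape[i] != actual[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    // Hypothetical: what a session might report for an MNIST-style model.
    public static Map<String, TensorInfo> mnistInputs() {
        Map<String, TensorInfo> inputs = new LinkedHashMap<>();
        inputs.put("image", new TensorInfo("float32", new long[] {-1, 1, 28, 28}));
        return inputs;
    }

    public static void main(String[] args) {
        TensorInfo info = mnistInputs().get("image");
        System.out.println(Arrays.toString(info.shape()));
        System.out.println(info.accepts(new long[] {10, 1, 28, 28}));
        System.out.println(info.accepts(new long[] {10, 28, 28}));
    }
}
```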

Splitting the session from the environment is relevant: many ML pipelines contain multiple models (e.g. diffusion image generation systems consist of a few text embedding models, a diffusion model and a variational autoencoder which maps from the diffusion space into pixel space), so multiple sessions may need to be in flight at the same time. For an example of how the sessions work for a diffusion system you can see [this](https://github.com/oracle/sd4j).
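
As a rough illustration of that shape, the sketch below uses plain Java with no native code. `Env`, `Session`, `Tensor` and `Resource` are hypothetical stand-ins rather than the Babylon or ONNX Runtime classes, and the three-stage pipeline is a simplification. One environment owns several sessions that stay open together, while the tensors returned by `run` live in a caller-controlled scope and are released before the sessions are.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    // Records the order in which the (mock) native resources are released.
    public static final List<String> closed = new ArrayList<>();

    // Hypothetical stand-in for anything backed by native memory.
    public static class Resource implements AutoCloseable {
        final String name;
        boolean open = true;
        Resource(String name) { this.name = name; }
        @Override public void close() {
            if (open) {
                open = false;
                closed.add(name);
            }
        }
    }

    public static class Env extends Resource {
        Env() { super("env"); }
        Session session(String model) { return new Session(this, model); }
    }

    public static class Session extends Resource {
        final Env env;
        final String model;
        Session(Env env, String model) {
            super("session:" + model);
            this.env = env;
            this.model = model;
        }
        // The returned tensor is owned by the caller, not by the session.
        Tensor run(Tensor input) {
            if (!open || !env.open) {
                throw new IllegalStateException(name + " is closed");
            }
            return new Tensor(model + "(" + input.label + ")");
        }
    }

    public static class Tensor extends Resource {
        final String label;
        Tensor(String label) { super("tensor:" + label); this.label = label; }
    }

    public static String generate(String prompt) {
        // Sessions are in flight together; try-with-resources closes in reverse order.
        try (Env env = new Env();
             Session text = env.session("text");
             Session diffusion = env.session("diffusion");
             Session vae = env.session("vae")) {
            // Outputs live in an inner scope so they are freed before the sessions.
            try (Tensor in = new Tensor(prompt);
                 Tensor emb = text.run(in);
                 Tensor latent = diffusion.run(emb);
                 Tensor pixels = vae.run(latent)) {
                return pixels.label;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(generate("cat"));
        System.out.println(closed);
    }
}
```

In a real binding the inner scope would be the external arena from the PR, so the client decides when the outputs of `run` are released.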

-------------

PR Comment: https://git.openjdk.org/babylon/pull/332#issuecomment-2691656462

