[code-reflection] RFR: Model lifetimes of onnx session-related objects more explicitly

Mon Mar 3 09:47:07 UTC 2025

On Sun, 2 Mar 2025 15:20:40 GMT, Adam Sotona <asotona at openjdk.org> wrote:

>> The class representing an onnx session is auto closeable. But, in the current code, a session is closed immediately after its `run` method is called. This is problematic because a session returns some ORTValues (tensors) which also need to be freed, but that cannot be freed immediately after calling `run` (as they need to be used by clients).
>> 
>> To address this problem, I tweaked the session code to accept an external arena. All the allocation of session-related data structures now happens using that external arena. This means that the client can now be in charge of managing the lifetime of a session (see changes to MNIST demo).
>> 
>> To test, I tweaked the MNIST code to do 10K iterations on each button pressed. Predictably, a single button pressed resulted in over 3g of memory being leaked. With these changes the memory arrives at ~400K (there is still some minor leak, but not sure worth pushing more).
>> 
>> If the changes to the demo are not deemed good, I can withdraw this PR -- I mostly wanted to capture the result of my exploration somewhere.
>
> cr-examples/onnx/src/test/java/oracle/code/onnx/MNISTDemo.java line 151:
> 
>> 149:                     var imageTensor = Tensor.ofShape(new long[]{1, 1, IMAGE_SIZE, IMAGE_SIZE}, imageData);
>> 150: 
>> 151:                     try (Arena onnxSession = Arena.ofConfined()) {
> 
> Please don't call it a session. Onnx session is created below during the execution.

I can surely change the name -- but note that the MNISTDemo code does NOT create an onnx session explicitly -- not does it care whether a session or an env is constructed under the hood.

I believe the question here is what kind of high-level API would it make sense for babylon users to interact with -- which is a separate question from "how should Java bindings for ONNX look like". It seems we're going in a direction (not just in this PR, but in the last 2-3) where we keep hiding more and more of the ONNX stuff from the user facing code, and we just want to provide a simple abstraction to execute a model (expressed as a reflectable method).

> The same naming confusion is here. Below are created two sessions.

Well, this test creates an onnx session. The session constructor now takes an arena (the arena for the session) -- so the test now also creates an arena for that session and calls it "sessionArena".  I doesn't seem too confusing to me -- IMHO what you are really objecting to is the need of creating two things -- rather than the name?

-------------

PR Review Comment: https://git.openjdk.org/babylon/pull/332#discussion_r1977177840
PR Review Comment: https://git.openjdk.org/babylon/pull/332#discussion_r1977182923