Separation between MemorySegment and MemoryScope
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Mar 29 21:16:39 UTC 2021
On 29/03/2021 21:52, Maurizio Cimadamore wrote:
>
> On 29/03/2021 20:16, Paul Sandoz wrote:
>> Our GCs are getting better and better. The Java approach is that a
>> developer should not, in general, have to manage memory: let the GC
>> do it. My view is that if we can apply that default to Panama, we all
>> win in the long run. I suspect it will make API design easier when
>> building on top of Panama.
>>
>> Still, there will always be the need for more advanced use-cases, and
>> Panama caters for that too.
>>
>> I am still intrigued by whether there is any advantage to a GC being
>> taught that a small managed object refers to, and manages via a
>> cleaner, a larger allocated region of native memory.
>
> While I don't necessarily disagree, in the world as it is now, GC is
> not enough (which is the whole point of doing scopes in the first
> place).
>
> If we ever get to a place where GCs are smarter, such a no-arg
> overload can be added later, if need be (not saying it should be;
> just pointing out the possibility).
I also forgot to mention something: having an explicit scope is partly
about deterministic deallocation, for sure; but, after looking at the
recent slew of fixes, I think it also makes the code more readable!
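To make the readability point concrete, here is a minimal, self-contained Java sketch of what "explicit scope, no default" could look like. The `Scope` and `Segment` classes are hypothetical stand-ins, not the Panama API; only the `gcScope` name comes from the discussion quoted further down. Allocation takes the scope as a required parameter, so every segment's lifetime is visible at the call site, and closing the scope invalidates everything it owns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: NOT the Panama API. There is deliberately
// no allocateNative(size) overload that silently falls back to the GC.
final class Segment {
    private final long byteSize;
    private boolean alive = true;

    private Segment(long byteSize) { this.byteSize = byteSize; }

    // The scope is mandatory: forgetting it is a compile-time error.
    static Segment allocateNative(long byteSize, Scope scope) {
        Segment s = new Segment(byteSize);
        scope.register(s);
        return s;
    }

    void invalidate() { alive = false; }

    boolean isAlive() { return alive; }

    long byteSize() {
        if (!alive) throw new IllegalStateException("segment already closed");
        return byteSize;
    }

    public static void main(String[] args) {
        Segment escaped;
        try (Scope scope = Scope.confined()) {
            Segment segment = Segment.allocateNative(100, scope);
            System.out.println(segment.byteSize()); // prints 100
            escaped = segment;
        }
        System.out.println(escaped.isAlive());      // prints false
    }
}

final class Scope implements AutoCloseable {
    private final List<Segment> owned = new ArrayList<>();
    private final boolean gcManaged;

    private Scope(boolean gcManaged) { this.gcManaged = gcManaged; }

    // Deterministic scope: everything allocated in it dies on close().
    static Scope confined() { return new Scope(false); }

    // Hypothetical opt-in GC-managed scope, named so the choice is explicit.
    static Scope gcScope() { return new Scope(true); }

    void register(Segment s) {
        if (!gcManaged) owned.add(s); // GC-managed segments are left to the collector
    }

    @Override
    public void close() {
        if (gcManaged) throw new UnsupportedOperationException("GC-managed scope cannot be closed");
        for (Segment s : owned) s.invalidate();
        owned.clear();
    }
}
```

The design choice is that the GC-managed path still exists, but only when asked for by name.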
Yes, the Java way is that memory management is implicit; but when it
comes to interacting with native libraries, you see how many layers of
objects there are to worry about. That means the chains which keep your
objects (and your off-heap memory) alive are all _implicit_, and that,
in the long run, simply doesn't scale.
In other words, with a moderate degree of effort, we have been able to
make implicit segments work OK with native libraries. By that I don't
mean performance; I simply mean that these segments are now usable w/o
odd crashes caused by too-early deallocation.
But I think we should not be under the illusion that such a scheme will
be enough, or that the "scoped" case will only account for 5% of the
code out there.
Many of the jextract examples we know of register callbacks against a
library (e.g. OpenGL). When you do that, you effectively require that
the callback stays alive for as long as the library needs it.
Managing these use cases is far more natural if you can see and reason
about the lifecycle in your application than if you have to insert
obscure reachability fences here and there, which I presume most
developers will get wrong. In modern VMs the concept of reachability is
more fluid than it used to be, and it is easy to see the GC kick in and
collect an object well before the scope which introduced that object
has been closed in the corresponding source code. When writing our
tests we invariably hit at least one or two of these heisenbugs, so we
know it's a real thing. On top of that, scalarization and the advent of
primitive classes might complicate things even further.
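For contrast, this is roughly what the "reachability fence" style looks like. The sketch uses the real `java.lang.ref.Cleaner` and `java.lang.ref.Reference.reachabilityFence` (Java 9+); `NativeBuffer`, `LIVE` and `useNative` are hypothetical stand-ins for a native resource and a native call. The fence is needed because, once `buf.address` has been read, the JIT may treat `buf` as unreachable and let its cleaner free the memory while the native call is still running. Nothing in the code around the call makes that hazard visible.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class FenceDemo {
    // Hypothetical stand-in for native memory: addresses currently allocated.
    static final Set<Long> LIVE = ConcurrentHashMap.newKeySet();
    static final Cleaner CLEANER = Cleaner.create();

    // Small heap object owning a (simulated) native allocation, freed by a cleaner.
    static final class NativeBuffer {
        final long address;

        NativeBuffer(long address) {
            this.address = address;
            LIVE.add(address);
            // The action captures only the primitive parameter, never `this`.
            CLEANER.register(this, () -> LIVE.remove(address));
        }
    }

    // Hypothetical native call that would crash (here: throw) on freed memory.
    static int useNative(long address) {
        if (!LIVE.contains(address)) throw new IllegalStateException("use after free");
        return 42;
    }

    static int compute() {
        NativeBuffer buf = new NativeBuffer(0x1000L);
        try {
            // After buf.address is read, the JIT may consider `buf` unreachable,
            // so without the fence its cleaner could run before useNative returns.
            return useNative(buf.address);
        } finally {
            Reference.reachabilityFence(buf); // keeps buf strongly reachable until here
        }
    }

    public static void main(String[] args) {
        System.out.println(compute()); // prints 42
    }
}
```

Forgetting the `finally` block still compiles and usually works, which is why these bugs tend to surface only intermittently.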
Then there's the other side of the problem: allocation. The GC can solve
the problem when it comes to deallocation (by detecting that a Java
object has a big non-Java-heap payload), but what about allocating new
native memory *fast*? Well, I guess the GC could, in principle, manage
its own pre-reserved region of native memory and hand it out to users,
but that's on a completely different level, as it amounts to adding a
new region of memory managed by the GC which has edges from objects on
the heap. This is, of course, neither impossible nor completely outside
the realm of possibilities; but I think we have to be realistic about
the expectations here: having a performant deallocation story w/o a
performant allocation story is simply not good enough.
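The GC-driven deallocation half is roughly what `java.lang.ref.Cleaner` already provides: a small heap object registers a cleaning action that releases its off-heap payload once the object becomes phantom reachable. A minimal sketch, assuming a hypothetical `RESERVED` counter as a stand-in for real `malloc`/`free` calls. The GC still only sees a small object (nothing tells it how large the payload is), and nothing here makes allocation any faster. The same `Cleaner.Cleanable` also supports an explicit, deterministic `close()`.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicLong;

final class NativeRegion implements AutoCloseable {
    // Hypothetical stand-in for malloc/free bookkeeping: off-heap bytes reserved.
    static final AtomicLong RESERVED = new AtomicLong();
    private static final Cleaner CLEANER = Cleaner.create();

    // Cleaning state must NOT reference the NativeRegion itself,
    // otherwise the region could never become unreachable.
    private static final class Release implements Runnable {
        private final long size;
        Release(long size) { this.size = size; }
        @Override public void run() { RESERVED.addAndGet(-size); }
    }

    private final long size;
    private final Cleaner.Cleanable cleanable;

    NativeRegion(long size) {
        this.size = size;
        RESERVED.addAndGet(size);                                   // "allocate" off-heap
        this.cleanable = CLEANER.register(this, new Release(size)); // GC-driven fallback
    }

    long size() { return size; }

    // Deterministic release. Cleanable.clean() runs the action at most once,
    // so a later GC-triggered cleanup (or a second close) is a no-op.
    @Override
    public void close() { cleanable.clean(); }

    public static void main(String[] args) {
        try (NativeRegion region = new NativeRegion(1 << 20)) {
            System.out.println(region.size() + " " + RESERVED.get()); // prints "1048576 1048576"
        }
        System.out.println(RESERVED.get());                           // prints 0
    }
}
```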
So, all this to say: yes, GC might get better; and yes, GC might well,
one day, work well enough that implicit deallocation is more or less on
par with deterministic deallocation.
But I think that, even then, I'd still prefer to write my OpenGL example
with a big try-with-resources (TWR) scope, using an allocator that works
best for my use case :-)
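As an illustration of an allocator tuned to a use case, here is a minimal bump-pointer arena sketch: one up-front reservation, O(1) allocation by advancing an offset, and a single bulk release when the try-with-resources block ends. `BumpArena` is hypothetical, and a heap `ByteBuffer` stands in for a pre-reserved native block so the example runs anywhere; alignment is computed relative to the start of the block, not to absolute addresses.

```java
import java.nio.ByteBuffer;

// Hypothetical arena: NOT a Panama API. A heap ByteBuffer stands in for a native block.
final class BumpArena implements AutoCloseable {
    private final ByteBuffer block;
    private int offset = 0;
    private boolean closed = false;

    BumpArena(int capacity) { this.block = ByteBuffer.allocate(capacity); }

    // Returns a view of the next `size` bytes, aligned (relative to the block)
    // to `align`, a power of two. Allocation is an offset bump: no per-object free.
    ByteBuffer allocate(int size, int align) {
        if (closed) throw new IllegalStateException("arena is closed");
        if (align <= 0 || (align & (align - 1)) != 0) {
            throw new IllegalArgumentException("align must be a power of two");
        }
        int start = (offset + align - 1) & -align; // round offset up to a multiple of align
        if (size < 0 || start > block.capacity() - size) {
            throw new IllegalStateException("arena exhausted");
        }
        ByteBuffer view = block.duplicate();
        view.position(start);
        view.limit(start + size);
        offset = start + size;
        return view.slice();
    }

    int used() { return offset; }

    // Ends the arena's lifetime: no further allocation. A native arena would free
    // its whole block here in one call; this sketch only marks it closed, and
    // views already handed out are NOT invalidated.
    @Override
    public void close() { closed = true; }

    public static void main(String[] args) {
        try (BumpArena arena = new BumpArena(64)) {
            ByteBuffer header = arena.allocate(3, 1);
            ByteBuffer payload = arena.allocate(8, 8); // starts at offset 8, not 3
            payload.putLong(0, 42L);
            System.out.println(header.capacity() + " " + payload.capacity() + " " + arena.used()); // prints "3 8 16"
        }
    }
}
```

The trade-off is that individual allocations cannot be freed early; everything lives until the scope closes, which is exactly the lifecycle a big TWR scope makes explicit.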
Maurizio
>
> Maurizio
>
>>
>> Paul.
>>
>>> On Mar 29, 2021, at 2:47 AM, Maurizio Cimadamore
>>> <Maurizio.Cimadamore at Oracle.COM> wrote:
>>>
>>> Some of the considerations (see below) are correct, John, but here
>>> the choice is not the one you picture, i.e. "safe-but-slow
>>> vs. fast-but-unsafe".
>>>
>>> If you construct a memory segment, w/ or w/o explicit parameters,
>>> you always get a safe memory segment, and actually, unless I'm
>>> mistaken, the speed won't really change by much (I believe they all
>>> perform the same now).
>>>
>>> The choice here is whether to make implicit GC allocation the
>>> default or not (the current API says yes, partly out of a desire to
>>> offer ByteBuffer users a relatively gentle ramp).
>>>
>>> But, I must stress, whether you choose one or the other, _the segment
>>> you get back is still safe_.
>>>
>>>> So, make it explicit.
>>> I might agree with this, and with what Remi proposed earlier, but for
>>> more pragmatic reasons: when using the API, I found that it makes it
>>> perhaps a little too easy for the user to "forget" about the scope
>>> argument. For example:
>>>
>>> ```
>>> try (ResourceScope scope = ResourceScope.ofConfined()) {
>>>     // ok, I have a scope here
>>>     MemorySegment segment = MemorySegment.allocateNative(100); // bug??
>>>     ...
>>> }
>>> ```
>>>
>>> The call to MemorySegment::allocateNative is missing the scope
>>> parameter which, because of the overloads, is effectively an
>>> optional parameter. This means that developers might sometimes
>>> get GC deallocation semantics "accidentally" (it happened to me in a
>>> couple of tests and jextract samples, so I have to conclude that
>>> this can happen to other folks too).
>>>
>>> To me, that is the most compelling argument in favor of removing the
>>> default: extra overloads don't buy you that much (you can
>>> have a more aptly named `gcScope` factory which gives you the
>>> current default), and they introduce potential for mistakes. Since
>>> the cost of going explicit here is fairly low, I can see the argument.
>>>
>>> Maurizio
>>>
>>>
>>>