Separation between MemorySegment and MemoryScope

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Mon Mar 29 21:16:39 UTC 2021


On 29/03/2021 21:52, Maurizio Cimadamore wrote:
>
> On 29/03/2021 20:16, Paul Sandoz wrote:
>> Our GCs are getting better and better. The Java approach is that a
>> developer should not, in general, have to manage memory: let the GC
>> do it. My view is that if we can apply that default to Panama, we all
>> win in the long run. I suspect it will make API design easier when
>> building on top of Panama.
>>
>> Still, there will always be the need for more advanced use-cases, and 
>> Panama caters for that too.
>>
>> I am still intrigued by whether there is any advantage to a GC being 
>> taught that a small managed object refers to, and manages via 
>> cleaner, a larger allocated region of native memory.
>
> While I don't necessarily disagree, in the world as it is now, GC is
> not enough (which is the whole point of doing scopes in the first
> place).
>
> If we ever get to a place where GCs are smarter, such a no-arg
> overload can be added later, if need be (not saying it should be,
> just pointing out the possibility).

Also, I forgot about something: having an explicit scope is in part about
deterministic deallocation, for sure, but after looking at the recent
slew of fixes, I think it also makes the code more readable!

Yes, the Java way is that memory management is implicit, but when it
comes to interacting with native libraries, you see how many layers of
objects there are to worry about. That means the chains which keep your
objects (and your off-heap memory) alive are all _implicit_, and that,
in the long run, simply doesn't scale.

In other words, with a moderate amount of effort, we have been able to
make implicit segments work OK with native libraries. By that I don't
mean performance; I simply mean that these segments are now usable
without odd crashes caused by too-early deallocation.

But I think we should not be under the illusion that such a scheme will
be enough, or that the "scoped" case will account for only 5% of the
code out there.

Many of the jextract examples we know of register callbacks with a
library (e.g. OpenGL). When you do that, you effectively require that
the callback stays alive for as long as the library needs it.
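The callback-lifetime requirement can be sketched in plain Java. This is a minimal sketch, assuming a hypothetical `FakeLibrary` that stands in for a native library (a real one would keep only a raw function pointer that the GC cannot see) and a hypothetical `CallbackScope`. Every callback registered through the scope stays registered, and strongly reachable, until the scope is closed, so its lifetime is visible in the code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a native library. A real C library would keep
// only a raw function pointer; this map plays that role.
final class FakeLibrary {
    private static final Map<Long, Runnable> registered = new ConcurrentHashMap<>();
    private static final AtomicLong nextHandle = new AtomicLong(1);

    static long register(Runnable callback) {
        long handle = nextHandle.getAndIncrement();
        registered.put(handle, callback);
        return handle;
    }

    static void unregister(long handle) {
        registered.remove(handle);
    }

    // Simulates the library invoking its callbacks; returns how many ran.
    static int fireAll() {
        int count = 0;
        for (Runnable callback : registered.values()) {
            callback.run();
            count++;
        }
        return count;
    }
}

// Hypothetical explicit scope: callbacks registered through it stay alive
// and registered until close() is called.
final class CallbackScope implements AutoCloseable {
    private final List<Long> handles = new ArrayList<>();
    private final List<Runnable> keepAlive = new ArrayList<>();

    void register(Runnable callback) {
        keepAlive.add(callback); // strong reference owned by the scope
        handles.add(FakeLibrary.register(callback));
    }

    @Override
    public void close() {
        for (long handle : handles) {
            FakeLibrary.unregister(handle);
        }
        handles.clear();
        keepAlive.clear(); // callbacks may now be collected
    }
}
```

Wrapping the registration in try-with-resources means that "how long does the library need this callback?" is answered by the block structure rather than by reachability analysis.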

Managing these use cases is far more natural if you can see and reason
about the lifecycle in your application than if you have to insert
obscure reachability fences here and there, which I presume most
developers will get wrong. In modern VMs the concept of reachability is
more fluid than it used to be, and it is easy for the GC to kick in
and collect an object well before the scope which introduced that
object has been closed in the corresponding source code. When writing
our tests we invariably hit one or two of these heisenbugs at least
once, so we know it's a real thing. On top of that, scalarization and
the advent of primitive classes might complicate things even further.
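For comparison, this is roughly what the "reachability fence" alternative looks like, using the real `java.lang.ref.Cleaner` and `java.lang.ref.Reference.reachabilityFence` (Java 9+). `NativeHandle`, its fake address and the `freed` counter are hypothetical stand-ins for native memory. Without the fence in `useRawAddress`, the JIT may treat `handle` as unreachable right after `handle.address` is read, so the cleaner could run while native code is still using the address.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around a "native" resource known only by its address.
final class NativeHandle {
    static final Cleaner CLEANER = Cleaner.create();
    static final AtomicInteger freed = new AtomicInteger();

    final long address;

    NativeHandle(long address) {
        this.address = address;
        // The action must not capture `this`, otherwise the handle could
        // never become unreachable. Here it just counts a pretend free().
        CLEANER.register(this, freed::incrementAndGet);
    }
}

final class FenceDemo {
    // Reads the raw address and "passes it to native code". The fence keeps
    // `handle` reachable, so the cleaner cannot run, until the finally block.
    static long useRawAddress(NativeHandle handle) {
        try {
            long addr = handle.address;
            // ... a native call taking `addr` would go here ...
            return addr;
        } finally {
            Reference.reachabilityFence(handle);
        }
    }
}
```

Every method that extracts a raw address needs the same try/finally, and forgetting one is exactly the kind of bug that only shows up under a particular GC/JIT timing.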

Then there's the other side of the problem: allocation. The GC can solve
the problem when it comes to deallocation (by detecting that a Java
object has a large non-Java-heap payload), but what about allocating
new native memory *fast*? Well, I guess the GC could, in principle,
manage its own pre-reserved region of native memory and hand it out to
users, but that's on a completely different level, as it amounts to
adding a new region of memory, managed by the GC, which has edges from
objects on the heap. This is, of course, neither impossible nor
completely outside the realm of possibility, but I think we have to
be realistic about the expectations here: having a performant
deallocation story without a performant allocation story is simply
not good enough.

So, all this to say: yes, GC might get better; and yes, GC might well, 
one day, be able to work well enough that deterministic deallocation is 
more or less on par with implicit deallocation.

But I think that, even then, I'd still prefer to write my OpenGL example
with a big try-with-resources (TWR) scope, using an allocator that works
best for my use case :-)
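Here is a rough sketch of "a big TWR scope with an allocator that fits the use case", assuming a hypothetical `BumpArena` built on `ByteBuffer.allocateDirect`. One off-heap block is reserved up front, each allocation is just an offset bump plus a slice, and the arena is released as a unit when the try-with-resources block ends. This is not the Panama API. Because plain `ByteBuffer` offers no explicit free, `close()` here only drops the block for the GC to reclaim, whereas a real arena would free it on the spot.

```java
import java.nio.ByteBuffer;

// Hypothetical arena allocator (not the Panama API). One off-heap block is
// reserved up front; allocate() only bumps an offset and slices the block.
final class BumpArena implements AutoCloseable {
    private ByteBuffer block; // pre-reserved region; null once closed
    private int offset = 0;   // first free byte in the block

    BumpArena(int capacity) {
        block = ByteBuffer.allocateDirect(capacity);
    }

    ByteBuffer allocate(int size) {
        if (block == null) {
            throw new IllegalStateException("arena closed");
        }
        if (size < 0 || size > block.capacity() - offset) {
            throw new IllegalArgumentException("arena exhausted");
        }
        ByteBuffer view = block.duplicate(); // shares memory, own position/limit
        view.position(offset);
        view.limit(offset + size);
        offset += size;
        return view.slice(); // zero-based view over the bytes just reserved
    }

    int used() {
        return offset;
    }

    @Override
    public void close() {
        // Conceptually, everything allocated here dies as a unit. Plain
        // ByteBuffer has no explicit free, so this sketch only drops its
        // reference; the GC reclaims the block once no slice refers to it.
        block = null;
        offset = 0;
    }
}
```

The design choice is that the cost moves to one up-front reservation: individual allocations inside the scope are cheap, and their lifetime is the lexical extent of the `try` block.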

Maurizio



>
> Maurizio
>
>>
>> Paul.
>>
>>> On Mar 29, 2021, at 2:47 AM, Maurizio Cimadamore 
>>> <Maurizio.Cimadamore at Oracle.COM> wrote:
>>>
>>> Some of the considerations (see below) are correct, John, but the
>>> choice here is not the one you picture, i.e. "safe-but-slow
>>> vs. fast-but-unsafe".
>>>
>>> If you construct a memory segment, with or without explicit
>>> parameters, you always get a safe memory segment, and actually,
>>> unless I'm mistaken, the speed won't really change by much (I
>>> believe they all perform the same now).
>>>
>>> The choice here is whether to make implicit GC allocation the 
>>> default or not (the current API says yes, partly out of a desire to 
>>> offer ByteBuffer users a relatively gentle ramp).
>>>
>>> But, I must stress, whichever one you choose, _the segment
>>> you get back is still safe_.
>>>
>>>> So, make it explicit.
>>> I might agree with this, and with what Remi proposed earlier, but for
>>> more pragmatic reasons: when using the API, I found that the current
>>> API makes it perhaps a little too easy for the user to "forget"
>>> about the scope argument, for example:
>>>
>>> ```
>>> try (ResourceScope scope = ResourceScope.ofConfined()) {
>>>     // ok I have a scope here
>>>     MemorySegment segment = MemorySegment.allocateNative(100); // bug??
>>>     // ... use segment ...
>>> }
>>> ```
>>>
>>> The call to MemorySegment::allocateNative is missing the scope
>>> parameter which, because of the overloads, is effectively an
>>> optional parameter. This means that developers might sometimes
>>> get GC deallocation semantics "accidentally" (it happened to me in a
>>> couple of tests and jextract samples, so I have to conclude that
>>> this can happen to other folks too).
>>>
>>> To me, that is the most compelling argument in favor of removing the
>>> default: extra overloads don't buy you that much (you can
>>> have a more aptly named `gcScope` factory which gives you the
>>> current default), and they introduce potential for mistakes. Since
>>> the cost of going explicit here is fairly low, I can see the argument.
>>>
>>> Maurizio
>>>
>>>
>>>
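The explicit-parameter design argued for above can be sketched without the incubator module. This is a minimal sketch, assuming a hypothetical mini-API. `Scope`, `Segment` and `confined()` are illustrative names, and only `gcScope` comes from the message. None of these are the jdk.incubator.foreign classes. Because `allocateNative` has no scope-less overload, a call that forgets the scope fails to compile instead of silently picking GC deallocation. With either kind of scope, access after the scope is closed throws instead of proceeding.

```java
import java.nio.ByteBuffer;

// Hypothetical mini-API (NOT jdk.incubator.foreign). The scope is a required
// parameter, so forgetting it is a compile-time error.
final class Scope implements AutoCloseable {
    private final boolean explicit;
    private boolean alive = true;

    private Scope(boolean explicit) {
        this.explicit = explicit;
    }

    static Scope confined() { return new Scope(true); } // deterministic lifetime
    static Scope gcScope() { return new Scope(false); } // implicit, GC-managed

    boolean isAlive() {
        return alive;
    }

    @Override
    public void close() {
        if (!explicit) {
            throw new UnsupportedOperationException("GC-managed scope cannot be closed");
        }
        alive = false;
    }
}

final class Segment {
    private final ByteBuffer memory;
    private final Scope scope;

    private Segment(ByteBuffer memory, Scope scope) {
        this.memory = memory;
        this.scope = scope;
    }

    // No overload without a Scope: every caller must pick a lifetime.
    static Segment allocateNative(int size, Scope scope) {
        return new Segment(ByteBuffer.allocateDirect(size), scope);
    }

    byte get(int index) {
        checkAlive();
        return memory.get(index);
    }

    void set(int index, byte value) {
        checkAlive();
        memory.put(index, value);
    }

    // Either way the segment is safe: access after close throws. (In this
    // sketch close() does not free the buffer; a real API would.)
    private void checkAlive() {
        if (!scope.isAlive()) {
            throw new IllegalStateException("scope already closed");
        }
    }
}
```

With this shape, a call like `Segment.allocateNative(100)` simply does not compile. The caller has to write `Segment.allocateNative(100, scope)` or `Segment.allocateNative(100, Scope.gcScope())`, which makes the chosen lifetime visible at the call site.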


More information about the panama-dev mailing list