some thoughts on panama/jextract

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Wed Jan 8 11:00:49 UTC 2020


On 08/01/2020 04:36, Michael Zucchi wrote:
> This reads effectively constant data into a temporary array - so 
> although the array is managed its contents shouldn't be.  I couldn't 
> work out how to just copy the pointer value to "de-scope" it (and from 
> your earlier article about scopes/pointers it seems that is on 
> purpose).  The only solution seemed to be just to let the scope retain 
> the array for the lifetime of the result.  This type of query api is 
> everywhere so that isn't always practical. 

I agree that Pointer/Scope (and Scope inference in particular) didn't 
always work well with certain native library idioms such as the one you 
point out (we found other related issues, with library-filled arrays of 
callbacks).

In the low-level world, the approach we take for safety is slightly 
different - each memory segment (a region of memory) has its own spatial 
bounds (e.g. min/max address) and temporal bounds (is it still alive?). 
If segments come from Java code (as in, you have created them), then all 
is good, as the runtime knows the spatial/temporal bounds associated 
with the segment. In this world, pointers (we call them addresses) are 
just offsets into a segment and inherit all the safety boundaries of the 
owning segment.
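
As a rough illustration of the checks described above, here is a toy
model in plain Java. It is not the Panama API: ToySegment, ToyAddress
and tryRead are made-up names, and a heap ByteBuffer stands in for real
off-heap memory. The point is only that an address carries nothing but
an offset, and every access goes through the owning segment's spatial
and temporal checks.

```java
import java.nio.ByteBuffer;

// Toy model only: none of these classes belong to the Panama API.
final class SegmentBoundsDemo {

    static final class ToySegment {
        private final ByteBuffer data;  // its capacity is the spatial bound
        private boolean alive = true;   // the temporal bound

        ToySegment(int sizeInBytes) { data = ByteBuffer.allocate(sizeInBytes); }

        int size() { return data.capacity(); }

        void close() { alive = false; }

        // An address is just an offset into its owning segment.
        ToyAddress addressAt(int offset) { return new ToyAddress(this, offset); }

        // Every access is checked against both kinds of bounds.
        private void check(int offset, int length) {
            if (!alive) throw new IllegalStateException("segment is closed");
            if (offset < 0 || offset > size() - length)
                throw new IndexOutOfBoundsException("offset " + offset);
        }

        int getInt(int offset) { check(offset, Integer.BYTES); return data.getInt(offset); }

        void setInt(int offset, int value) { check(offset, Integer.BYTES); data.putInt(offset, value); }
    }

    static final class ToyAddress {
        private final ToySegment segment;
        private final int offset;

        ToyAddress(ToySegment segment, int offset) { this.segment = segment; this.offset = offset; }

        // Pointer arithmetic yields another address in the same segment.
        ToyAddress plus(int bytes) { return new ToyAddress(segment, offset + bytes); }

        int getInt() { return segment.getInt(offset); }

        void setInt(int value) { segment.setInt(offset, value); }
    }

    // Reports how a read through the given address ends.
    static String tryRead(ToyAddress a) {
        try { a.getInt(); return "ok"; }
        catch (IndexOutOfBoundsException e) { return "out-of-bounds"; }
        catch (IllegalStateException e) { return "closed"; }
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment(16);
        ToyAddress a = seg.addressAt(4);
        a.setInt(42);
        System.out.println(tryRead(a));           // prints "ok"
        System.out.println(tryRead(a.plus(12)));  // prints "out-of-bounds"
        seg.close();
        System.out.println(tryRead(a));           // prints "closed"
    }
}
```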

If you start interacting with native libraries, you quickly run into 
cases where the library gives you back a pointer, and obviously that 
pointer is not going to be 'managed', that is, the runtime has no idea 
what the spatial/temporal bounds of its associated segment are. So, 
by default, the pointers you get back this way are non-dereferenceable 
(as the runtime cannot prove that accessing them is correct). What to 
do? To bring back safety we appeal to a layered approach, described 
below.

First, it's worth noting that there's a lot you can do even with 
unmanaged, non-dereferenceable addresses - you can e.g. pass them to 
other native functions. So in the case of opaque pointer idioms (which 
seems to be your case) this will work most of the time without 
requiring any changes on the developer's part. But what if you really 
need to dereference what comes back from a native library? Here there 
are two cases.

In the happy case where the library is giving you back some address, and 
you happen to have some other segment that you own, which you know is 
going to contain that address (by construction), then you can perform a 
so-called 'rebase' operation - that is, take the unmanaged address and 
reassociate it with the managed segment. If the segment indeed contains 
the address, what you get back is a new, managed address which you can 
use normally.
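
A minimal sketch of what such a rebase check amounts to, again as a toy
model rather than the real API (Segment, ManagedAddress and rebase are
illustrative names, and the numeric addresses are invented). Think of a
C function in the style of strchr, which returns a pointer into a buffer
that you passed in yourself:

```java
// Toy model only: Segment, ManagedAddress and rebase are illustrative names.
final class RebaseDemo {

    static final class Segment {
        final long base;  // first byte covered by the segment
        final long size;  // number of bytes covered

        Segment(long base, long size) { this.base = base; this.size = size; }

        boolean contains(long rawAddress) {
            return rawAddress >= base && rawAddress - base < size;
        }
    }

    // A managed address: an offset plus the segment whose bounds it inherits.
    static final class ManagedAddress {
        final Segment segment;
        final long offset;

        ManagedAddress(Segment segment, long offset) { this.segment = segment; this.offset = offset; }
    }

    // Re-associates a raw address with a segment, but only if the segment contains it.
    static ManagedAddress rebase(long rawAddress, Segment owner) {
        if (!owner.contains(rawAddress))
            throw new IllegalArgumentException("address is outside the segment");
        return new ManagedAddress(owner, rawAddress - owner.base);
    }

    public static void main(String[] args) {
        Segment buffer = new Segment(0x1000, 64);  // a segment we created ourselves
        long fromLibrary = 0x1010;                 // raw pointer handed back by native code
        ManagedAddress managed = rebase(fromLibrary, buffer);
        System.out.println(managed.offset);        // prints 16

        try {
            rebase(0x2000, buffer);                // not inside our buffer
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");        // prints "rejected"
        }
    }
}
```

The rebased address carries no bounds of its own, so in this model it
stays exactly as safe as the segment it was checked against.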

But not all cases are so rosy - there are also cases where a library 
gives you back a pointer to some struct which you are expected to 
dereference, and the pointer is into some area of memory managed by the 
library itself - and so it cannot be safely rebased (since you have no 
segment to rebase it to). In this case, we make available an 'unsafe' 
functionality to create a segment out of thin air, with desired starting 
address and bounds. You can then use this fresh (but unsafe) segment to 
rebase your unmanaged address, and obtain an address that is trusted by 
fiat.
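
The same idea in toy form, with the caveat that every name here is
hypothetical (ofUnchecked, Address.unmanaged, LIBRARY_MEMORY) and a map
merely simulates memory owned by the library. What it shows is that the
unmanaged address refuses to be dereferenced until the caller vouches
for a base and size, after which the usual bounds check applies to
whatever the caller claimed:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: all names are hypothetical, not the Panama API.
final class UnsafeSegmentDemo {

    // Stand-in for memory owned by a native library: raw address -> int value.
    static final Map<Long, Integer> LIBRARY_MEMORY = new HashMap<>();

    static final class Segment {
        final long base;
        final long size;

        private Segment(long base, long size) { this.base = base; this.size = size; }

        // 'Unsafe' factory: nothing is verified, the caller vouches for base and size.
        static Segment ofUnchecked(long base, long size) { return new Segment(base, size); }

        int getInt(long offset) {
            if (offset < 0 || offset > size - Integer.BYTES)
                throw new IndexOutOfBoundsException("offset " + offset);
            return LIBRARY_MEMORY.getOrDefault(base + offset, 0);
        }
    }

    static final class Address {
        final long raw;
        final Segment segment;  // null means unmanaged

        private Address(long raw, Segment segment) { this.raw = raw; this.segment = segment; }

        static Address unmanaged(long raw) { return new Address(raw, null); }

        Address rebase(Segment s) {
            if (raw < s.base || raw - s.base >= s.size)
                throw new IllegalArgumentException("address is outside the segment");
            return new Address(raw, s);
        }

        int getInt() {
            if (segment == null)
                throw new UnsupportedOperationException("unmanaged address");
            return segment.getInt(raw - segment.base);
        }
    }

    public static void main(String[] args) {
        LIBRARY_MEMORY.put(0x7000L, 7);                 // pretend the library wrote this
        Address p = Address.unmanaged(0x7000L);         // what the native call returned

        try {
            p.getInt();
        } catch (UnsupportedOperationException e) {
            System.out.println("not dereferenceable");  // prints "not dereferenceable"
        }

        Segment fake = Segment.ofUnchecked(0x7000L, 4); // trusted by fiat
        System.out.println(p.rebase(fake).getInt());    // prints 7
    }
}
```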

It is very likely that the minimal jextract tool we will offer will 
expose at least a knob to let you decide whether you want to generate 
'safe' bindings (where unmanaged pointers are non-dereferenceable) or 
'unsafe' ones (where unmanaged pointers are automatically rebased to 
fake segments whose bounds are very lax, so as to allow as much access 
as you want). In our experience, when porting the libclang API using 
this strategy, we only had to use the unsafe rebase operation twice on a 
very large corpus of code - so the 'safe' translation works well. Of 
course we know that the degree of success will depend on the library you 
are interacting with - and for some particularly ill-behaved libraries, 
the effort required to rebase all unmanaged addresses might be enough 
to justify switching over to the 'unsafe' extraction mode.

I hope this answers (at least in part) some of your concerns with the 
previous API, and that it helps you understand how the new, lower level 
API will work in that direction.

Thanks
Maurizio






