some thoughts on panama/jextract
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Jan 8 11:00:49 UTC 2020
On 08/01/2020 04:36, Michael Zucchi wrote:
> This reads effectively constant data into a temporary array - so
> although the array is managed it's contents shouldn't be. I couldn't
> work out how to just copy the pointer value to "de-scope" it (and from
> your earlier article about scopes/pointers it seems that is on
> purpose). The only solution seemed to be just to let the scope retain
> the array for the lifetime of the result. This type of query api is
> everywhere so that isn't always practical.
I agree that Pointer/Scope (and Scope inference in particular) didn't
always work well with certain native library idioms such as the one you
point out (we found other related issues, with library-filled arrays of
callbacks).
In the low-level world, the approach we take for safety is slightly
different - each memory segment (a region of memory) has its own spatial
bounds (e.g. min/max address) and temporal bounds (is it still alive?).
If segments come from Java code (as in, you have created them), then all
is good, as the runtime knows the spatial/temporal bounds associated
with the segment. In this world, pointers (we call them addresses) are
just offsets into a segment and inherit all the safety boundaries of the
owning segment.
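To make the idea concrete, here is a minimal toy model of that design. It is only a sketch: the class and method names (SegmentSketch, Segment, Address, addOffset) are made up for illustration and are not the actual API, and a plain byte array stands in for off-heap memory.

```java
public class SegmentSketch {

    /** Hypothetical segment: a region with spatial (size) and temporal (alive) bounds. */
    static final class Segment {
        private final byte[] memory;   // plain array standing in for off-heap memory
        private boolean alive = true;  // temporal bound: false once closed

        Segment(int size) { this.memory = new byte[size]; }

        Address baseAddress() { return new Address(this, 0); }

        void close() { alive = false; }
    }

    /** Hypothetical address: an offset into its owning segment, nothing more. */
    static final class Address {
        private final Segment owner;
        private final int offset;

        Address(Segment owner, int offset) {
            this.owner = owner;
            this.offset = offset;
        }

        Address addOffset(int delta) { return new Address(owner, offset + delta); }

        byte get() { check(); return owner.memory[offset]; }

        void set(byte value) { check(); owner.memory[offset] = value; }

        // Every access is validated against the owning segment's bounds.
        private void check() {
            if (!owner.alive) {
                throw new IllegalStateException("segment is no longer alive");
            }
            if (offset < 0 || offset >= owner.memory.length) {
                throw new IndexOutOfBoundsException("offset " + offset + " is outside the segment");
            }
        }
    }

    public static void main(String[] args) {
        Segment segment = new Segment(4);
        Address last = segment.baseAddress().addOffset(3);
        last.set((byte) 7);
        System.out.println(last.get());                // in bounds, segment alive

        try {
            segment.baseAddress().addOffset(4).get();  // one past the end
        } catch (IndexOutOfBoundsException e) {
            System.out.println("spatial check: " + e.getMessage());
        }

        segment.close();
        try {
            last.get();                                // segment already closed
        } catch (IllegalStateException e) {
            System.out.println("temporal check: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the address carries no safety state of its own; both checks are answered by the segment it belongs to.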
If you start interacting with native libraries, you quickly run into
cases where the library gives you back a pointer, and obviously that
pointer is not going to be 'managed', that is, the runtime has no idea
of what are the spatial/temporal bounds of its associated segment. So,
by default, the pointers you get back this way are non-dereferenceable
(as the runtime cannot prove correctness). What to do? To bring back
safety we appeal to a layered approach, described below.
First, it's worth noting that there's a lot you can do even with
unmanaged, non-dereferenceable addresses - you can e.g. pass them to
other native functions. So in the case of opaque pointer idioms (which
seems to be your case) this will work most of the time without requiring
any changes on the developer's part. But what if you really need to
dereference what comes back from a native library? Here there are two cases.
In the happy case where the library is giving you back some address, and
you happen to have some other segment that you own, which you know is
going to contain that address (by construction), then you can perform a
so-called 'rebase' operation - that is, take the unmanaged address and
reassociate it with the managed segment. If the segment indeed contains
the address, what you get back is a new, managed address which you can
use normally.
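A rebase of this kind could be sketched roughly as follows. This is an illustrative toy, not the real API: RebaseSketch, Segment, ManagedAddress and rebase are hypothetical names, and raw addresses are modelled as plain long values.

```java
public class RebaseSketch {

    /** Hypothetical managed segment: known base address and size. */
    static final class Segment {
        final long base;
        final long size;

        Segment(long base, long size) {
            this.base = base;
            this.size = size;
        }

        boolean contains(long rawAddress) {
            return rawAddress >= base && rawAddress < base + size;
        }
    }

    /** Hypothetical managed address: a segment plus an offset into it. */
    static final class ManagedAddress {
        final Segment segment;
        final long offset;

        ManagedAddress(Segment segment, long offset) {
            this.segment = segment;
            this.offset = offset;
        }
    }

    /** Reassociates a raw address with a segment, but only if the segment contains it. */
    static ManagedAddress rebase(long rawAddress, Segment segment) {
        if (!segment.contains(rawAddress)) {
            throw new IllegalArgumentException(
                "address 0x" + Long.toHexString(rawAddress) + " is not inside the segment");
        }
        return new ManagedAddress(segment, rawAddress - segment.base);
    }

    public static void main(String[] args) {
        Segment owned = new Segment(0x1000, 64);   // a segment we created ourselves
        long fromLibrary = 0x1010;                 // raw pointer handed back by a library

        ManagedAddress managed = rebase(fromLibrary, owned);
        System.out.println("offset into owned segment: " + managed.offset);

        try {
            rebase(0x2000, owned);                 // outside the segment: refused
        } catch (IllegalArgumentException e) {
            System.out.println("rebase refused: " + e.getMessage());
        }
    }
}
```

The containment check is what keeps the operation safe: the result can never point outside a segment whose bounds the runtime already knows.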
But not all cases are so rosy - there are also cases where a library
gives you back a pointer to some struct which you are expected to
dereference, and the pointer is into some area of memory managed by the
library itself - and so it cannot be safely rebased (since you have no
segment to rebase it to). In this case, we make available an 'unsafe'
functionality to create a segment out of thin air, with desired starting
address and bounds. You can then use this fresh (but unsafe) segment to
rebase your unmanaged address, and obtain an address that is trusted by
fiat.
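The unsafe path could look something like the sketch below. Everything here is an assumption made for illustration: the names (UnsafeSegmentSketch, ofUnchecked, unmanaged, rebase) are hypothetical, and the library-owned memory is simulated with a small array placed at a fake address.

```java
public class UnsafeSegmentSketch {

    // Simulated library-owned memory: 8 bytes that "live" at fake address 0x1000.
    static final long LIB_BASE = 0x1000;
    static final byte[] LIB_MEMORY = {10, 20, 30, 40, 50, 60, 70, 80};

    /** Hypothetical segment whose bounds are asserted by the caller, not proven. */
    static final class Segment {
        final long base;
        final long size;

        private Segment(long base, long size) {
            this.base = base;
            this.size = size;
        }

        // Unsafe by design: nothing verifies that [base, base + size) is valid memory.
        static Segment ofUnchecked(long base, long size) {
            return new Segment(base, size);
        }
    }

    /** Hypothetical address: unmanaged (no segment) until it is rebased. */
    static final class Address {
        final long raw;
        final Segment segment;   // null means unmanaged

        private Address(long raw, Segment segment) {
            this.raw = raw;
            this.segment = segment;
        }

        static Address unmanaged(long raw) { return new Address(raw, null); }

        Address rebase(Segment target) {
            if (raw < target.base || raw >= target.base + target.size) {
                throw new IllegalArgumentException("address is outside the target segment");
            }
            return new Address(raw, target);
        }

        byte get() {
            if (segment == null) {
                throw new UnsupportedOperationException("unmanaged address cannot be dereferenced");
            }
            // In this simulation bad bounds surface as an array index error;
            // with real native memory they could crash the process instead.
            return LIB_MEMORY[(int) (raw - LIB_BASE)];
        }
    }

    public static void main(String[] args) {
        Address fromLibrary = Address.unmanaged(0x1002);
        try {
            fromLibrary.get();
        } catch (UnsupportedOperationException e) {
            System.out.println("refused: " + e.getMessage());
        }

        // We claim, without proof, that the library owns 8 bytes starting at 0x1000.
        Segment trusted = Segment.ofUnchecked(LIB_BASE, LIB_MEMORY.length);
        System.out.println(fromLibrary.rebase(trusted).get());
    }
}
```

Note that the rebase check still runs; what changes is that the bounds it checks against come from the programmer rather than from the runtime.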
It is very likely that the minimal jextract tool we will offer will
expose at least a knob to let you decide whether you want to generate
'safe' bindings (where unmanaged pointers are non-dereferenceable) or
'unsafe' ones (where unmanaged pointers are automatically rebased to
fake segments whose bounds are very lax, so as to allow as much access
as you want). In our experience, when porting the libclang API using
this strategy, we only had to use the unsafe rebase operation twice on a
very large corpus of code - so the 'safe' translation works well. Of
course we know that the degree of success will depend on the library you
are interacting with - and for some particularly ill-behaved libraries,
the effort required to rebase all unmanaged addresses might be enough to
justify switching over to the 'unsafe' extraction mode.
I hope this answers (at least in part) some of your concerns with the
previous API, and that it helps you understand how the new, lower-level
API will address them.
Thanks
Maurizio