some thoughts on panama/jextract
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Jan 9 11:07:54 UTC 2020
On 09/01/2020 02:10, Michael Zucchi wrote:
> On 8/1/20 9:30 pm, Maurizio Cimadamore wrote:
>>
>> On 08/01/2020 04:36, Michael Zucchi wrote:
>>
>> First, it's worth noting that there's a lot you can do even with
>> unmanaged, non-dereferenceable addresses - you can e.g. pass them to
>> other native functions. So in the case of opaque pointer idioms (which
>> seems to be your case) this will work most of the time w/o any required
>> changes on the developer's part. But what if you really need to
>> dereference what comes back from a native library? Here there are two
>> cases.
>>
>
> The api would have to consist solely of opaque pointers, primitive
> types, and caller-allocated dynamic memory. This just isn't very
> common in my experience.
>
> I forgive you for not looking too closely but at least in the case of
> jjmpeg the code is absolutely riddled with pointer dereferencing of
> library-allocated data, all the get/set methods for example just
> access the struct fields inside the JNI functions. Again it looks all
> nice thanks to jni.
>
> Where C libraries have non-opaque structs they tend to want to
> allocate them themselves to reduce versioning costs so you rarely pass
> in stack or caller-allocated buffers. Even something simple like
> 'const char *avutil_license(void)' requires dereferencing a pointer,
> unless there's some other mechanism for constant strings.
>
>> In the happy case where the library is giving you back some address,
>> and you happen to have some other segment that you own, which you
>> know is going to contain that address (by construction), then you can
>> perform a so called 'rebase' operation - that is, take the unmanaged
>> address and reassociate it with the managed segment. If the segment
>> indeed contains the address, what you get back is a new, managed
>> address which you can use normally.
>>
> I mean yeah I guess that's possible but that covers such a very
> limited set of cases. I can't even really think of any apart from
> stpcpy() or obstacks(!) and there would be no reason to use those.
>
>> But not all cases are so rosy - there are also cases where a library
>> gives you back a pointer to some struct which you are expected to
>> dereference, and the pointer is into some area of memory managed by
>> the library itself - and so it cannot be safely rebased (since you
>> have no segment to rebase it to). In this case, we make available an
>> 'unsafe' functionality to create a segment out of thin air, with
>> desired starting address and bounds. You can then use this fresh (but
>> unsafe) segment to rebase your unmanaged address, and obtain an
>> address that is trusted by fiat.
>>
>
> Ok i'm glad to hear that. I thought i'd just hit an absolute
> deal-breaker after everything else was looking so rosy!
>
> This may be a silly question - but how do you access it? I've been
> looking through the api this morning and haven't worked it out. I'm
> hopefully on the right 'foreign-abi' branch, i got the updates that
> came through while i was typing this. I can see some support in
> internals/Utils but the public functionality isn't obvious.
To dereference you need to construct a VarHandle (of the right type) and
use it against a MemoryAddress. See an example here of how you would
implement struct accessors in this way:
http://hg.openjdk.java.net/panama/dev/file/foreign-abi/test/jdk/java/foreign/StdLibTest.java#l261
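
A minimal sketch of the accessor pattern, for illustration only. Since the
foreign-memory API on the branch is still changing, this sketch does not use
MemoryAddress at all: it substitutes a direct ByteBuffer and the standard
MethodHandles.byteBufferViewVarHandle factory as a stand-in. The struct
layout (two ints, x at offset 0 and y at offset 4) and all class and method
names are assumptions made up for the example.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Stand-in sketch: accessors for a hypothetical C struct
//     struct point { int x; int y; };
// backed by off-heap memory. A ByteBuffer plays the role that a
// MemoryAddress would play in the foreign-memory API.
final class StructAccessorSketch {
    // VarHandle of the "right type": reads/writes an int at a byte offset.
    static final VarHandle INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Assumed layout of the hypothetical struct.
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = 4;
    static final int POINT_SIZE = 8;

    static int getX(ByteBuffer p) { return (int) INT.get(p, X_OFFSET); }
    static int getY(ByteBuffer p) { return (int) INT.get(p, Y_OFFSET); }
    static void setX(ByteBuffer p, int v) { INT.set(p, X_OFFSET, v); }
    static void setY(ByteBuffer p, int v) { INT.set(p, Y_OFFSET, v); }

    public static void main(String[] args) {
        ByteBuffer point = ByteBuffer.allocateDirect(POINT_SIZE);
        setX(point, 3);
        setY(point, 4);
        System.out.println(getX(point) + "," + getY(point)); // prints "3,4"
    }
}
```

The shape is the same as in the linked test: one VarHandle per carrier type,
plus the field offset supplied at each access. Out-of-bounds offsets are
rejected by the buffer's own bounds check.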
>
> So take these comments wrt not knowing how it works now.
>
> I know it's an attempt to provide some managed safety but as you're
> already calling C that horse has already largely bolted. For example
> you're trusting that the Java definition of a structure size and
> layout matches the C compiler when you call C, but aren't trusting the
> same information when it comes back (at least not from java). C also
> supports pointers you cannot dereference so both cases are necessary.
>
> So what you call "not so rosy" to me is "fundamental and basic c".
> That really should have first-class support and not be hidden by any
> complexity that will make it harder to use and thus more prone to
> mistakes.
As I said, we have at least two experiments (the only ones we've done so
far with the minimal extract): OpenGL and LibClang - in both cases, the
number of times in which we actually had to use the escape hatch
was small - none in OpenGL, and 2 in libclang.
I think this is a topic where it's easy to get biased by the specific
library one is looking at.
I'm not saying "you should always use the safe idiom, even if you don't want to"
- I'm saying there should be a choice, so that well-behaved libraries
can provide safe-by-default bindings.
>
> I don't really see the justification of having
> MemoryAddress::ofLong(p, size) not being available, or the
> MemoryAddress varhandle .get method not taking a length parameter (i
> think better than being able to change the MemoryAddress size as this
> keeps the size immutable). Particularly if there's some
> less-convenient work-around to get the same functionality anyway.
Eventually, it is possible we could add a ofLong(p, size) - right now
we're focusing on the set of primitive moves. But we're trying very hard
to distinguish between operations that are safe and operations in which
the user is essentially saying "you have to trust me". Maybe you always
want to run in the "I know what I'm doing" mode, but that assumption
might not be valid for all libraries/users.
In other words, there is a distinction - and I'd like that distinction
not to be lost under the "bbbut C is a mess" assertion.
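
To make the distinction concrete, here is a toy model of the two paths,
written in plain Java with addresses as longs. It is not the Panama API:
the Segment class, its allocated/unchecked factories and the rebase method
are all hypothetical names, and the arithmetic ignores overflow and
unsigned addresses for brevity.

```java
// Toy model (hypothetical names throughout): a "segment" is a base address
// plus a size. Rebasing a raw address means checking that it falls inside
// the segment and returning its offset from the base.
final class RebaseSketch {
    static final class Segment {
        private final long base;
        private final long size;
        private final boolean checkedOrigin; // false: bounds were asserted by the user

        private Segment(long base, long size, boolean checkedOrigin) {
            this.base = base;
            this.size = size;
            this.checkedOrigin = checkedOrigin;
        }

        // Safe path: the bounds are known because we allocated the memory.
        static Segment allocated(long base, long size) {
            return new Segment(base, size, true);
        }

        // "You have to trust me" path: bounds made up out of thin air.
        static Segment unchecked(long base, long size) {
            return new Segment(base, size, false);
        }

        boolean isCheckedOrigin() { return checkedOrigin; }

        // Returns the offset of rawAddress within this segment, or throws.
        long rebase(long rawAddress) {
            long offset = rawAddress - base;
            if (offset < 0 || offset >= size) {
                throw new IllegalArgumentException("address outside segment");
            }
            return offset;
        }
    }

    public static void main(String[] args) {
        Segment owned = Segment.allocated(0x1000L, 64);
        System.out.println(owned.rebase(0x1010L));   // prints 16

        Segment trusted = Segment.unchecked(0x7f00L, 8);
        System.out.println(trusted.rebase(0x7f04L)); // prints 4
    }
}
```

Both paths end in the same bounds-checked rebase; the only difference is
who vouches for the bounds, which is what the safe/unsafe split in the API
is meant to keep visible.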
Maurizio
>
> Michael
>
>