some thoughts on panama/jextract

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Fri Jan 10 10:40:20 UTC 2020


On 10/01/2020 02:11, Michael Zucchi wrote:
> On 9/1/20 9:37 pm, Maurizio Cimadamore wrote:
>>
>>
>> To dereference you need to construct a VarHandle (of the right type) 
>> and use it against a MemoryAddress. See an example here of how you 
>> would implement struct accessors in this way:
>>
>> http://hg.openjdk.java.net/panama/dev/file/foreign-abi/test/jdk/java/foreign/StdLibTest.java#l261 
>>
>>
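Here is a rough analogue of that pattern that runs on a stock JDK. The ByteBuffer view VarHandles in java.lang.invoke take a buffer plus a byte offset, much like a VarHandle used against a MemoryAddress. This is only a sketch under assumptions: ByteBuffer stands in for native memory, and the struct Point { int x; int y; }, its offsets and the PointAccessors class are hypothetical (4-byte ints, no padding).

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: struct accessors built from a VarHandle, with a ByteBuffer standing in
// for native memory. The struct Point { int x; int y; } is hypothetical.
class PointAccessors {
    // Reads/writes an int at a byte offset inside a ByteBuffer (native byte order).
    static final VarHandle INT =
        MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Assumed layout: x at offset 0, y at offset 4, total size 8 bytes.
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = 4;
    static final int SIZE = 8;

    static int getX(ByteBuffer p) { return (int) INT.get(p, X_OFFSET); }
    static int getY(ByteBuffer p) { return (int) INT.get(p, Y_OFFSET); }
    static void setX(ByteBuffer p, int v) { INT.set(p, X_OFFSET, v); }
    static void setY(ByteBuffer p, int v) { INT.set(p, Y_OFFSET, v); }

    public static void main(String[] args) {
        ByteBuffer point = ByteBuffer.allocateDirect(SIZE);
        setX(point, 3);
        setY(point, 4);
        System.out.println(getX(point) + "," + getY(point)); // prints 3,4
        // Accessing an offset past SIZE fails with IndexOutOfBoundsException.
    }
}
```

The VarHandle enforces the buffer's bounds on every access, which is the same kind of check a bounded memory segment gives you.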
>
>
> Yes, of course, I had all that working, but sorry, my fault: I missed 
> the "ForeignUnsafe" bit in the constructor and only saw cases where 
> you allocated the memory segment from Java.
>
> Thanks to the info from Jorn I solved that problem.
>
>>>
>>> So take these comments with the caveat that I don't know how it works now.
>>>
>>> I know it's an attempt to provide some managed safety, but as you're 
>>> already calling C, that horse has largely bolted. For example, 
>>> you're trusting that the Java definition of a structure's size and 
>>> layout matches the C compiler's when you call C, but aren't trusting 
>>> the same information when it comes back (at least not from Java). C 
>>> also supports pointers you cannot dereference, so both cases are 
>>> necessary.
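To make the layout point concrete, here is a minimal sketch that computes field offsets and total size for a C struct. It assumes the natural-alignment rule used by many common ABIs: each primitive is aligned to its own size, and the total size is rounded up to the largest alignment. The NaturalLayout class and the example struct are hypothetical. Packing pragmas or platform-specific alignment (e.g. of double on 32-bit x86) would produce different numbers, and that is exactly how a hand-written Java layout can silently drift from what the C compiler does.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: C-style field offsets under a natural-alignment assumption
// (alignment == size for primitives). Real ABIs may differ.
class NaturalLayout {
    // Returns each field's offset, followed by the total struct size.
    // Field sizes are assumed to be positive powers of two.
    static List<Long> layout(long... fieldSizes) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        long maxAlign = 1;
        for (long size : fieldSizes) {
            long align = size;                              // natural alignment assumption
            offset = (offset + align - 1) / align * align;  // pad up to the field's alignment
            result.add(offset);
            offset += size;
            maxAlign = Math.max(maxAlign, align);
        }
        // Trailing padding keeps every element of an array of this struct aligned.
        long total = (offset + maxAlign - 1) / maxAlign * maxAlign;
        result.add(total);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical struct { char c; int i; double d; short s; }
        System.out.println(layout(1, 4, 8, 2)); // prints [0, 4, 8, 16, 24]
    }
}
```

For that struct, the compiler inserts three padding bytes after c and six trailing bytes after s. A Java-side description has to reproduce both.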
>>>
>>> So what you call "not so rosy" is, to me, "fundamental and basic C". 
>>> That really should have first-class support and not be hidden behind 
>>> complexity that makes it harder to use and thus more prone to 
>>> mistakes.
>>
>> As I said, we have at least two experiments (the only ones we've done 
>> so far with the minimal jextract): OpenGL and libclang. In both cases, 
>> the number of times the escape hatch was actually needed was small: 
>> none in OpenGL, and two in libclang.
>>
>> I think this is a topic where it's easy to get biased by the specific 
>> library one is looking at.
>>
> Well I mean, it is a C library, and this project purports to support 
> "calling C libraries without JNI", so it's not a matter of 'bias' as 
> such.  This is also a public project and requests 'community' feedback 
> so here we are.
>
> And to follow your argument, indeed OpenGL is a very specific case 
> that uses integer handles for everything.  This isn't particularly 
> common.
I think defining what _common_ means is tricky. Most of the libraries we 
tried (and no, we did not hand-pick them) fell into this category, but I 
don't want to get into an idiom style war; I'm well prepared to accept 
that there are well-behaved libraries and less well-behaved ones.
>
> Vulkan mostly uses opaque pointers and user-supplied output buffers, 
> but vkMapMemory is the only way to move application data to/from the 
> device, and that returns a pointer. Similarly for OpenCL and its 
> memory mapping function, although it also has memory copy functions.
You said it yourself: Vulkan _mostly_ uses opaque pointers, and then has a 
bunch of functions which don't. The approach I described works with all 
the opaque stuff (which I expect to be the norm), and then gives you the 
ability to override segment length information for the routines which 
use a different idiom.
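Here is a toy model of that idiom. It is explicitly not the Panama API. Pointers coming back from native code carry a length of zero, so they can be passed around as opaque handles but not dereferenced. A routine like vkMapMemory, whose size the caller knows, gets an explicit length override instead. The BoundedPtr class, its withLength method and the byte array simulating native memory are all hypothetical.

```java
// Toy model, NOT the Panama API: native memory is simulated with a byte array
// (hypothetical), and each pointer carries the number of bytes it may access.
final class BoundedPtr {
    // Hypothetical stand-in for the native address space.
    static final byte[] NATIVE_HEAP = new byte[64];

    final int address;
    final long length; // 0 = opaque: can be passed back to C, but not dereferenced

    private BoundedPtr(int address, long length) {
        this.address = address;
        this.length = length;
    }

    // Safe default: pointers returned by native code start with zero length.
    static BoundedPtr fromNative(int address) {
        return new BoundedPtr(address, 0);
    }

    // The explicit escape hatch: the caller asserts how many bytes are valid.
    // A wrong value here is the first thing to check when things go wrong.
    BoundedPtr withLength(long newLength) {
        return new BoundedPtr(address, newLength);
    }

    // Bounds-checked one-byte read relative to this pointer.
    byte get(long offset) {
        if (offset < 0 || offset >= length) {
            throw new IndexOutOfBoundsException(
                "offset " + offset + " outside [0, " + length + ")");
        }
        return NATIVE_HEAP[(int) (address + offset)];
    }

    public static void main(String[] args) {
        NATIVE_HEAP[16] = 42;
        BoundedPtr handle = fromNative(16); // e.g. an opaque handle
        try {
            handle.get(0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("opaque pointer: dereference rejected");
        }
        // e.g. a mapped-memory pointer whose size the caller knows
        BoundedPtr mapped = handle.withLength(4);
        System.out.println(mapped.get(0)); // prints 42
    }
}
```

Because withLength is the only way to widen a pointer in this model, searching for it lists every place where length information was injected by hand.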
>
> And then there's the case of string pointers, which I've already 
> mentioned. Some APIs will copy them, but many won't, because it's clumsy 
> to write, clumsy to use, and needlessly inefficient. They're probably 
> even worse than structures, because as of now you have to create an 
> unsafe big segment that can hold the potential size, walk the bytes to 
> find its length, then create another unsafe segment for the actual 
> length, and then copy it to a byte array.
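The three steps listed above can be sketched with a direct ByteBuffer standing in for the over-sized segment. This is an illustration under assumptions: CStrings and readCString are hypothetical names, the buffer's limit plays the role of the assumed maximum size, and the bytes are decoded as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch only: a ByteBuffer stands in for the "big" segment; not the Panama API.
class CStrings {
    // Hypothetical helper: reads a NUL-terminated string starting at index 0.
    static String readCString(ByteBuffer big) {
        int bound = big.limit(); // assumed upper bound on the string's size
        int len = 0;
        // Step 1: walk the bytes until the terminating NUL.
        while (len < bound && big.get(len) != 0) {
            len++;
        }
        if (len == bound) {
            throw new IllegalArgumentException("no NUL terminator within " + bound + " bytes");
        }
        // Step 2: a second view restricted to the actual length.
        ByteBuffer exact = big.duplicate();
        exact.position(0);
        exact.limit(len);
        // Step 3: copy into a Java byte array and decode (UTF-8 assumed).
        byte[] bytes = new byte[len];
        exact.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer big = ByteBuffer.allocateDirect(32); // zero-initialized
        big.put("hello".getBytes(StandardCharsets.UTF_8)).put((byte) 0);
        System.out.println(readCString(big)); // prints hello
    }
}
```

If no NUL byte appears within the assumed bound, the sketch throws rather than reading further, since a wrong bound is exactly the kind of error that would otherwise go unnoticed.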
>
> And although it isn't super common some libraries have their own 
> allocation functions that you must or should use instead of malloc and 
> friends.
>
>> I'm not saying "you should always use the safe idiom, even if you don't 
>> want to"; I'm saying there should be a choice, so that well-behaved 
>> libraries can provide safe-by-default bindings.
>>
> I don't really understand this argument. They will just automatically 
> have this "safety" if that's the way they're written. But if they're 
> not, then you simply don't have any choice in the matter anyway. I 
> mean, what are you going to do, patch the upstream library for a Java 
> binding? Given such a restriction, either you can bind the library or 
> you can't, and throwing an error at the developer and forcing them to 
> add another 'now you obviously know what you're doing!' argument isn't 
> going to win any friends. JNI has none of these restrictions, remember.

The point I'm making is about what the default should be. I think 
there's no question that, at the expressiveness level, you can make 
things work in the current scheme. The only real question here is: 
should pointers returned by native libraries be dereferenceable by 
default or not?

You seem to argue (as others have done before you) that once you go 
down the native rabbit hole, there is no point in enforcing safety: it's 
a lost cause. On the contrary, I believe that having explicit 'break out 
of jail' operations _manifest_ in the code is useful no matter what. 
Getting a weird segfault? First go and check all the places where you 
injected length information and look for potential mistakes. This 
approach can be very useful when things go wrong (and they often do).

In terms of choice, as I mentioned, jextract will provide a flag to 
switch the safety behavior; let's call it "--safe-bindings=true/false". 
If the flag is set to false, jextract will automatically resize all 
pointers coming from the native side so that they have unbounded size, 
letting users freely dereference them. I suspect such a flag will be a 
popular choice for some libraries. But my point is that you can derive 
unsafe behavior from safe behavior (as I've explained), whereas it is 
extremely hard to derive safe behavior from unsafe-by-default bindings.
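Here is a minimal sketch of that asymmetry. It assumes the hypothetical --safe-bindings switch simply chooses the initial length of every pointer returned from native code. SafeBindings, Ptr, wrapReturned and UNBOUNDED are illustrative names only; they are not jextract output.

```java
// Toy model of the hypothetical --safe-bindings switch; not real jextract output.
final class SafeBindings {
    static final long UNBOUNDED = Long.MAX_VALUE;

    // Hypothetical pointer: an address plus the number of bytes it may access.
    static final class Ptr {
        final long address;
        final long length;

        Ptr(long address, long length) {
            this.address = address;
            this.length = length;
        }

        // Explicit escape hatch: set the accessible range by hand.
        Ptr withLength(long newLength) {
            return new Ptr(address, newLength);
        }

        // Overflow-safe check for reading 'bytes' bytes at 'offset'.
        boolean canRead(long offset, long bytes) {
            return offset >= 0 && bytes >= 0 && offset <= length - bytes;
        }
    }

    // What generated bindings would do with a pointer returned from native code.
    static Ptr wrapReturned(long address, boolean safeBindings) {
        return new Ptr(address, safeBindings ? 0 : UNBOUNDED);
    }

    public static void main(String[] args) {
        Ptr safe = wrapReturned(0x1000, true);
        Ptr unsafe = wrapReturned(0x1000, false);
        System.out.println(safe.canRead(0, 4));   // false: must opt in explicitly
        System.out.println(unsafe.canRead(0, 4)); // true: no check left to help
        // Safe -> unsafe is one explicit call...
        System.out.println(safe.withLength(UNBOUNDED).canRead(0, 4)); // true
        // ...but unsafe -> safe needs the real size, which the pointer never recorded.
    }
}
```

Going from safe to unsafe is a single explicit call. Going back would require the real size, which an unbounded pointer no longer carries.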


Can I suggest that, for now, we wait until we have the minimal jextract 
out, try out a few libraries, and see how it goes? Maybe the 
safe-by-default idiom will get kicked out after two weeks, maybe not; 
but I think we need to collect some real-world evidence before jumping 
to conclusions, one way or the other.

Maurizio




More information about the panama-dev mailing list