notes on binding C++

Thu Feb 1 11:27:24 UTC 2018

On 01/02/18 08:05, Samuel Audet wrote:
> Maurizio, Henry, thanks for the clarifications! This is starting to 
> make more sense to me. If the goal at the moment is to lay out a 
> foundation though, can we consider the parsing functionality in 
> jextract premature and that it will not become part of any specs in 
> the near future?
>
> As a foundation, data layouts are very interesting, and are something 
> that is sorely missing from Java indeed. Currently with JavaCPP, the 
> user has the choice between accessing fields easily with JNI, or 
> manually by computing offsets from metadata returned by the compiler. 
> I also came up with the indexer package (roughly equivalent to the C# 
> functionality available under the same name) to access easily and 
> efficiently multidimensional data structures from images, matrices, 
> and tensors:
>     http://bytedeco.org/news/2014/12/23/third-release/
> Although this is useful for a limited number of use cases, those are 
> important use cases (<cough>deep learning</cough>). Does the data 
> layout functionality of Panama offer that kind of wrapping with 
> strides and dimensions? It looks like something is there, but unclear 
> what exactly:
> "Multi-dimensional arrays are laid out in row-major order."
> https://github.com/J9Java/panama-docs/blob/master/StateOfTheLDL.html
Hi,
there's a newer document on data layout, see:

http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html

Layout definitions are capable of capturing arrays:

replication = count_prefix element

You can combine this with groups, and obtain multidimensional arrays:

5[4[8b]]

That is, this is a 5-element array, where each element is a 4-element 
array whose elements are bytes.
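
To make the nesting concrete, here is a minimal sketch in plain Java of the offset arithmetic such a layout implies. It is not Panama API: the class and method names are made up for illustration, and it assumes that 8b means an 8-bit element and that elements are packed with no padding.

```java
public class NestedArrayOffset {
    // Hypothetical helper: byte offset of element [outer][inner] in a layout
    // shaped like outerCount[innerCount[8b]], assuming no padding.
    public static long offsetOf(int outerCount, int innerCount, int outer, int inner) {
        if (outer < 0 || outer >= outerCount || inner < 0 || inner >= innerCount) {
            throw new IndexOutOfBoundsException("index (" + outer + ", " + inner + ")");
        }
        long elementBytes = 1;                          // 8b = 8 bits = 1 byte
        long innerArrayBytes = innerCount * elementBytes; // size of one inner array
        return outer * innerArrayBytes + inner * elementBytes;
    }

    public static void main(String[] args) {
        // 5[4[8b]]: 5 outer elements, each an array of 4 bytes.
        System.out.println(offsetOf(5, 4, 0, 0)); // first byte
        System.out.println(offsetOf(5, 4, 2, 3)); // 2 * 4 + 3
        System.out.println(offsetOf(5, 4, 4, 3)); // last byte of the 20-byte block
    }
}
```

Note that nothing here says which index is a "row" and which is a "column"; the arithmetic only knows about an outer and an inner count.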

I'd say it's not the job of the layout description to give 'semantics' 
to such a layout - e.g. are there 5 rows and 4 columns? Or is it 4 rows 
and 5 columns? That's up to how the native data structure is used; 
however, LDL allows developers to put annotations in their layouts:

5[4[8b] (columns)] (rows)

And a framework can be developed to understand such user-defined 
annotations in order to implement access and offset calculation in a more 
user friendly fashion.
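
As a rough illustration of what such a framework could do with the annotated layout, the sketch below lets callers index by dimension name instead of by position. Everything in it is an assumption made for the example: the class AnnotatedDims is hypothetical, the dimension names are registered by hand rather than parsed from LDL, and the reading of the annotations (5 "rows", each holding 4 "columns" of one byte) is only one plausible interpretation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AnnotatedDims {
    // Dimension name -> element count, registered outermost first.
    private final Map<String, Integer> counts = new LinkedHashMap<>();
    private final long elementBytes;

    public AnnotatedDims(long elementBytes) {
        this.elementBytes = elementBytes;
    }

    // Registers the next (more deeply nested) dimension.
    public AnnotatedDims dim(String name, int count) {
        counts.put(name, count);
        return this;
    }

    // Byte offset of the element addressed by named indices, no padding assumed.
    public long offsetOf(Map<String, Integer> index) {
        String[] names = counts.keySet().toArray(new String[0]);
        long offset = 0;
        long stride = elementBytes;
        // Walk from the innermost dimension outwards, growing the stride.
        for (int k = names.length - 1; k >= 0; k--) {
            int count = counts.get(names[k]);
            Integer i = index.get(names[k]);
            if (i == null || i < 0 || i >= count) {
                throw new IndexOutOfBoundsException(names[k] + "=" + i);
            }
            offset += i * stride;
            stride *= count;
        }
        return offset;
    }

    public static void main(String[] args) {
        // Mirrors 5[4[8b] (columns)] (rows) under the assumptions stated above.
        AnnotatedDims layout = new AnnotatedDims(1).dim("rows", 5).dim("columns", 4);
        System.out.println(layout.offsetOf(Map.of("rows", 2, "columns", 3)));
    }
}
```

The point of the sketch is only that the names live outside the offset arithmetic: the layout stays purely structural, and a separate layer gives it meaning.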

P.S.
We're currently working on a PoC which blends together some elements of 
LDL with some elements of the layout description that is currently 
implemented - together with an API which allows developers to also 
create layouts programmatically; we hope to be able to share details on 
this soon.
>
> One important feature that is certainly missing from jextract is a way 
> to associate a Pointer with native methods. Support for native methods 
> is actually a very nice feature of Java, which is missing from C#. We 
> can write classes like the following and have all the plumbing 
> generated with JNI at build time with a tool like JavaCPP, and it 
> literally just works:
Without getting into the specifics of the code - I think I buy your 
argument, and I'm very sensitive to it. I think it can be described 
roughly as trying to reduce the cost of the entry ticket to the native 
interop world. In other words, if you use jextract, you can trust it to 
generate whatever blob you need in order to achieve interop; and even if 
the generated code is horrible, you don't care much, after all it's 
hidden in a jarfile and you only access its (hopefully tidy!) public API.

But there's gonna be another class of users too, as you hint in your 
email: those who want to simply call some native method 'over there'. 
And life for them should be easy too. I think even w/o bringing up C++, 
I can't say the situation in this department looks too rosy - for 
instance, have a look at this example:

http://hg.openjdk.java.net/panama/dev/file/8499209102d4/test/jdk/java/nicl/System/UnixSystem.java#l61

Is stuff like this what such experienced programmers want to write? 
Honestly, I don't think so. In other words, even if interop with C is 
easier, there's still quite a lot of metadata that needs to be attached 
to an interface declaration, and for complex structs, the layout 
description can be very verbose - do we expect programmers to have to 
grok it?

Where I see this going is that, again, the set of interfaces and 
annotations form a sort of API that the binder/VM will happily swallow 
to give you the interop you need. But there needs to be an ecosystem of 
tools targeting this API. Jextract is a piece of the story (e.g. from 
header file to .jar); the tool that was mentioned by Stephen Kell could 
be another (from debugging symbols to .jar). And at some point we'll 
need some tool to go from 'friendly source code' to .jar too (perhaps 
using annotations a la JNR and having an annotation processor to spit 
out the full version of the annotated sources - e.g. infer full blown 
metadata from more user friendly one).
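
To give a feel for the 'friendly source code' direction, here is a minimal sketch of a terse, hand-written annotation from which fuller metadata is inferred. All names in it (NativeName, LibC, describe) are hypothetical and belong to neither Panama, JNR, nor jextract. The descriptor string it prints is a made-up placeholder rather than real layout syntax, and it uses runtime reflection for brevity where a real tool would more likely be a compile-time annotation processor.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class FriendlyMetadata {
    // Hypothetical user-facing annotation: just the native symbol name.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface NativeName {
        String value();
    }

    // What a user might write by hand.
    public interface LibC {
        @NativeName("getpid")
        int getpid();
    }

    // Derives a placeholder descriptor from the annotation and the Java signature.
    public static String describe(Method m) {
        NativeName n = m.getAnnotation(NativeName.class);
        String symbol = (n != null) ? n.value() : m.getName();
        StringBuilder sb = new StringBuilder(symbol).append('(');
        Class<?>[] params = m.getParameterTypes();
        for (int i = 0; i < params.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(params[i].getSimpleName());
        }
        return sb.append(") -> ").append(m.getReturnType().getSimpleName()).toString();
    }

    public static void main(String[] args) {
        for (Method m : LibC.class.getMethods()) {
            System.out.println(describe(m));
        }
    }
}
```

The user writes only the short annotated interface; everything else is derived by the tool, which is the same division of labour jextract offers when starting from a header file.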

Hope this helps clarify the design goals; in a way, jextract is a 
means to an end, it's not the end in itself. The big bet is on coming up 
with an API that can be well understood by cooperative binders and VMs. 
If we achieve that, tools will follow; jextract is 'just' an example of 
such a tool.

Maurizio
>
> public class Something extends Pointer {
>     private native void allocate();
>     public Something() { allocate(); }
> }
>
> public class MyCPPClass extends Pointer {
>     private native void allocate();
>     public MyCPPClass() { allocate(); }
>     public native Something myFunction(Something something);
> }
>
> With jextract (or C# Platform Invoke, cgo, etc), we not only have to 
> come up with a wrapper in C, but we end up having to do something like 
> this in Java (or C#, Go, etc):
>
> public class CPPPointer {
>     Pointer myAddress;
>     public CPPPointer(Pointer address) { myAddress = address; }
> }
>
> public class Something extends CPPPointer {
>     public Something() { super(MyCPPWrapper.i.allocateSomething()); }
>     public Something(Pointer address) { super(address); }
> }
>
> public class MyCPPClass extends CPPPointer {
>     public MyCPPClass() { super(MyCPPWrapper.i.allocateMyCPPClass()); }
>     public Something myFunction(Something something) {
>         return new Something(MyCPPWrapper.i.myWrappedFunction(
>                 myAddress, something.myAddress));
>     }
> }
>
> interface MyCPPWrapper {
>     static MyCPPWrapper i = Library.load(MyCPPWrapper.class);
>     Pointer allocateSomething();
>     Pointer allocateMyCPPClass();
>     Pointer myWrappedFunction(Pointer address, Pointer something);
> }
>
> And that does not even account for object deallocation, which JavaCPP 
> does transparently with either phantom references or 
> try-with-resources, as per the user's wishes. In my opinion, we are 
> regressing in terms of usability here. If jextract ever comes up with 
> support for C++ and starts outputting code like this, (which by the 
> way is not even safe without some guarantees from the code generator) 
> you are basically forcing users to use jextract to parse all their 
> header files, probably by copy/pasting bits and pieces of them ad 
> hocly à la SWIG until it compiles, when they might just want to call 
> only one very specific tiny function! If the goal is to build a 
> foundation for C++, that should be a priority. I hope I was able to 
> make it clear that "we'll figure out C++ support one day, but let's 
> not think about it for now because we can do everything with C, even 
> if it's not too clean" is not an option. We can see what that 
> looks like with CppSharp for an example:
> https://github.com/mono/CppSharp/blob/master/docs/GeneratingBindings.md
>
> Samuel
>
> On 02/01/2018 04:04 AM, Henry Jen wrote:
>> We have experimented with some C++ support; Mikael was able to make 
>> calls into C++ for simple cases, but as you know, the ABI for C++ is 
>> not standardized, so this is all experimental and very targeted.
>>
>> The ultimate fallback, to me, is to write some C code wrapping up 
>> what is needed; jextract can then make calls into those C functions 
>> without hassle.
>>
>> We are aware, as you suggested, macro/template/inline support is 
>> tricky, and we are exploring possibilities. My take is that 
>> eventually we are gonna need hints, either by recognizing some common 
>> patterns or developer intervention.
>>
>> Like Maurizio said, the first phase is to lay out a solid foundation 
>> that we can build on, and we would like feedback to ensure the design 
>> won’t prohibit further improvement in support for different 
>> languages. We are really focused on the fundamentals/primitives that 
>> allow us to make calls and express primitive types; other features 
>> should be able to build on top of that without any issue.
>>
>> Cheers,
>> Henry
>>
>
>