notes on binding C++

Sun Feb 4 02:38:56 UTC 2018

On 02/01/2018 08:27 PM, Maurizio Cimadamore wrote:
> We're currently working on a PoC which blends together some elements of 
> LDL with some elements of the layout description that is currently 
> implemented - together with an API which allows developers to also 
> create layouts programmatically; we hope to be able to share details on 
> this soon.

Sounds great! I am looking forward to reading more about this. Although 
it does not sound like it will be of much use for linear algebra, this 
is not a problem. It is easy enough to come up with an "indexer package" 
as part of a high-level library.
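
To make that idea concrete, here is a minimal sketch of such an indexer, assuming a dense row-major layout over a flat byte buffer. The class ByteIndexerSketch and its methods are hypothetical names chosen for illustration; this is neither the JavaCPP indexer package nor any Panama API.

```java
import java.nio.ByteBuffer;

/** Hypothetical strided indexer over a flat byte buffer (illustration only). */
final class ByteIndexerSketch {
    private final ByteBuffer data;
    private final long[] sizes;    // extent of each dimension, outermost first
    private final long[] strides;  // elements to skip per step in each dimension

    ByteIndexerSketch(ByteBuffer data, long... sizes) {
        this.data = data;
        this.sizes = sizes.clone();
        this.strides = rowMajorStrides(sizes);
    }

    /** Row-major: the last dimension is contiguous, so its stride is 1. */
    static long[] rowMajorStrides(long... sizes) {
        long[] strides = new long[sizes.length];
        long stride = 1;
        for (int i = sizes.length - 1; i >= 0; i--) {
            strides[i] = stride;
            stride *= sizes[i];
        }
        return strides;
    }

    /** Flat element offset: the sum of index[i] * stride[i], with bounds checks. */
    long offset(long... indices) {
        if (indices.length != sizes.length) {
            throw new IllegalArgumentException("expected " + sizes.length + " indices");
        }
        long offset = 0;
        for (int i = 0; i < indices.length; i++) {
            if (indices[i] < 0 || indices[i] >= sizes[i]) {
                throw new IndexOutOfBoundsException("index out of range in dimension " + i);
            }
            offset += indices[i] * strides[i];
        }
        return offset;
    }

    byte get(long... indices) {
        return data.get((int) offset(indices));
    }

    ByteIndexerSketch put(byte value, long... indices) {
        data.put((int) offset(indices), value);
        return this;
    }
}
```

For the 5[4[8b]] layout quoted below, new ByteIndexerSketch(buffer, 5, 4).offset(2, 3) evaluates to 2*4 + 3 = 11. Whether the 5 means rows or columns is left to the caller, which is the kind of semantics a high-level library can add on top of a bare layout.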

> Hope this helps clarify the design goals; in a way, jextract is a 
> means to an end, it's not the end in itself. The big bet is on coming up 
> with an API that can be well understood by cooperative binders and VMs. 
> If we achieve that, tools will follow; jextract is 'just' an example of 
> such a tool.

Yes, I think we all agree on that, but what I am trying to explain here 
is that this is not where this is going. It is not just about, for 
example, being able to call native functions with only a simple 
declaration. If the only way you can come up with to support C++ is by 
generating stubs that get wrapped in C functions, which cannot be 
function-like macros or inline functions because the foundation does not 
support them, we are not gaining anything in terms of performance over 
JNI. The traditional way of wrapping goes

Java -> JNI wrappers -> C++ -> C (including system calls)

Now, what is basically happening with Panama is that we are 
specializing, optimizing, for C, and we get

Java -> C (including system calls)

But what about C++? The only thing that you have been able to show me 
until now requires

Java -> C wrappers -> C++

For C++ libraries, that gains us nothing over JNI! Except that now we 
have 2 ways of accessing native libraries that have very different APIs. 
Not so great for usability, but that is OK, in a way.

In any case, for C headers, by incorporating a complete C runtime in the 
JVM it would be possible to parse all required files transparently, so 
there would be no need for users to even worry about mapping those 
manually. They could just call system functions, but this is something 
we can already do with what I am calling "presets":
https://github.com/bytedeco/javacpp-presets/tree/master/systems

In a nutshell, from what I can see, we are regressing in terms of 
usability and gaining a bit in performance for some function calls (but 
not inline ones, where in my opinion it matters most, unless we include a 
complete C runtime, which you are not doing). I mean, it is fine to have a 
foundation that is useful for C and system calls, but I still do not see 
a way for this to be useful in the case of C++!

Samuel

On 02/01/2018 08:27 PM, Maurizio Cimadamore wrote:
> 
> 
> On 01/02/18 08:05, Samuel Audet wrote:
>> Maurizio, Henry, thanks for the clarifications! This is starting to 
>> make more sense to me. If the goal at the moment is to lay out a 
>> foundation though, can we consider the parsing functionality in 
>> jextract premature and that it will not become part of any specs in 
>> the near future?
>>
>> As a foundation, data layouts are very interesting, and are something 
>> that is sorely missing from Java indeed. Currently with JavaCPP, the 
>> user has the choice between accessing fields easily with JNI, or 
>> manually by computing offsets from metadata returned by the compiler. 
>> I also came up with the indexer package (roughly equivalent to the C# 
>> functionality available under the same name) to access easily and 
>> efficiently multidimensional data structures from images, matrices, 
>> and tensors:
>>     http://bytedeco.org/news/2014/12/23/third-release/
>> Although this is useful for a limited number of use cases, those are 
>> important use cases (<cough>deep learning</cough>). Does the data 
>> layout functionality of Panama offer that kind of wrapping with 
>> strides and dimensions? It looks like something is there, but unclear 
>> what exactly:
>> "Multi-dimensional arrays are laid out in row-major order."
>> https://github.com/J9Java/panama-docs/blob/master/StateOfTheLDL.html
> Hi,
> there's a newer document on data layout, see:
> 
> http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html
> 
> Layout definitions are capable of capturing arrays:
> 
> replication = count_prefix element
> 
> You can combine this with groups, and obtain multidimensional arrays:
> 
> 5[4[8b]]
> 
> That is, this is a 5-element array, where each element is a 4-element 
> array whose elements are bytes.
> 
> I'd say it's not the job of the layout description to give 'semantics' 
> to such layout - e.g. are there 5 rows and 4 columns? Or is it 4 rows 
> and 5 columns? That's up to how the native data structure is used; 
> however, LDL allows developers to put annotations in their layouts:
> 
> 5[4[8b] (columns)] (rows)
> 
> And a framework can be developed to understand such user-defined 
> annotation in order to implement access and offset calculation in a more 
> user friendly fashion.
> 
> P.S.
> We're currently working on a PoC which blends together some elements of 
> LDL with some elements of the layout description that is currently 
> implemented - together with an API which allows developers to also 
> create layouts programmatically; we hope to be able to share details on 
> this soon.
>>
>> One important feature that is certainly missing from jextract is a way 
>> to associate a Pointer with native methods. Support for native methods 
>> is actually a very nice feature of Java, which is missing from C#. We 
>> can write classes like the following and have all the plumbing 
>> generated with JNI at build time with a tool like JavaCPP, and it 
>> literally just works:
> Without getting into the specific of the code - I think I buy your 
> argument, and I'm very sensitive to it. I think it can be described 
> roughly as trying to reduce the cost of the entry ticket to the native 
> interop world. In other words, if you use jextract, you can trust it to 
> generate whatever blob you need in order to achieve interop; and even if 
> the generated code is horrible, you don't care much, after all it's 
> hidden in a jarfile and you only access its (hopefully tidy!) public API.
> 
> But there's gonna be another class of users too, as you hint in your 
> email: those who want to simply call some native method 'over there'. 
> And life for them should be easy too. I think even w/o bringing up C++, 
> I can't say the situation in this department looks too rosy - for 
> instance, have a look at this example:
> 
> http://hg.openjdk.java.net/panama/dev/file/8499209102d4/test/jdk/java/nicl/System/UnixSystem.java#l61 
> 
> 
> Is stuff like this what such experienced programmers want to write? 
> Honestly, I don't think so. In other words, even if interop with C is 
> easier, there's still quite a lot of metadata that needs to be attached 
> to an interface declaration, and for complex structs, the layout 
> description can be very verbose - do we expect programmers to have to 
> grok it?
> 
> Where I see this going is that, again, the set of interfaces and 
> annotations form a sort of API that the binder/VM will happily swallow 
> to give you the interop you need. But there needs to be an ecosystem of 
> tools targeting this API. Jextract is a piece of the story (e.g. from 
> header file to .jar); the tool that was mentioned by Stephen Kell could 
> be another (from debugging symbols to .jar). And at some point we'll 
> need some tool to go from 'friendly source code' to .jar too (perhaps 
> using annotations a la JNR and having an annotation processor to spit 
> out the full version of the annotated sources - e.g. infer full blown 
> metadata from more user friendly one).
> 
> Hope this helps clarify the design goals; in a way, jextract is a 
> means to an end, it's not the end in itself. The big bet is on coming up 
> with an API that can be well understood by cooperative binders and VMs. 
> If we achieve that, tools will follow; jextract is 'just' an example of 
> such a tool.
> 
> Maurizio
>>
>> public class Something extends Pointer {
>>     // Implemented by the JNI glue that JavaCPP generates at build time.
>>     private native void allocate();
>>     public Something() { allocate(); }
>> }
>>
>> public class MyCPPClass extends Pointer {
>>     private native void allocate();
>>     public MyCPPClass() { allocate(); }
>>     public native Something myFunction(Something something);
>> }
>>
>> With jextract (or C# Platform Invoke, cgo, etc), we not only have to 
>> come up with a wrapper in C, but we end up having to do something like 
>> this in Java (or C#, Go, etc):
>>
>> public class CPPPointer {
>>     Pointer myAddress;
>>     public CPPPointer(Pointer address) { myAddress = address; }
>> }
>>
>> public class Something extends CPPPointer {
>>     public Something() { super(MyCPPWrapper.i.allocateSomething()); }
>>     public Something(Pointer address) { super(address); }
>> }
>>
>> public class MyCPPClass extends CPPPointer {
>>     public MyCPPClass() { super(MyCPPWrapper.i.allocateMyCPPClass()); }
>>     public Something myFunction(Something something) {
>>         return new Something(MyCPPWrapper.i.myWrappedFunction(
>>                 myAddress, something.myAddress));
>>     }
>> }
>>
>> interface MyCPPWrapper {
>>     static MyCPPWrapper i = Library.load(MyCPPWrapper.class);
>>     Pointer allocateSomething();
>>     Pointer allocateMyCPPClass();
>>     Pointer myWrappedFunction(Pointer address, Pointer something);
>> }
>>
>> And that does not even account for object deallocation, which JavaCPP 
>> does transparently with either phantom references or 
>> try-with-resources, as per the user's wishes. In my opinion, we are 
>> regressing in terms of usability here. If jextract ever comes up with 
>> support for C++ and starts outputting code like this, (which by the 
>> way is not even safe without some guarantees from the code generator) 
>> you are basically forcing users to use jextract to parse all their 
>> header files, probably by copy/pasting bits and pieces of them ad 
>> hocly à la SWIG until it compiles, when they might just want to call 
>> only one very specific tiny function! If the goal is to build a 
>> foundation for C++, that should be a priority. I hope I was able to 
>> make it clear that the attitude of "we will figure out support for C++ 
>> later, one day, but let's not think about it for now because we can do 
>> everything with C even if it's not too clean" is not an option. We can 
>> see what that looks like with CppSharp, for example:
>> https://github.com/mono/CppSharp/blob/master/docs/GeneratingBindings.md
>>
>> Samuel
>>
>> On 02/01/2018 04:04 AM, Henry Jen wrote:
>>> We have experimented with some C++ support, and Mikael has been able to 
>>> make calls into C++ for simple cases, but as you know, the ABI for C++ 
>>> is not standardized, so this is all experimental and very targeted.
>>>
>>> The ultimate fallback mode, to me, is to write some C code wrapping up 
>>> what is needed; then jextract can make calls into those C functions 
>>> easily, without hassle.
>>>
>>> We are aware, as you suggested, macro/template/inline support is 
>>> tricky, and we are exploring possibilities. My take is that 
>>> eventually we are gonna need hints, either by recognizing some common 
>>> patterns or developer intervention.
>>>
>>> Like Maurizio said, the first phase is to lay out a solid foundation 
>>> that we can build on, and we would like feedback to ensure the design 
>>> won’t prohibit further improvement on support for different 
>>> languages. We are really focused on the fundamentals/primitives that 
>>> allow us to make calls and express primitive types; other features 
>>> should be able to build on top of that without any issue.
>>>
>>> Cheers,
>>> Henry
>>>
>>
>>
>