jextract C++ support

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Aug 17 08:46:59 UTC 2023


Hi,
it seems like the binding generator is emitting bindings for private 
string fields?

https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/basic_string.h#L211

While PhantomData seems to be some magic Rust thingie:

https://doc.rust-lang.org/stable/std/marker/struct.PhantomData.html

Which is probably used to deal with the lifetime of the string's char 
array (but this is a guess).

Maurizio

On 17/08/2023 05:14, Rel wrote:
> Hi,
>
> I started looking at rust-bindgen and did a few experiments here:
>
> https://github.com/enatai/panamaexperiments/blob/main/rust-bindgen/README.md 
>
> My question is around what rust-bindgen generated for std::string
>
> Mostly those are ordinary data fields for which jextract can possibly 
> generate a layout, but I don't really understand the purpose of:
>
> pub _phantom_0: 
> ::std::marker::PhantomData<::std::cell::UnsafeCell<_CharT>>,
>
> and what the jextract alternative for that could be. Is it some 
> special Rust feature, or?
> ------- Original Message -------
> On Monday, May 29th, 2023 at 9:00 AM, Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>>
>> On 29/05/2023 00:20, Rel wrote:
>>> dynamic dispatch
>>>
>>> I tried the following example 
>>> [https://github.com/enatai/panamaexperiments/blob/main/libcppexperiments/src/main/public/happy.hpp] 
>>> and it works fine as long as we generate proper Java bindings.
>>>
>>> See the test for it 
>>> [https://github.com/enatai/panamaexperiments/blob/main/cppexperiments/src/test/java/cppexperiments/HappyTests.java#L36]
>>>
>>> Please let me know which dynamic dispatch use cases you are 
>>> concerned with, because this one seems to work fine.
>>
>> Sorry, I can see that working fine because you declare a "static" 
>> function which accepts a point, so the vtable indirection is 
>> generated by the C++ compiler in that function.
>>
>> What I'm worried about is calling virtual methods on classes. E.g. 
>> calling your "distance" function directly. Jextract gives you two 
>> possibilities: Point2d::distance and Point3d::distance. If you pass a 
>> Point3d object to Point2d::distance you will "only" get 
>> Point2d::distance to be called (as if there was no dynamic dispatch).
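The difference described above can be sketched in plain C++. This is
only an illustration: the Point2d and Point3d definitions below are
hypothetical stand-ins (the real classes in happy.hpp may differ), and
distance_virtual/distance_direct are made-up helper names. A qualified
call such as p.Point2d::distance() bypasses the vtable, which is roughly
what invoking the Point2d::distance symbol directly amounts to.

```cpp
#include <cmath>

// Hypothetical stand-ins for the classes discussed in the thread.
struct Point2d {
    double x, y;
    Point2d(double px, double py) : x(px), y(py) {}
    virtual ~Point2d() = default;
    virtual double distance() const { return std::sqrt(x * x + y * y); }
};

struct Point3d : Point2d {
    double z;
    Point3d(double px, double py, double pz) : Point2d(px, py), z(pz) {}
    double distance() const override {
        return std::sqrt(x * x + y * y + z * z);
    }
};

// Unqualified call: the C++ compiler emits the vtable indirection here,
// so the override of the dynamic type is selected.
double distance_virtual(const Point2d& p) { return p.distance(); }

// Qualified call: statically bound to Point2d::distance, with no
// dynamic dispatch, whatever the dynamic type of p is.
double distance_direct(const Point2d& p) { return p.Point2d::distance(); }
```

Passing a Point3d to both helpers gives different results: the first
one runs Point3d::distance, the second one runs Point2d::distance.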
>>
>>
>>>
>>> std::string
>>> I totally forgot that such a basic type as std::string is a 
>>> template in C++.
>>> But it seems possible to call functions which operate on string 
>>> objects because symbols for them are present:
>>>
>>> 00000000000013ad T _ZN7unhappy10helloWorldB5cxx11Ev
>>>
>>> std::string helloWorld();
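For reference, a mangled symbol like the one above can be mapped back
to its C++ declaration programmatically. The sketch below assumes a
toolchain that follows the Itanium C++ ABI and ships <cxxabi.h> (GCC
and Clang do; MSVC does not); the demangle helper name is made up for
this example.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle an Itanium C++ ABI symbol name.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so it has to be released with free.
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && raw != nullptr) ? raw : "";
    std::free(raw);
    return result;
}
```

Calling demangle("_ZN7unhappy10helloWorldB5cxx11Ev") should yield a
name in the unhappy namespace, with the cxx11 ABI tag attached.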
>>>
>>> I guess it is possible to create/extract a layout for std::string 
>>> using FFM but:
>>> - how do we initialize this layout from Java? We cannot just call 
>>> the std::string constructor for it, right?
>>> - this layout may differ between C++ runtimes (libstdc++ etc). 
>>> MS C++ may not have the same std::string layout as GCC
>>
>> On the latter, i.e. the layout difference, this is no different than 
>> anything else with jextract. That is, each jextract run is 
>> platform-dependent, as it pulls in header files that are heavily 
>> influenced by the platform and OS you run on.
>>
>> If I understand correctly, "string" is the "instantiation" of a 
>> template in C++ (i.e. some basic_string<char>). That instantiation 
>> is fully defined (i.e. not partial), and I believe it should be 
>> possible, with libclang, to obtain more information about it, such 
>> as the layout etc. (for partial template instantiations, my 
>> understanding, from reading about what Rust bindgen does, is that 
>> it is not possible to handle them with libclang).
>>
>> So, ideally, we should be able to construct a layout for 
>> basic_string<char>, and then pass that to the constructor, yes.
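A minimal C++ sketch of that idea, under the assumption that the
caller provides storage of the right size and alignment and a
constructor is then run on it. The names string_construct and
string_destroy are hypothetical and are not part of jextract or of any
library; the size and alignment constants are only valid for the
platform and C++ runtime the code is compiled against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>

// Size and alignment a layout for std::string would need here; these
// values differ between libstdc++, libc++ and the MSVC runtime.
constexpr std::size_t kStringSize = sizeof(std::string);
constexpr std::size_t kStringAlign = alignof(std::string);

// Hypothetical shim: run the std::string constructor on storage that
// the caller allocated (for instance, memory described by a layout).
std::string* string_construct(void* storage, const char* text) {
    assert(reinterpret_cast<std::uintptr_t>(storage) % kStringAlign == 0);
    return ::new (storage) std::string(text);
}

// Hypothetical shim: run the destructor without freeing the storage,
// which stays owned by the caller.
void string_destroy(std::string* s) {
    s->~basic_string();
}
```

The storage passed in must be at least kStringSize bytes long, and the
object must be destroyed before that storage is released.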
>>
>> Maurizio
>>
>>
>>>
>>> > Their binding generator adopts the same simple approach as the one I showed in the patch.
>>>
>>> I will take a look
>>>
>>>
>>> ------- Original Message -------
>>> On Tuesday, May 23rd, 2023 at 8:58 AM, Maurizio Cimadamore 
>>> <maurizio.cimadamore at oracle.com> wrote:
>>>
>>>>
>>>> On 23/05/2023 05:11, Rel wrote:
>>>>> > What I meant for "robust analysis" was to try and establish how many _real-world_ 
>>>>> C++ libraries can really be tackled with such a direct approach.
>>>>>
>>>>> Ohh I see now, I am afraid we know the answer to this :)
>>>>>
>>>>> Let's imagine that the number of C++ libraries which can be covered 
>>>>> end-to-end with the "simple" approach is 0. Does that mean we 
>>>>> should discard it and only focus on a shim for binding all kinds 
>>>>> of APIs? What about those cases which can be easily extracted 
>>>>> using the "simple" approach, like Point2d?
>>>> Perhaps we should reach out to the Rust community? Their binding 
>>>> generator adopts the same simple approach as the one I showed in 
>>>> the patch. Given how hard it is to support C++ (because the 
>>>> underlying libclang C API is not very solid in that respect), I'd 
>>>> be surprised if they maintained all the necessary code just for 
>>>> stuff like Point2d?
>>>>>
>>>>> Because I thought that we would like to do "analysis" of what C++ 
>>>>> use cases can/cannot be covered with "simple" approach. For 
>>>>> example from your previous message I see that we are not 
>>>>> completely sure about exceptions:
>>>>>
>>>>> > * (probably way more stuff, like exceptions, etc.)
>>>>>
>>>>> Similarly, I would like to see for myself what the problems with 
>>>>> "dynamic dispatch" are. I added it to "unhappy" for now, because 
>>>>> we expect it not to work, but I plan to test it to see what the 
>>>>> issues are and share them here. That way anyone would be able to 
>>>>> reproduce it and see the same results.
>>>>>
>>>>> I guess my question now is: do we think it may be useful to know 
>>>>> exactly how many C++ use cases can be covered with the "simple" 
>>>>> FFM approach? If the answer is yes, then we can use 
>>>>> panamaexperiments as a playground where we keep tests for what is 
>>>>> covered. This (possibly?) can give us more confidence about the 
>>>>> limitations of the "simple" approach and how far we can go with it 
>>>>> (and this can be easily demonstrated to everyone just by running 
>>>>> those tests).
>>>>
>>>> I think it would be useful to know some answer to that question, 
>>>> yes. My intuition tells me that there are probably two kinds of C++ 
>>>> libraries: those that were born that way, and those that moved over 
>>>> from being simpler C libraries. One example in the latter 
>>>> category is OpenCV [1]. While its "core" header [2] declares an 
>>>> exception, as well as a bunch of classes, eyeballing it, it doesn't 
>>>> seem "too" problematic? Perhaps that would be a good place to 
>>>> start, and, if a library such as that can be used with some degree 
>>>> of success, perhaps we can expand the search to other similar 
>>>> libraries.
>>>>
>>>> [1] - https://opencv.org/
>>>> [2] - 
>>>> https://github.com/opencv/opencv/blob/4.x/modules/core/include/opencv2/core.hpp
>>>>
>>>>>
>>>>> Ideas?
>>>>>
>>>>> ------- Original Message -------
>>>>> On Monday, May 22nd, 2023 at 9:14 AM, Maurizio Cimadamore 
>>>>> <maurizio.cimadamore at oracle.com> wrote:
>>>>>
>>>>>>
>>>>>> On 22/05/2023 04:12, Rel wrote:
>>>>>>>> But I believe some more robust
>>>>>>>> analysis should be made to understand exactly how many APIs can be
>>>>>>>> supported in this "simple" fashion.
>>>>>>> Yes, I started to gather such analysis here: https://github.com/enatai/panamaexperiments
>>>>>>> Currently there is only one happy case [https://github.com/enatai/panamaexperiments/blob/main/libcppexperiments/src/main/public/happy.hpp] which is the Point2d class from your foo.hpp file.
>>>>>>
>>>>>> This is not too surprising - after all the hacky changes I shared 
>>>>>> were built around that example.
>>>>>>
>>>>>> What I meant for "robust analysis" was to try and establish how 
>>>>>> many _real-world_ C++ libraries can really be tackled with such 
>>>>>> a direct approach. My feeling is "not many" - but I don't have 
>>>>>> any hard data to back up this claim.
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>
>>>
>