From maurizio.cimadamore at oracle.com Mon Jan 5 10:12:12 2026
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Mon, 5 Jan 2026 10:12:12 +0000
Subject: jextract cannot generate portable code (anymore)
In-Reply-To: <7f755534-64eb-42f5-bd6e-b0c8371f992f@vodafonemail.de>
References: <7f755534-64eb-42f5-bd6e-b0c8371f992f@vodafonemail.de>
Message-ID: <187542e3-19ee-4ef0-ac77-92163b235da5@oracle.com>

Hi,
I agree this is an issue that needs to be looked at.

Note that the problem you refer to is that the mismatch between
extraction time and runtime is now more explicit, and results in a cast
error. Before, the declaration of C_LONG would have gone through, but
with the wrong layout.

The key to defining a portable library is to make sure builtin types are
basically never used -- a truly portable library defines its own
primitive types.

That said, even a well-behaved portable library, like OpenGL, ends up
having some dependencies on builtin types:

```
typedef float          GLfloat;        /* single precision float */
```

So, having a flag to remove the definition of builtin types will likely
backfire.

It would be nice if the filtering mechanism could be extended to builtin
types. Then, if long is problematic for your bindings, you could just
leave it out (which is also what we do for the jextract libclang
bindings).

Maurizio

On 29/12/2025 22:43, some-java-user-99206970363698485155 at vodafonemail.de wrote:
>
> Hello,
>
> the jextract guide [1] says that jextract can generate portable code
> if the C code on which it is executed is portable.
> That seems to be no longer the case due to
> https://bugs.openjdk.org/browse/CODETOOLS-7903923. There are two problems:
>
>   * On Windows it generates `OfInt C_LONG`, on non-Windows `OfLong
>     C_LONG`. But it looks up the layout dynamically using
>     `canonicalLayouts().get(...)`.
>     Regardless of whether the generated
>     code actually uses `C_LONG` you will get a ClassCastException
>     during initialization when trying for example to use code
>     generated on Linux on a Windows machine: "ClassCastException:
>     class jdk.internal.foreign.layout.ValueLayouts$OfIntImpl cannot be
>     cast to class java.lang.foreign.ValueLayout$OfLong"
>   * The general approach of using `canonicalLayouts().get(...)` seems
>     to make this non-portable (even if the code declared `C_LONG` as
>     the general `ValueLayout` instead of the specific `OfInt` /
>     `OfLong`, avoiding the ClassCastException), because jextract
>     converts types such as `size_t` and `int64_t` to `C_LONG` on
>     Linux, even though these types are defined in `canonicalLayouts()`
>     as well.
>     Take for example
>     https://github.com/tree-sitter/java-tree-sitter/blob/master/scripts/jextract.sh
>     which runs jextract for
>     https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h
>     Note that `api.h` is (if I see it correctly) portable. However for
>     `int64_t` (used by `ts_tree_cursor_goto_first_child_for_byte`)
>     jextract uses the non-portable `C_LONG` on Linux.
>     Another problem is `calloc` and `malloc`, where jextract treats
>     `size_t` as non-portable `C_LONG` as well (I guess `size_t` would
>     be portable at least across 64 bit platforms, or would fail with a
>     ClassCastException if not, as desired).
>
> jextract version: Build 25-jextract+2-4 (2025/11/25)
>
> Not sure what a good solution to this is. Maybe an opt-out CLI flag
> for the CODETOOLS-7903923 behavior, and an update to GUIDE.md?
> That would make code generated with jextract on Linux portable to
> Windows again I think. Or are there cases where CODETOOLS-7903923 is
> really needed (even for portable C libraries)?
> Or a way for jextract to not convert `int64_t` and `size_t` to C_LONG,
> if that is possible?
>
> Or is there possibly also a problem with the
> https://github.com/tree-sitter/java-tree-sitter setup mentioned above?
> For example is there a way to make jextract refer to `int64_t` in the
> generated code instead of `C_LONG`?
>
> Kind regards
>
>
> [1]
> https://github.com/openjdk/jextract/blob/b96ad6618a70ddbdf6b67cc3eb8342efc39c0692/doc/GUIDE.md?plain=1#L103-L111
>

From jorn.vernee at oracle.com Mon Jan 5 12:01:25 2026
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Mon, 5 Jan 2026 13:01:25 +0100
Subject: jextract cannot generate portable code (anymore)
In-Reply-To: <7f755534-64eb-42f5-bd6e-b0c8371f992f@vodafonemail.de>
References: <7f755534-64eb-42f5-bd6e-b0c8371f992f@vodafonemail.de>
Message-ID: <5a010bc5-1ecf-4500-a3e5-1006030ebe68@oracle.com>

I had a look at what jextract does in the case of int64_t and size_t.
In both cases these are typedefs for another builtin type. The former
is a typedef for `long` and the latter a typedef for `unsigned long`.
jextract uses the underlying type of the typedef to determine which
layout to use, so we end up with C_LONG in both cases.

While these types are semantically portable, their typedefs may have a
non-portable definition, which jextract expands - like a macro -
during extraction. This problem is similar to this example:

    #ifdef WIN32
    typedef long long my_int;
    #else
    typedef long my_int;
    #endif

This code is portable in the C sense, but jextract eagerly picks one
of the two branches of this compiler switch when extracting.

This seems like a tricky issue to work around. In this case I think
we'd want the type to be 'resolved' at runtime rather than extraction
time, but I don't think we can let jextract collect all the different
definitions of `my_int`, and then pick the right one at runtime.

Jorn

On 29-12-2025 23:43, some-java-user-99206970363698485155 at vodafonemail.de wrote:
>
> Hello,
>
> the jextract guide [1] says that jextract can generate portable code
> if the C code on which it is executed is portable.
> That seems to be no longer the case due to
> https://bugs.openjdk.org/browse/CODETOOLS-7903923. There are two problems:
>
>   * On Windows it generates `OfInt C_LONG`, on non-Windows `OfLong
>     C_LONG`. But it looks up the layout dynamically using
>     `canonicalLayouts().get(...)`. Regardless of whether the generated
>     code actually uses `C_LONG` you will get a ClassCastException
>     during initialization when trying for example to use code
>     generated on Linux on a Windows machine: "ClassCastException:
>     class jdk.internal.foreign.layout.ValueLayouts$OfIntImpl cannot be
>     cast to class java.lang.foreign.ValueLayout$OfLong"
>   * The general approach of using `canonicalLayouts().get(...)` seems
>     to make this non-portable (even if the code declared `C_LONG` as
>     the general `ValueLayout` instead of the specific `OfInt` /
>     `OfLong`, avoiding the ClassCastException), because jextract
>     converts types such as `size_t` and `int64_t` to `C_LONG` on
>     Linux, even though these types are defined in `canonicalLayouts()`
>     as well.
>     Take for example
>     https://github.com/tree-sitter/java-tree-sitter/blob/master/scripts/jextract.sh
>     which runs jextract for
>     https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h
>     Note that `api.h` is (if I see it correctly) portable. However for
>     `int64_t` (used by `ts_tree_cursor_goto_first_child_for_byte`)
>     jextract uses the non-portable `C_LONG` on Linux.
>     Another problem is `calloc` and `malloc`, where jextract treats
>     `size_t` as non-portable `C_LONG` as well (I guess `size_t` would
>     be portable at least across 64 bit platforms, or would fail with a
>     ClassCastException if not, as desired).
>
> jextract version: Build 25-jextract+2-4 (2025/11/25)
>
> Not sure what a good solution to this is. Maybe an opt-out CLI flag
> for the CODETOOLS-7903923 behavior, and an update to GUIDE.md?
> That would make code generated with jextract on Linux portable to
> Windows again I think.
> Or are there cases where CODETOOLS-7903923 is
> really needed (even for portable C libraries)?
> Or a way for jextract to not convert `int64_t` and `size_t` to C_LONG,
> if that is possible?
>
> Or is there possibly also a problem with the
> https://github.com/tree-sitter/java-tree-sitter setup mentioned above?
> For example is there a way to make jextract refer to `int64_t` in the
> generated code instead of `C_LONG`?
>
> Kind regards
>
>
> [1]
> https://github.com/openjdk/jextract/blob/b96ad6618a70ddbdf6b67cc3eb8342efc39c0692/doc/GUIDE.md?plain=1#L103-L111
>

From nlisker at gmail.com Sat Jan 10 15:26:43 2026
From: nlisker at gmail.com (Nir Lisker)
Date: Sat, 10 Jan 2026 17:26:43 +0200
Subject: Bindings for Python
Message-ID:

Hi,

Has interfacing with Python been looked at in the same sense that
interfacing with Rust can be done with cbindgen? Has it been attempted
internally and are there any writings on this?

Thanks,
Nir

From maurizio.cimadamore at oracle.com Mon Jan 12 13:28:03 2026
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Mon, 12 Jan 2026 13:28:03 +0000
Subject: Bindings for Python
In-Reply-To:
References:
Message-ID:

Hi Nir,
Python provides a fairly complete library (libpython) that allows any
program compatible with a C interface to call into Python.
This library can be jextracted (see [1]), so it is possible to use
libpython to call Python from Java (via FFM), and, maybe, even to use a
tool to automate some of the code generation -- although we didn't do
any deep exploration in this direction.

(There's also Cython, which allows compiling Python code into C -- but I
believe that path to be more similar to Java/JNI -- e.g.
the generated C bindings are meant to allow Python functions to be
implemented in C/C++ -- so the resulting code is still meant to be
called from Python, not C)

Cheers
Maurizio

[1] - https://github.com/openjdk/jextract/tree/master/samples/python3

On 10/01/2026 15:26, Nir Lisker wrote:
> Hi,
>
> Has interfacing with Python been looked at in the same sense that
> interfacing with Rust can be done with cbindgen? Has it been attempted
> internally and are there any writings on this?
>
> Thanks,
> Nir

From nlisker at gmail.com Mon Jan 12 20:46:49 2026
From: nlisker at gmail.com (Nir Lisker)
Date: Mon, 12 Jan 2026 22:46:49 +0200
Subject: Bindings for Python
In-Reply-To:
References:
Message-ID:

Thanks Maurizio, the section in the guide that talks about other
languages [1] could use some expansion based on what I'm seeing under
the samples directory, or at least a link there as well (there's one at
the beginning of the guide). I'll have a look at the sample.

[1] https://github.com/openjdk/jextract/blob/master/doc/GUIDE.md#other-languages

On Mon, Jan 12, 2026 at 3:28 PM Maurizio Cimadamore wrote:
>
> Hi Nir,
> Python provides a fairly complete library (libpython) that allows any
> program compatible with a C interface to call into Python.
> This library can be jextracted (see [1]), so it is possible to use
> libpython to call Python from Java (via FFM), and, maybe, even to use a
> tool to automate some of the code generation -- although we didn't do
> any deep exploration in this direction.
>
> (There's also Cython, which allows compiling Python code into C -- but I
> believe that path to be more similar to Java/JNI -- e.g.
> the generated C bindings are meant to allow Python functions to be
> implemented in C/C++ -- so the resulting code is still meant to be
> called from Python, not C)
>
> Cheers
> Maurizio
>
> [1] - https://github.com/openjdk/jextract/tree/master/samples/python3
>
> On 10/01/2026 15:26, Nir Lisker wrote:
> > Hi,
> >
> > Has interfacing with Python been looked at in the same sense that
> > interfacing with Rust can be done with cbindgen? Has it been attempted
> > internally and are there any writings on this?
> >
> > Thanks,
> > Nir

From jextract at xpple.dev Wed Jan 21 13:46:45 2026
From: jextract at xpple.dev (jextract at xpple.dev)
Date: Wed, 21 Jan 2026 14:46:45 +0100
Subject: Transcription of documentation comments to generated bindings
Message-ID: <8b2153cd999599e03a8ed66d897f3439@xpple.dev>

Hello,

I have been using jextract for a while now, and my experience has been
great overall. The only thing that bothered me when using the generated
bindings was their lack of documentation. Then I thought: would it be
possible to transcribe the documentation of the symbols in the C header
file into the generated Java code? Since C and Java have the same
(mostly, anyways) comment syntax, this could even be done in a super
naïve way by copying the literal strings. You would look at comment
tokens that precede the target symbol (ignoring some whitespace perhaps)
and copy them over.

Looking at the Clang API, it seems they already associate documentation
comments that belong to a declaration with said declaration. Clang
attaches a `RawComment` to a `Decl` when the comment is immediately
before it and separated only by whitespace. This could be used to
transcribe the comment. Of course, the C documentation string wouldn't
be in JavaDoc format, but even if it's in Doxygen or any other format,
it's _much_ better than nothing at all.

Is this something that you guys would be interested to look into?
If I find the time I could give it a shot, but given my inexperience
with Clang I'm not sure it would result in the best code :).

Kind regards,

Frederik van der Els

PS: I saw the recent discussion on "jextract cannot generate portable
code (anymore)" and it's about the exact same issue I reported in March
last year ("Bindings crash on Windows where they would not before").
It's indeed the case that the mismatch is now made explicit by
crashing, but in my case the C library didn't use any platform
dependent types, so the type wouldn't have been used anyways. I would
love to see a solution for this.

From maurizio.cimadamore at oracle.com Fri Jan 23 12:30:05 2026
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Fri, 23 Jan 2026 12:30:05 +0000
Subject: Transcription of documentation comments to generated bindings
In-Reply-To: <8b2153cd999599e03a8ed66d897f3439@xpple.dev>
References: <8b2153cd999599e03a8ed66d897f3439@xpple.dev>
Message-ID:

Hi,
at some point we considered doing something for comments. We even tried
interfacing with the Clang API, but then realized it was just for
Doxygen-style comments. While this allows good integration with
jextract, we found that almost all the libraries we looked at in our
experiments were _not_ using Doxygen comments. For that reason we
decided to sit on it (and work on other priorities at the time).

I think now would be a good time to reopen that discussion. Clang has a
relatively rich API -- for each "cursor" it gives you the position in
the original header. So it might even be possible to scrape the header
file for information, and copy and paste any comment-like line into the
generated output -- although that might be significantly more complex
than reading Doxygen comments using the API.

So, in a way, the question we should ask ourselves is -- what
constitutes a comment? What kind of comments do we want jextract to
support? Would developers feel Doxygen support is a glass half-empty, or
half-full?
(Having said all this, given Clang has a nice API to get Doxygen stuff,
at least not dropping that info on the floor feels to me like a good
starting point -- we could always do more later if we feel it's
important).

Cheers
Maurizio

On 21/01/2026 13:46, jextract at xpple.dev wrote:
> Hello,
>
> I have been using jextract for a while now, and my experience has been
> great overall. The only thing that bothered me when using the
> generated bindings was their lack of documentation. Then I thought:
> would it be possible to transcribe the documentation of the symbols in
> the C header file into the generated Java code? Since C and Java have
> the same (mostly, anyways) comment syntax, this could even be done in
> a super naïve way by copying the literal strings. You would look at
> comment tokens that precede the target symbol (ignoring some
> whitespace perhaps) and copy them over.
>
> Looking at the Clang API, it seems they already associate
> documentation comments that belong to a declaration with said
> declaration. Clang attaches a `RawComment` to a `Decl` when the
> comment is immediately before it and separated only by whitespace.
> This could be used to transcribe the comment. Of course, the C
> documentation string wouldn't be in JavaDoc format, but even if it's
> in Doxygen or any other format, it's _much_ better than nothing at all.
>
> Is this something that you guys would be interested to look into? If I
> find the time I could give it a shot, but given my inexperience with
> Clang I'm not sure it would result in the best code :).
>
> Kind regards,
>
>
> Frederik van der Els
>
> PS: I saw the recent discussion on "jextract cannot generate portable
> code (anymore)" and it's about the exact same issue I reported in
> March last year ("Bindings crash on Windows where they would not
> before").
> It's indeed the case that the mismatch is now made explicit
> by crashing, but in my case the C library didn't use any platform
> dependent types, so the type wouldn't have been used anyways. I would
> love to see a solution for this.

From some-java-user-99206970363698485155 at vodafonemail.de Sun Jan 25 22:38:08 2026
From: some-java-user-99206970363698485155 at vodafonemail.de (some-java-user-99206970363698485155 at vodafonemail.de)
Date: Sun, 25 Jan 2026 23:38:08 +0100
Subject: jextract cannot generate portable code (anymore)
In-Reply-To: <5a010bc5-1ecf-4500-a3e5-1006030ebe68@oracle.com>
References: <7f755534-64eb-42f5-bd6e-b0c8371f992f@vodafonemail.de> <5a010bc5-1ecf-4500-a3e5-1006030ebe68@oracle.com>
Message-ID:

Thanks to both of you for the additional information on that!

While trying to support arbitrary user-defined typedefs might indeed be
difficult, would it be possible to at least support the standard
fixed-width integer and floating point types (such as `int16_t`)? Maybe
with an opt-in flag which preserves these types and generates
corresponding constants for them in the generated `...$shared.java`
class?

It seems Clang does provide the needed information to get the typedef
names and not their underlying type. I have created a proof-of-concept
which demonstrates this here: https://github.com/openjdk/jextract/pull/299
Hopefully this is useful for you (or anyone else), in case you weren't
aware of it already anyway.

What do you think?

Kind regards

On 05.01.2026 at 13:01, Jorn Vernee wrote:
>
> I had a look at what jextract does in the case of int64_t and size_t.
> In both cases these are typedefs for another builtin type. The former
> is a typedef for `long` and the latter a typedef for `unsigned long`.
> jextract uses the underlying type of the typedef to determine which
> layout to use, so we end up with C_LONG in both cases.
>
> While these types are semantically portable, their typedefs may have a
> non-portable definition, which jextract expands - like a macro -
> during extraction. This problem is similar to this example:
>
>     #ifdef WIN32
>     typedef long long my_int;
>     #else
>     typedef long my_int;
>     #endif
>
> This code is portable in the C sense, but jextract eagerly picks one
> of the two branches of this compiler switch when extracting.
>
> This seems like a tricky issue to work around. In this case I think
> we'd want the type to be 'resolved' at runtime rather than extraction
> time, but I don't think we can let jextract collect all the different
> definitions of `my_int`, and then pick the right one at runtime.
>
> Jorn
>

From maurizio.cimadamore at oracle.com Mon Jan 26 11:05:49 2026
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Mon, 26 Jan 2026 11:05:49 +0000
Subject: jextract cannot generate portable code (anymore)
In-Reply-To:
References: <7f755534-64eb-42f5-bd6e-b0c8371f992f@vodafonemail.de> <5a010bc5-1ecf-4500-a3e5-1006030ebe68@oracle.com>
Message-ID: <91f9fcc2-42e4-4d4c-a99b-fbaae518f4f9@oracle.com>

Hi,
your solution seems good -- e.g. using canonical layouts with type names
like int16_t is the right direction.

The problem is that doing so doesn't yet get us out of the woods --
because there's still C_LONG to worry about. In your patch, you seem to
address this by "downgrading" C_LONG to a ValueLayout. This obviously
works, but if we did this for every binding, existing jextracted code
using C_LONG would no longer work -- because now the layout of C_LONG
would not be sharp enough to allow you to use it inside e.g. a
MemorySegment::get call.

I think some other tactic is needed here -- either a flag that
disables/filters generation of primitive layouts (these are not defined
in any header, so filtering them is not really possible at the moment).
Or adding some kind of magic auto-filtering (e.g. if C_LONG is not used
anywhere, just omit it from the bindings).

The first option is probably the simplest -- we already have many
"--include-XYZ" -- maybe also adding "--include-builtin=long" would be
ok. Although, this might create incompatibilities in cases where users
have already saved the include list somewhere -- e.g.

https://github.com/manuelbl/JavaDoesUSB/blob/main/java-does-usb/jextract/linux/gen_linux.sh

E.g. currently jextract users assume that builtin types will _always_
be emitted. The change described above would alter that.

Maurizio

On 25/01/2026 22:38, some-java-user-99206970363698485155 at vodafonemail.de wrote:
>
> Thanks to both of you for the additional information on that!
>
> While trying to support arbitrary user-defined typedefs might indeed
> be difficult, would it be possible to at least support the standard
> fixed-width integer and floating point types (such as `int16_t`)?
> Maybe with an opt-in flag which preserves these types and generates
> corresponding constants for them in the generated `...$shared.java` class?
>
> It seems Clang does provide the needed information to get the typedef
> names and not their underlying type. I have created a proof-of-concept
> which demonstrates this here: https://github.com/openjdk/jextract/pull/299
> Hopefully this is useful for you (or anyone else), in case you weren't
> aware of it already anyway.
>
> What do you think?
>
> Kind regards
>
> On 05.01.2026 at 13:01, Jorn Vernee wrote:
>>
>> I had a look at what jextract does in the case of int64_t and size_t.
>> In both cases these are typedefs for another builtin type. The former
>> is a typedef for `long` and the latter a typedef for `unsigned long`.
>> jextract uses the underlying type of the typedef to determine which
>> layout to use, so we end up with C_LONG in both cases.
>>
>> While these types are semantically portable, their typedefs may have
>> a non-portable definition, which jextract expands - like a macro -
>> during extraction. This problem is similar to this example:
>>
>>     #ifdef WIN32
>>     typedef long long my_int;
>>     #else
>>     typedef long my_int;
>>     #endif
>>
>> This code is portable in the C sense, but jextract eagerly picks one
>> of the two branches of this compiler switch when extracting.
>>
>> This seems like a tricky issue to work around. In this case I think
>> we'd want the type to be 'resolved' at runtime rather than extraction
>> time, but I don't think we can let jextract collect all the different
>> definitions of `my_int`, and then pick the right one at runtime.
>>
>> Jorn
>>

From some-java-user-99206970363698485155 at vodafonemail.de Tue Jan 27 19:58:59 2026
From: some-java-user-99206970363698485155 at vodafonemail.de (some-java-user-99206970363698485155 at vodafonemail.de)
Date: Tue, 27 Jan 2026 20:58:59 +0100
Subject: jextract cannot generate portable code (anymore)
In-Reply-To: <91f9fcc2-42e4-4d4c-a99b-fbaae518f4f9@oracle.com>
References: <7f755534-64eb-42f5-bd6e-b0c8371f992f@vodafonemail.de> <5a010bc5-1ecf-4500-a3e5-1006030ebe68@oracle.com> <91f9fcc2-42e4-4d4c-a99b-fbaae518f4f9@oracle.com>
Message-ID: <6ca9f83c-2c28-487f-996d-31285120bc7a@vodafonemail.de>

> The first option is probably the simplest -- we already have many
> "--include-XYZ" -- maybe also adding "--include-builtin=long" would be
> ok. Although, this might create incompatibilities in cases where users
> have already saved the include list somewhere -- e.g.
>
> https://github.com/manuelbl/JavaDoesUSB/blob/main/java-does-usb/jextract/linux/gen_linux.sh
>
> E.g. currently jextract users assume that builtin types will _always_
> be emitted. The change described above would alter that.
>

One option to maintain backward compatibility would be to still emit the
builtin types by default (maybe in addition to the new `..._t` types),
and to limit the builtin types to those listed on the command line only
if `--include-builtin` is explicitly specified.
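
For readers following the thread, the extraction-time vs. runtime mismatch can be observed without jextract. The following is only a minimal sketch, assuming JDK 22 or later (where the FFM API is final); the class and method names (`CanonicalLongCheck`, `sizeOf`, `describeLong`) are made up for illustration and are not what jextract generates. It looks up C `long` in `Linker.nativeLinker().canonicalLayouts()` and inspects the result with `instanceof` instead of casting it unconditionally:

```java
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.ValueLayout;
import java.util.Map;

// Hypothetical helper, not jextract output.
public class CanonicalLongCheck {

    // Canonical layouts of the platform the JVM is running on,
    // which may differ from the platform jextract was run on.
    static final Map<String, MemoryLayout> CANONICAL =
            Linker.nativeLinker().canonicalLayouts();

    // Size in bytes of the named C type on the current platform.
    static long sizeOf(String cTypeName) {
        return CANONICAL.get(cTypeName).byteSize();
    }

    // Reports which ValueLayout subtype backs C `long` at runtime,
    // without the unconditional cast that fails across platforms.
    static String describeLong() {
        MemoryLayout layout = CANONICAL.get("long");
        if (layout instanceof ValueLayout.OfLong) {
            return "OfLong";
        } else if (layout instanceof ValueLayout.OfInt) {
            return "OfInt";
        }
        return layout.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println("long      -> " + describeLong()
                + ", " + sizeOf("long") + " bytes");
        System.out.println("long long -> " + sizeOf("long long") + " bytes");
        System.out.println("size_t    -> " + sizeOf("size_t") + " bytes");
    }
}
```

As described in the first message, code generated on Linux declares `C_LONG` as `OfLong` and casts the result of the `"long"` lookup accordingly, while on Windows the same lookup yields an `OfInt` layout, hence the ClassCastException during initialization.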