Using SymbolLookup#libraryLookup with fallback of SymbolLookup#loaderLookup

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Tue Oct 8 09:44:43 UTC 2024


Hi,
I just wanted to clarify this point a bit.

What comes out of jextract is a low-level binding of a C API. As such, 
it only ever contains "static methods" (or, instance methods accessible 
via the trick described in your [^1]). JTreesitter, Parser - they are 
*high-level* bindings - they are libraries that are built on top of 
jextract bindings, to provide additional ease of use.

So I think an important question here is where the "custom loading" 
should live. IIRC from our previous discussions, tree sitter 
is an API which works off its own native library, but there can be 
parser plugins that are implemented in user-defined native libraries. 
Such plugins will always define a certain function (which effectively 
acts as the interface between the plugin and the tree sitter API).

Given this setup, a very minimal/low-level tree sitter API would simply 
accept a memory segment for a given TSLanguage plugin - and leave the 
responsibility for looking things up to clients.

You can then go high-level by making the language lookup part of the 
high-level bindings. Now the high-level binding will take a SymbolLookup 
(not a memory segment), and will do the necessary work to look up the 
"tree_sitter_xyz" function, invoke it, and obtain the desired language 
segment. But obtaining the correct SymbolLookup is still a client 
responsibility - after all, the language plugin lives in a library 
controlled by the client, not by the tree sitter library. From our 
previous discussion, what I just wrote doesn't seem miles off where the 
Java tree sitter bindings already are:

https://tree-sitter.github.io/java-tree-sitter/io/github/treesitter/jtreesitter/Language.html#%3Cinit%3E(java.lang.foreign.MemorySegment)
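
To make the two levels concrete, here is a minimal sketch (assuming JDK 22 
or later, where java.lang.foreign is final). LanguageBinding, symbolFor and 
load are hypothetical names used only for illustration - this is not the 
actual jtreesitter code. It also assumes the plugin function takes no 
arguments and returns a pointer to the language:

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical class for illustration; not the real jtreesitter Language class.
public final class LanguageBinding {
    private final MemorySegment self;

    // Low-level entry point: the client has already obtained the language pointer.
    public LanguageBinding(MemorySegment self) {
        if (MemorySegment.NULL.equals(self)) {
            throw new IllegalArgumentException("Language pointer is NULL");
        }
        this.self = self;
    }

    // Assumed naming convention, following the "tree_sitter_xyz" pattern.
    public static String symbolFor(String languageName) {
        return "tree_sitter_" + languageName;
    }

    // High-level entry point: the client still chooses the SymbolLookup,
    // but the binding finds and calls the plugin function itself.
    public static LanguageBinding load(SymbolLookup lookup, String languageName) {
        String symbol = symbolFor(languageName);
        MemorySegment function = lookup.find(symbol)
                .orElseThrow(() -> new UnsatisfiedLinkError("Symbol not found: " + symbol));
        // Assumed C signature: a function with no parameters returning a pointer.
        MethodHandle handle = Linker.nativeLinker()
                .downcallHandle(function, FunctionDescriptor.of(ValueLayout.ADDRESS));
        MemorySegment language;
        try {
            language = (MemorySegment) handle.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException("Call to " + symbol + " failed", t);
        }
        return new LanguageBinding(language);
    }

    public MemorySegment segment() {
        return self;
    }
}
```

A client would then write something like 
`LanguageBinding.load(SymbolLookup.libraryLookup(path, arena), "java")`, 
where both the path and the arena remain under the client's control.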

It seems to me that you are after some way for the tree sitter 
high-level bindings to omit the SymbolLookup argument - so that plugins 
are searched using some strategies defined by the tree sitter library - 
not the client. If you want to go down this path, it seems to me that 
you have to define what this library search really looks like, and how 
it can be customized in case it goes wrong. Think of it as something 
that takes a language "name" and gives you back a Path where the library 
is defined. That's the hard part (as it will likely contain heuristics 
that are platform-specific). Once you have this magic function, it's 
easy to wrap the resulting Path in a (new!) symbol lookup (maybe a 
library lookup backed by the same Arena as the tree sitter bindings 
themselves, which would allow you to also address the problems you 
brought up in [2]).
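
As a rough sketch of that "magic function" (again assuming JDK 22 or 
later): the class name, the "tree-sitter-<name>" file naming scheme and the 
idea of a client-supplied list of search directories are all assumptions 
made for illustration. A real implementation would need the 
platform-specific heuristics mentioned above:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.SymbolLookup;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper; the naming scheme and search strategy are assumptions.
public final class LanguageLibraryLocator {

    // Maps a language name to a platform-specific file name,
    // e.g. "java" becomes "libtree-sitter-java.so" on Linux.
    public static String fileNameFor(String languageName) {
        return System.mapLibraryName("tree-sitter-" + languageName);
    }

    // The "name in, Path out" function: returns the first matching file, if any.
    public static Optional<Path> locate(String languageName, List<Path> searchDirs) {
        String fileName = fileNameFor(languageName);
        return searchDirs.stream()
                .map(dir -> dir.resolve(fileName))
                .filter(Files::isRegularFile)
                .findFirst();
    }

    // Wraps the located library in a new library lookup tied to the given arena;
    // the library is unloaded when that arena is closed.
    public static SymbolLookup lookupFor(String languageName, List<Path> searchDirs, Arena arena) {
        Path library = locate(languageName, searchDirs)
                .orElseThrow(() -> new UnsatisfiedLinkError(
                        "No library found for language: " + languageName));
        return SymbolLookup.libraryLookup(library, arena);
    }
}
```

Keeping locate separate from lookupFor means the search can be documented, 
tested and overridden on its own, without loading any native code.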

It seems to me you are asking symbol lookup to implement a "good" 
library search algorithm, packed with lots of smart (platform-specific) 
heuristics. Alas, SymbolLookup, as the name implies, is for finding 
_symbols_ in libraries, not _libraries_ themselves. As such, the solution 
is not (IMHO) to put whatever library search is required in your case 
inside a single SymbolLookup object. The solution is to code up the 
library search in the tree sitter library itself and document it 
(assuming you want to go down that path).
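
For reference, what SymbolLookup does offer is composition at the symbol 
level: SymbolLookup#or chains two lookups so that a symbol missing from the 
first is searched in the second. The sketch below (JDK 22 or later) uses 
in-memory stand-ins with made-up addresses instead of real libraries, so 
the fake helper and the class name are purely illustrative:

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.util.Map;
import java.util.Optional;

// Illustration only: the lookups here are backed by maps, not native libraries.
public final class ChainedLookupDemo {

    // In-memory stand-in for a real lookup; the addresses are made up.
    public static SymbolLookup fake(Map<String, Long> symbols) {
        return name -> Optional.ofNullable(symbols.get(name)).map(MemorySegment::ofAddress);
    }

    // With real libraries this could be, for example:
    //   SymbolLookup.libraryLookup(path, arena).or(SymbolLookup.loaderLookup())
    public static SymbolLookup withFallback(SymbolLookup primary, SymbolLookup fallback) {
        return primary.or(fallback);
    }

    public static void main(String[] args) {
        SymbolLookup primary = fake(Map.of("tree_sitter_java", 0x10L));
        SymbolLookup fallback = fake(Map.of("tree_sitter_python", 0x30L));
        SymbolLookup chained = withFallback(primary, fallback);
        // Found in the primary lookup.
        System.out.println(chained.find("tree_sitter_java").isPresent());
        // Missing from the primary lookup, found in the fallback.
        System.out.println(chained.find("tree_sitter_python").isPresent());
        // Missing from both lookups.
        System.out.println(chained.find("tree_sitter_rust").isPresent());
    }
}
```

Note that the chain only decides which already-loaded library answers for a 
given symbol; finding the libraries in the first place still has to happen 
somewhere else.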

Hope this helps

Maurizio

[2] - 
https://mail.openjdk.org/pipermail/panama-dev/2024-September/020635.html

On 08/10/2024 00:08, some-java-user-99206970363698485155 at vodafonemail.de 
wrote:
> My concern here is that this mostly works for bindings which expose 
> native methods as static Java methods, e.g. OpenGL [^1]. However, for 
> bindings such as jtreesitter this might require larger API changes 
> because it currently uses constructors to create binding objects (e.g. 
> `new Parser(...)`). If jextract supported providing a custom 
> SymbolLookup, then it would require a factory class and factory 
> methods which use that lookup, e.g. `var jtreesitter = new 
> JTreesitter(symbolLookup); var parser = jtreesitter.newParser(...)`. 
> That might be the cleanest approach but would require some (larger) 
> refactoring for existing code.
> An alternative might be a `static volatile SymbolLookup` field which 
> is lazily initialized and can be overwritten (but only if not 
> overwritten / initialized yet).

