Using SymbolLookup#libraryLookup with fallback of SymbolLookup#loaderLookup
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Oct 8 09:44:43 UTC 2024
Hi,
I just wanted to clarify this point a bit.
What comes out of jextract is a low-level binding of a C API. As such,
it only ever contains "static methods" (or, instance methods accessible
via the trick described in your [^1]). JTreesitter, Parser - they are
*high-level* bindings - they are libraries that are built on top of
jextract bindings, to provide additional ease of use.
So I think an important question here is where the "custom loading"
should live. IIRC from our previous discussions, tree sitter
is an API which works off its own native library, but there can be
parser plugins that are implemented in user-defined native libraries.
Such plugins will always define a certain function (which effectively
acts as the interface between the plugin and the tree sitter API).
Given this setup, a very minimal/low-level tree sitter API would simply
accept a memory segment for given TSLanguage plugin - and leave the
responsibility for looking things up to clients.
You can then go high-level by making the language lookup part of the
high-level bindings. Now the high-level binding will take a SymbolLookup
(not a memory segment), and will do the necessary work to look up the
"tree_sitter_xyz" function, invoke it, and obtain the desired language
segment. But obtaining the correct SymbolLookup is still a client
responsibility - after all, the language plugin lives in a library
controlled by the client, not by the tree sitter library. From our
previous discussion, what I just wrote doesn't seem miles off where the
Java tree sitter bindings already are:
https://tree-sitter.github.io/java-tree-sitter/io/github/treesitter/jtreesitter/Language.html#%3Cinit%3E(java.lang.foreign.MemorySegment)
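As a rough sketch of that split (not what jtreesitter actually does):
the helper below takes a client-supplied SymbolLookup, finds the
"tree_sitter_xyz" function, invokes it, and returns the language
segment. The class and method names are made up for illustration, and I
am assuming the plugin function takes no arguments and returns a pointer
to the language struct.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical high-level helper; names are illustrative only.
final class LanguageLoader {
    private LanguageLoader() {}

    // Assumed naming convention: "java" -> "tree_sitter_java".
    static String entryPoint(String languageName) {
        return "tree_sitter_" + languageName;
    }

    // The client decides which SymbolLookup to pass in; this method only
    // finds the entry point, calls it, and returns the language pointer.
    static MemorySegment load(SymbolLookup lookup, String languageName) throws Throwable {
        String symbol = entryPoint(languageName);
        MemorySegment function = lookup.find(symbol)
                .orElseThrow(() -> new UnsatisfiedLinkError("Symbol not found: " + symbol));
        // Assumed C signature: const TSLanguage *tree_sitter_xyz(void);
        MethodHandle handle = Linker.nativeLinker()
                .downcallHandle(function, FunctionDescriptor.of(ValueLayout.ADDRESS));
        return (MemorySegment) handle.invokeExact();
    }
}
```

The returned segment is what a low-level API (or the Language
constructor linked above) would accept directly.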
It seems to me that you are after some way for the tree sitter
high-level bindings to omit the SymbolLookup argument - so that plugins
are searched using some strategies defined by the tree sitter library -
not the client. If you want to go down this path, it seems to me that
you have to define what this library search really looks like, and how
it can be customized in case it goes wrong. Think of it as something
that takes a language "name" and gives you back a Path where the library
is defined. That's the hard part (as it will likely contain heuristics
that are platform-specific). Once you have this magic function, it's
easy to wrap the resulting Path in a (new!) symbol lookup (maybe a
library lookup backed by the same Arena as the tree sitter bindings
themselves, which would allow you to also address the problems you
brought up in [2]).
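To illustrate the shape of this (a sketch only, with a deliberately
naive search): the "magic function" below maps a language name to a
Path by probing a list of directories for the platform-specific file
name, and a second method wraps the result in a library lookup tied to
a given Arena. The "tree-sitter-<name>" file naming and the class and
method names are assumptions of mine; a real search would need more
heuristics and a way for clients to override it.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.SymbolLookup;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical library search; names and naming scheme are illustrative.
final class LanguageLibraries {
    private LanguageLibraries() {}

    // The hard part: language name -> library Path. Here, the first
    // directory containing the platform-specific file name wins.
    static Optional<Path> findLibrary(String languageName, List<Path> searchDirs) {
        // e.g. "libtree-sitter-java.so" on Linux, "tree-sitter-java.dll" on Windows
        String fileName = System.mapLibraryName("tree-sitter-" + languageName);
        return searchDirs.stream()
                .map(dir -> dir.resolve(fileName))
                .filter(Files::isRegularFile)
                .findFirst();
    }

    // The easy part: wrap the Path in a new library lookup whose
    // lifetime is controlled by the given arena.
    static SymbolLookup lookupFor(String languageName, List<Path> searchDirs, Arena arena) {
        Path library = findLibrary(languageName, searchDirs)
                .orElseThrow(() -> new IllegalArgumentException(
                        "No library for language '" + languageName + "' in " + searchDirs));
        return SymbolLookup.libraryLookup(library, arena);
    }
}
```

Passing the same Arena that the bindings use means the plugin library
is unloaded when that arena is closed, and not before.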
It seems to me you are asking symbol lookup to implement a "good"
library search algorithm, packed with lots of smart (platform-specific)
heuristics. Alas, SymbolLookup, as the name implies, is for finding
_symbols_ in libraries, not _libraries_ themselves. As such, the solution
is not (IMHO) to put whatever library search is required in your case
inside a single SymbolLookup object. The solution is to code up the
library search in the tree sitter library itself and document it
(assuming you want to go down that path).
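To make the distinction concrete, here is a small sketch using
SymbolLookup::or. The table-backed lookups are hypothetical stand-ins
(so the example runs without any native library); in real code they
would be something like a library lookup with a loader lookup as the
fallback. The point is that the chain only chooses between lookups that
already exist, one symbol at a time; nothing in it searches for library
files.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.util.Map;
import java.util.Optional;

// Illustrative only; class and method names are made up.
final class LookupChain {
    private LookupChain() {}

    // Stand-in for a real library lookup: resolves names from a fixed
    // table of (symbol name -> address).
    static SymbolLookup fake(Map<String, Long> symbols) {
        return name -> Optional.ofNullable(symbols.get(name))
                .map(MemorySegment::ofAddress);
    }

    // Per-symbol fallback: ask the primary lookup first, and consult the
    // fallback only when the primary does not know the symbol.
    static SymbolLookup withFallback(SymbolLookup primary, SymbolLookup fallback) {
        return primary.or(fallback);
    }
}
```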
Hope this helps
Maurizio
[2] -
https://mail.openjdk.org/pipermail/panama-dev/2024-September/020635.html
On 08/10/2024 00:08, some-java-user-99206970363698485155 at vodafonemail.de
wrote:
> My concern here is that this mostly works for bindings which expose
> native methods as static Java methods, e.g. OpenGL [^1]. However, for
> bindings such as jtreesitter this might require larger API changes
> because it currently uses constructors to create binding objects (e.g.
> `new Parser(...)`). If jextract supported providing a custom
> SymbolLookup, then it would require a factory class and factory
> methods which use that lookup, e.g. `var jtreesitter = new
> JTreesitter(symbolLookup); var parser = jtreesitter.newParser(...)`.
> That might be the cleanest approach but would require some (larger)
> refactoring for existing code.
> An alternative might be a `static volatile SymbolLookup` field which
> is lazily initialized and can be overwritten (but only if not
> overwritten / initialized yet).