<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Thanks a lot for all this information! Initially I thought this
could really just be solved by changing the default Linker
downcall behavior, but you are right that this would not help in
all cases, and might break existing use cases where the downcall
result is newly allocated memory, and therefore usage of the
global scope is fine.</p>
<p>
<blockquote type="cite">Can you please share more about your use
case?</blockquote>
The use case is <a class="moz-txt-link-freetext" href="https://github.com/tree-sitter/java-tree-sitter">https://github.com/tree-sitter/java-tree-sitter</a>
(jtreesitter), but note that I am just a user of it, not a
maintainer. And that I am just starting to get more familiar with
it.<br>
In general Tree-sitter works like this (I hope I am noting saying
something wrong):</p>
<ul>
<li>You have the library shared by all parsers, see
<a class="moz-txt-link-freetext" href="https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h">https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h</a>
(and corresponding C code)<br>
</li>
<li>You have specific parser implementations (whose C code is
generated by the tree-sitter CLI), for example
<a class="moz-txt-link-freetext" href="https://github.com/tree-sitter/tree-sitter-json/blob/master/src/parser.c">https://github.com/tree-sitter/tree-sitter-json/blob/master/src/parser.c</a></li>
</ul>
<p>The parser implementations actually only have a single public
function `tree_sitter_...`, which returns a pointer to a
TSLanguage struct with a fixed binary format expected by the
tree-sitter library. For example for the JSON parser that is
`const TSLanguage *tree_sitter_json(void) { ... }`. The connection
between the library and a specific parser happens through this
`tree_sitter_...` function: You call it to load the TSLanguage
struct, then pass it to the library and afterwards can start using
the library with that specific parser. See also
<a class="moz-txt-link-freetext" href="https://tree-sitter.github.io/tree-sitter/using-parsers">https://tree-sitter.github.io/tree-sitter/using-parsers</a><br>
</p>
<p>The jtreesitter project provides a wrapper around the tree-sitter
library. In jtreesitter this connection between general
tree-sitter library and parser happens with a MemorySegment in the
Language class[1], representing the TSLanguage pointer.<br>
However, manually having to lookup the `tree_sitter_...`
function, obtain the Linker, create the function descriptor and
perform the downcall seemed a bit error-prone (especially for
users new to the foreign API), and repetitive because it is the
same for all parser implementations, I think (`tree_sitter_json`,
`tree_sitter_python`, ...). Therefore I suggested to create a
convenience method which only takes a SymbolLookup, which was then
implemented by the maintainer[2].</p>
<p>But only while using jtreesitter more I noticed this issue, that
the TSLanguage pointer actually points to static data, and if you
unload the library while you are still using it, the JVM crashes.
It turns out this is partially due to an incorrect (?)
`reinterpret` call, but also due to the Linker using the global
scope for the result, see also [3]. I am suggesting there now to
not only take a SymbolLookup but also the related Arena, to be
able to set the Arena on the TSLanguage pointer to fix its scope.<br>
</p>
<p>So maybe this is really a niche use case, and it can be solved by
also separately providing the Arena (though having to provide it
separately is a bit inconvenient).</p>
<p>As for why I am not simply using `Arena.global()` when loading
the parser library: I was thinking more about the case where you
accidentally not use the global Arena, and was hoping for it to
still be (relatively) safe when the Arena is closed too early,
throwing an exception in the worst case, but not crashing the JVM.
And also a bit similar to my previous message in
<a class="moz-txt-link-freetext" href="https://mail.openjdk.org/pipermail/jextract-dev/2024-August/001919.html">https://mail.openjdk.org/pipermail/jextract-dev/2024-August/001919.html</a>,
I am still exploring ways to use a temporary file for the library
and be able to unload and delete it before JVM exit (therefore not
use `Arena.global()`).</p>
<p>But you are right, some of my concerns were also a bit
theoretical, extending from the issues I actually encountered.<br>
</p>
<p><br>
</p>
<p>I hope this topic did not cause too much distraction, and it
would likely be good to see if other use cases actually benefit
from JDK-8340641 (reinterpret with Scope) and JDK-8340642 (scope
from SymbolLookup) as well. Maybe they are really not needed, and
users of the foreign API can easily work around this or adjust
their API slightly to accommodate for this.<br>
Thanks nonetheless for investigating this and thinking about
possible solutions.<br>
</p>
<p><br>
</p>
<p>Kind regards<br>
</p>
<p><br>
</p>
<p>[1]:
<a class="moz-txt-link-freetext" href="https://tree-sitter.github.io/java-tree-sitter/io/github/treesitter/jtreesitter/Language.html#%3Cinit%3E(java.lang.foreign.MemorySegment)">https://tree-sitter.github.io/java-tree-sitter/io/github/treesitter/jtreesitter/Language.html#%3Cinit%3E(java.lang.foreign.MemorySegment)</a><br>
[2]:
<a class="moz-txt-link-freetext" href="https://github.com/tree-sitter/java-tree-sitter/commit/078a1c2b7c7ad823850444c3ba35c47379c5bd46">https://github.com/tree-sitter/java-tree-sitter/commit/078a1c2b7c7ad823850444c3ba35c47379c5bd46</a><br>
[3]: <a class="moz-txt-link-freetext" href="https://github.com/tree-sitter/java-tree-sitter/pull/37">https://github.com/tree-sitter/java-tree-sitter/pull/37</a><br>
</p>
<p><br>
</p>
</body>
</html>