[foreign] RFC: Additional jextract filtering

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Tue Mar 12 17:34:20 UTC 2019


<snip>
> I think it's also important to keep the heuristic simple. Which 
> hopefully also makes it easy and straightforward to manipulate. What I 
> like about the current approach is that it's so straightforward. You 
> give jextract a header file, and get back a complete binding. If you 
> want to make it smaller you could add filters.
>
> I think some sort of automatic filtering (guess) would also need the 
> ability to be turned off.
Yes on both points. On simplicity, I think it's important not just in 
terms of implementation, but also in pedagogical terms (how hard is it 
to explain what jextract does?)
> <snip>

> Well, you need more than just functions and global variables to use a 
> library. In practice the header file gives what you need. We just need 
> to find a good heuristic for filtering out the noise that comes with 
> it. (I guess that problem also exists in the C/C++ world, maybe it's 
> interesting to look at some solutions there?)
>
> Starting from the list of library symbols seems like an interesting 
> idea to minimize the output, but imho the header file is the more 
> trustworthy source to draw information from.
The problem with headers is that they include other headers, so every 
header-based approach will, at some point, have to ask: when do I stop 
following dependencies?
>
> But I think what we can definitely agree on is that jextract 
> transitively including a bunch of system headers in the output is 
> undesirable.
Right - and again, while it might be simple to explain _why_ a certain 
header has been pulled in, it could be totally surprising for a user to 
see so many symbols being pulled in for even relatively simple libraries.
<snip>
> I don't feel so strongly about the dependency analysis. I think it 
> falls short in too many cases, especially when we already have a 
> hand-crafted set of dependencies, i.e. header files. I think the 
> dependency analysis should really only be used to emit warnings or 
> errors when things that are needed to generate a well-formed artifact 
> are missing, and let the user decide how to deal with the problem. 
> Though, before we have a good mechanism for including dependencies for 
> jextract runs, it seems fine to automatically include the dependencies 
> of the root set as well.
>
> The path-based filtering to determine a root set seems interesting to 
> explore. It should be an overridable default imho. I'll continue 
> exploring that.

Maybe we're using different terms - by dependency analysis I mean 
finding some root set of symbols to extract, and then using some analysis 
to pull in the symbols that will be required at runtime.

Here's a possible sketch:

1) collect all function symbols in a given shared library

2) collect the set of headers H defining the functions in (1)

3) for each header file h in H, repeat these steps until the set H is stable:

   a) for all function symbols, scan the signature of the function and
      pull any extra headers into H
   b) for all struct symbols, scan the struct field signatures and pull
      any extra headers into H

4) add all symbols in H to the result set (including macros, enums, 
typedefs, ...)
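
To make the sketch above a bit more concrete, here is a rough Java model 
of the same fixed-point computation. It is only an illustration under 
stated assumptions: the Decl class, the method names and the sample 
header/symbol names (foo.h, foo_open, ...) are all made up for this 
example and are not part of jextract, and a real implementation would 
get this information from the parsed headers and the shared library 
rather than from a hand-written list.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class HeaderClosure {

    // Hypothetical model of one declaration: its name, the header declaring
    // it, and the type names used in its signature (function) or fields (struct).
    static final class Decl {
        final String name;
        final String header;
        final List<String> usedTypes;

        Decl(String name, String header, String... usedTypes) {
            this.name = name;
            this.header = header;
            this.usedTypes = List.of(usedTypes);
        }
    }

    // Steps 1-3: start from the headers declaring the library's symbols, then
    // keep adding headers that declare referenced types until H is stable.
    static Set<String> stableHeaders(Set<String> librarySymbols, List<Decl> decls) {
        Map<String, String> headerOf = new HashMap<>();
        for (Decl d : decls) {
            headerOf.put(d.name, d.header);
        }

        Set<String> headers = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        for (String sym : librarySymbols) {
            String h = headerOf.get(sym);          // symbols without a header are skipped
            if (h != null && headers.add(h)) {
                work.add(h);
            }
        }
        while (!work.isEmpty()) {
            String h = work.remove();
            for (Decl d : decls) {
                if (!d.header.equals(h)) continue;
                for (String type : d.usedTypes) {
                    String th = headerOf.get(type);
                    if (th != null && headers.add(th)) {
                        work.add(th);              // newly discovered header, scan it too
                    }
                }
            }
        }
        return headers;
    }

    // Step 4: everything declared in a header of H goes into the result set.
    static Set<String> resultSymbols(Set<String> headers, List<Decl> decls) {
        Set<String> result = new TreeSet<>();
        for (Decl d : decls) {
            if (headers.contains(d.header)) {
                result.add(d.name);
            }
        }
        return result;
    }

    // Made-up sample data: one exported function, its types, and unrelated headers.
    static List<Decl> sampleDecls() {
        return List.of(
            new Decl("foo_open", "foo.h", "foo_ctx"),
            new Decl("FOO_MAX", "foo.h"),
            new Decl("foo_ctx", "foo_types.h", "foo_mode"),
            new Decl("foo_mode", "foo_types.h"),
            new Decl("bar_helper", "unused.h", "bar_state"),
            new Decl("bar_state", "bar_types.h"));
    }

    public static void main(String[] args) {
        Set<String> headers = stableHeaders(Set.of("foo_open"), sampleDecls());
        System.out.println(headers);
        System.out.println(resultSymbols(headers, sampleDecls()));
    }
}
```

With this sample data the header set stabilizes at foo.h and 
foo_types.h, while unused.h and bar_types.h are never pulled in because 
nothing reachable from the library's symbols refers to them. Note that 
the macro FOO_MAX still ends up in the result set, simply because it 
lives in a header that is in H.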


What do you think?

Maurizio


> Jorn
>
>>
>> Maurizio
>>
>>
>>>
>>> Jorn
>>>
>>>> Maurizio
>>>>
>>>>>
>>>>> On the other hand, not everything makes sense to use from a Panama 
>>>>> perspective, so we still need some escape hatch to filter out 
>>>>> stuff that we can't use, or that breaks the binder. But we'd like 
>>>>> to go about that in a disciplined way, and make sure we don't filter 
>>>>> out things that are required by other things, so we use a dependency set.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Jorn
>>>>>
>>>>> Maurizio Cimadamore schreef op 2019-03-11 16:12:
>>>>>> On 11/03/2019 13:45, Jorn Vernee wrote:
>>>>>>> I can separate the parts of the patch a little bit into: Filter 
>>>>>>> refactor + root set compute, and then leave the option changes 
>>>>>>> out of it. But those 2 alone do not affect the filtering, since 
>>>>>>> the root set is only used when filtering non-symbol/macro elements.
>>>>>>
>>>>>> I guess then what I'm suggesting is to automatically filter out
>>>>>> elements not in the root set, and see how that works out.
>>>>>>
>>>>>> Maurizio

