[foreign] RFC: Additional jextract filtering
Jorn Vernee
jbvernee at xs4all.nl
Tue Mar 12 18:28:28 UTC 2019
From your response I feel we are getting on the same wavelength :-)
> Here's a possible sketch:
>
> 1) collect all function symbols in a given shared library
>
> 2) collect the set of headers H defining the functions in (1)
>
> 3) for each headerfile h in H, repeat these steps until the set H is
> stable
>
> b) for all function symbols, scan the signature of the function and
> pull in extra headers in H
> c) for all struct symbols, scan the struct field signatures and pull
> in extra headers in H
>
> 4) add all symbols in H to the result set (including macros, enums,
> typedefs, ...)
>
>
> What do you think?
Step 1 is an interesting option, but it might not be possible to implement
on all needed platforms. While all operating systems seem to have some
API for looking up a function in a shared library by name, there doesn't
appear to be an API in POSIX or Windows for collecting all the symbols
in a library. The prototype I have for Windows requires loading the
library image and then crawling the export table. Some OSes might not
support that approach.
Also, wouldn't 3b and 3c still pull in too much? e.g. if I declare a
field with uintptr_t this would still pull in all of stdint.h? Isn't
that basically the same as what we have now?
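To make that concern concrete, here is a minimal sketch of the header-level fixpoint from step 3. It is not jextract code: the `HeaderClosure` class, the `typeDeps` map (header -> headers whose types its function signatures and struct fields mention) and the file name `mylib.h` are all made up for illustration.

```java
import java.util.*;

public class HeaderClosure {
    // Hypothetical model: typeDeps maps a header to the headers that define
    // types used in its function signatures and struct fields (steps 3b/3c).
    public static Set<String> closure(Set<String> roots,
                                      Map<String, Set<String>> typeDeps) {
        Set<String> h = new TreeSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        // Repeat until H is stable: a header is only queued the first
        // time it is added to H.
        while (!work.isEmpty()) {
            String header = work.pop();
            for (String dep : typeDeps.getOrDefault(header, Set.of())) {
                if (h.add(dep)) {
                    work.push(dep);
                }
            }
        }
        return h;
    }

    public static void main(String[] args) {
        // mylib.h declares something using uintptr_t, defined in stdint.h.
        Map<String, Set<String>> typeDeps =
                Map.of("mylib.h", Set.of("stdint.h"));
        // Prints [mylib.h, stdint.h]
        System.out.println(closure(Set.of("mylib.h"), typeDeps));
    }
}
```

Because H holds whole headers, step 4 would then add every symbol of stdint.h, even though only uintptr_t is actually used.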
We could instead:
1.) Start from the set of all functions and global vars transitively
included in the headers passed to jextract.
2.) Filter out any symbols that don't appear in the shared libraries.
3.) From the remaining symbols compute a header root set H.
4.) Include anything found in the headers in H.
5.) Also include elements (but not entire headers) that are required by
something appearing in H.
How does that sound?
Maybe at some point we could drop 5. and replace it with the need to
pass an explicit dependency.
Also, probably, we also need a way of forcing a header into H, with an
option. e.g. if a header only contains macros (i.e. a 'macro-header') it
would always get filtered out otherwise.
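As a rough sketch of steps 1 to 5 plus the forced-header option: the code below is not taken from the prototype. The `Decl` model, the `select` method and the sample names (`mylib.h`, `version.h`, `VERSION`) are hypothetical. It also assumes that symbols filtered out in step 2 stay out in step 4, which is only one possible reading.

```java
import java.util.*;

public class RootSetFilter {
    // Hypothetical declaration model, not jextract's real tree API.
    public static final class Decl {
        final String name;
        final String header;
        final boolean isSymbol;      // function or global var
        final List<String> requires; // names of declarations it refers to

        public Decl(String name, String header, boolean isSymbol,
                    List<String> requires) {
            this.name = name;
            this.header = header;
            this.isSymbol = isSymbol;
            this.requires = requires;
        }
    }

    public static Set<String> select(List<Decl> decls,
                                     Set<String> librarySymbols,
                                     Set<String> forcedHeaders) {
        // Steps 1-3: headers that define a symbol found in the library,
        // plus any headers forced in by the user.
        Set<String> h = new TreeSet<>(forcedHeaders);
        Map<String, Decl> byName = new HashMap<>();
        for (Decl d : decls) {
            byName.put(d.name, d);
            if (d.isSymbol && librarySymbols.contains(d.name)) {
                h.add(d.header);
            }
        }
        // Step 4: everything declared in a header in H (assumption:
        // symbols missing from the library stay filtered out).
        Set<String> result = new TreeSet<>();
        Deque<Decl> work = new ArrayDeque<>();
        for (Decl d : decls) {
            boolean kept = !d.isSymbol || librarySymbols.contains(d.name);
            if (kept && h.contains(d.header) && result.add(d.name)) {
                work.push(d);
            }
        }
        // Step 5: pull in required elements only, not their whole headers.
        while (!work.isEmpty()) {
            for (String r : work.pop().requires) {
                Decl dep = byName.get(r);
                boolean kept = dep != null
                        && (!dep.isSymbol || librarySymbols.contains(r));
                if (kept && result.add(r)) {
                    work.push(dep);
                }
            }
        }
        return result;
    }

    public static List<Decl> sample() {
        return List.of(
            new Decl("x", "mylib.h", true, List.of("uintptr_t")),
            new Decl("uintptr_t", "stdint.h", false, List.of()),
            new Decl("int8_t", "stdint.h", false, List.of()),
            new Decl("VERSION", "version.h", false, List.of()));
    }

    public static void main(String[] args) {
        // Prints [uintptr_t, x]: int8_t and the macro header are dropped.
        System.out.println(select(sample(), Set.of("x"), Set.of()));
        // Prints [VERSION, uintptr_t, x]: version.h is forced into H.
        System.out.println(select(sample(), Set.of("x"), Set.of("version.h")));
    }
}
```

In this model stdint.h never enters H, so only the one typedef that x needs is carried along, and the macro-only header survives only when it is forced in.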
---
Also, I have the next iteration of the prototype, which now includes the
path-based filtering:
http://cr.openjdk.java.net/~jvernee/panama/webrevs/filters/webrev.01/
If I run that over a header file with the following:
```
#include <stdint.h>
uintptr_t x = 10;
```
The output only contains a class with the global var, and a class with
only the typedef annotation for uintptr_t. Typedefs are currently also
'required', but they could be dropped of course. But, it's nice that
this seems to "just work" for all our tests. They all pass.
Jorn
Maurizio Cimadamore wrote on 2019-03-12 18:34:
> <snip>
>> I think it's also important to keep the heuristic simple. Which
>> hopefully also makes it easy and straightforward to manipulate. What I
>> like about the current approach is that it's so straightforward. You
>> give jextract a header file, and get back a complete binding. If you
>> want to make it smaller you could add filters.
>>
>> I think some sort of automatic filtering (guess) would also need the
>> ability to be turned off.
> Yes on both points. On simplicity, I think it's important not just in
> terms of implementation, but also in pedagogical terms (how hard it is
> to explain what jextract does?)
>> <snip>
>
>> Well, you need more than just functions and global variables to use a
>> library. In practice the header file gives what you need. We just need
>> to find a good heuristic for filtering out the noise that comes with
>> it. (I guess that problem also exists in the C/C++ world, maybe it's
>> interesting to look at some solutions there?)
>>
>> Starting from the list of library symbols seems like an interesting
>> idea to minimize the output, but imho the header file is the more
>> trustworthy source to draw information from.
> The problem with headers is that they include other headers and so all
> header-based approaches will have, at some point, to ask: when do I
> stop following dependencies?
>>
>> But, I think what we can definitely agree on is that a jextract
>> transitively including a bunch of system headers in the output is
>> undesirable.
> Right - and again, while it might be simple to explain _why_ a certain
> header has been pulled in, it could be totally surprising for a user
> to see so many symbols being pulled in for even relatively simple
> libraries.
> <snip>
>> I don't feel so strongly about the dependency analysis. I think it
>> falls short in too many cases, especially when we already have a
>> hand-crafted set of dependencies, i.e. header files. I think the
>> dependency analysis should really only be used to emit warnings or
>> errors when things that are needed to generate a well-formed artifact
>> are missing, and let the user decide how to deal with the problem.
>> Though, before we have a good mechanism for including dependencies for
>> jextract runs, it seems fine to automatically include the dependencies
>> of the root set as well.
>>
>> The path-based filtering to determine a root set seems interesting to
>> explore. It should be an overridable default imho. I'll continue
>> exploring that.
>
> Maybe we're using different terms - by dependency analysis I mean
> finding some root set of symbols to extract, and then use some
> analysis to pull in the symbols that will be required at runtime.
>
> Here's a possible sketch:
>
> 1) collect all function symbols in a given shared library
>
> 2) collect the set of headers H defining the functions in (1)
>
> 3) for each headerfile h in H, repeat these steps until the set H is
> stable
>
> b) for all function symbols, scan the signature of the function and
> pull in extra headers in H
> c) for all struct symbols, scan the struct field signatures and pull
> in extra headers in H
>
> 4) add all symbols in H to the result set (including macros, enums,
> typedefs, ...)
>
>
> What do you think?
>
> Maurizio
>
>
>> Jorn
>>
>>>
>>> Maurizio
>>>
>>>
>>>>
>>>> Jorn
>>>>
>>>>> Maurizio
>>>>>
>>>>>>
>>>>>> On the other hand, not everything makes sense to use from a Panama
>>>>>> perspective, so we still need some escape hatch to filter out some
>>>>>> stuff we can't use, or breaks the binder. But, we'd like to go
>>>>>> about that disciplined, and make sure we don't filter out things
>>>>>> that are required by other things, so we use a dependency set.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Jorn
>>>>>>
>>>>>> Maurizio Cimadamore wrote on 2019-03-11 16:12:
>>>>>>> On 11/03/2019 13:45, Jorn Vernee wrote:
>>>>>>>> I can separate the parts of the patch a little bit into: Filter
>>>>>>>> refactor + root set compute, and then leave the option changes
>>>>>>>> out of it. But those 2 alone do not affect the filtering, since
>>>>>>>> the root set is only used when filtering non-symbol/macro
>>>>>>>> elements.
>>>>>>>
>>>>>>> I guess then what I'm suggesting is to automatically filter out
>>>>>>> elements not in the root set, and see how that works out.
>>>>>>>
>>>>>>> Maurizio