[foreign] RFC: Additional jextract filtering
Jorn Vernee
jbvernee at xs4all.nl
Tue Mar 12 21:29:47 UTC 2019
Maurizio Cimadamore wrote on 2019-03-12 20:13:
> On 12/03/2019 18:28, Jorn Vernee wrote:
>> From your response I feel we are getting on the same wavelength :-)
>>
>>> Here's a possible sketch:
>>>
>>> 1) collect all function symbols in a given shared library
>>>
>>> 2) collect the set of headers H defining the functions in (1)
>>>
>>> 3) for each header file h in H, repeat these steps until the set H
>>> is stable
>>>
>>> b) for all function symbols, scan the signature of the function and
>>> pull extra headers into H
>>> c) for all struct symbols, scan the struct field signatures and pull
>>> extra headers into H
>>>
>>> 4) add all symbols in H to the result set (including macros, enums,
>>> typedefs, ...)
>>>
>>>
>>> What do you think?
>>
>> (1) is an interesting option, but it might not be possible to
>> implement on all the platforms we need. While all operating systems
>> seem to have some API for looking up a function in a shared library
>> by name, there doesn't appear to be an API in POSIX or on Windows for
>> enumerating all the symbols in a library. The prototype I have for
>> Windows requires loading the library image and then crawling the
>> export table. Some OSes might not support that approach.
>>
>> Also, wouldn't 3b and 3c still pull in too much? E.g. if I declare a
>> field with uintptr_t, this would still pull in all of stdint.h? Isn't
>> that basically the same as what we have now?
>>
>> We could instead:
>>
>> 1.) Start from the set of all functions and global vars transitively
>> included in the headers passed to jextract.
>>
>> 2.) Filter out any symbols that don't appear in the shared libraries.
>>
>> 3.) From the remaining symbols compute a header root set H.
>>
>> 4.) Include anything found in the headers in H.
>>
>> 5.) Also include elements (but not entire headers) that are required
>> by something appearing in H.
>>
>> How does that sound?
>
> This sounds close to what I had in mind - my step (1) is morally
> equivalent to your 1 + 2
>
> (4) and (5) sound good. I agree that my (4) was pulling in too much.
Ok. Then I'll make a ticket at this point and start working on
implementing this + tests.
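Roughly, I'm picturing something like the sketch below (purely an
illustration of the five steps above; the Element and SymbolLookup
types are made-up stand-ins, not jextract's actual model):

```java
import java.nio.file.Path;
import java.util.*;
import java.util.stream.Collectors;

class RootSetFilter {

    // Made-up stand-in for an extracted element: it knows the header that
    // declared it and the other elements its signature depends on.
    static class Element {
        final String name;
        final Path header;
        final Set<Element> dependencies;
        final boolean isSymbol; // true for functions and global variables

        Element(String name, Path header, Set<Element> dependencies, boolean isSymbol) {
            this.name = name;
            this.header = header;
            this.dependencies = dependencies;
            this.isSymbol = isSymbol;
        }
    }

    // Stand-in for a per-library symbol lookup (dlsym / GetProcAddress style).
    interface SymbolLookup {
        boolean contains(String symbolName);
    }

    static Set<Element> filter(Collection<Element> all, SymbolLookup libs) {
        // 1) + 2) keep only functions/globals whose symbol the libraries export
        Set<Element> roots = all.stream()
                .filter(e -> e.isSymbol && libs.contains(e.name))
                .collect(Collectors.toSet());

        // 3) the header root set H: every header declaring at least one root symbol
        Set<Path> h = roots.stream()
                .map(e -> e.header)
                .collect(Collectors.toSet());

        // 4) include everything declared in a header of H
        Set<Element> result = all.stream()
                .filter(e -> h.contains(e.header))
                .collect(Collectors.toCollection(HashSet::new));

        // 5) additionally pull in required elements (but not their whole headers)
        Deque<Element> todo = new ArrayDeque<>(result);
        while (!todo.isEmpty()) {
            for (Element dep : todo.pop().dependencies) {
                if (result.add(dep)) {
                    todo.push(dep);
                }
            }
        }
        return result;
    }
}
```

The real implementation would of course work on jextract's own
declaration model; the sketch is just meant to pin down the order of
the steps.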
>>
>> Maybe at some point we could drop (5) and replace it with the need to
>> pass an explicit dependency.
>>
>> We probably also need a way of forcing a header into H with an
>> option; e.g. if a header only contains macros (i.e. a 'macro-header')
>> it would otherwise always get filtered out.
> Yeah - maybe everything passed on the command line is implicitly added
> to H?
Yeah, was thinking the same.
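Something along these lines, I imagine (again a made-up helper, just to
illustrate seeding H from the command line):

```java
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

class RootHeaders {
    // Made-up helper: seed H with the headers named on the command line, so
    // that a macro-only header the user asked for explicitly is never
    // filtered out, then add every header that declares a root symbol.
    static Set<Path> compute(Set<Path> commandLineHeaders,
                             Set<Path> headersDeclaringRootSymbols) {
        Set<Path> h = new HashSet<>(commandLineHeaders);
        h.addAll(headersDeclaringRootSymbols);
        return h;
    }
}
```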
>>
>> ---
>>
>> Also, I have the next iteration of the prototype, which now includes
>> the path-based filtering:
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/filters/webrev.01/
>>
>> If I run that over a header file with the following:
>>
>> ```
>> #include <stdint.h>
>>
>> uintptr_t x = 10;
>> ```
>>
>> The output only contains a class with the global var, and a class
>> with only the typedef annotation for uintptr_t. Typedefs are
>> currently also 'required', but they could of course be dropped. But
>> it's nice that this seems to "just work" for all our tests; they all
>> pass.
>
> If I understand correctly, this is the same as your previous patch -
> you do the dependency analysis and then you use the result to avoid
> filtering out too much stuff (e.g. something not on the right path
> might be needed after all). If so, this looks like a promising start;
> I'd suggest putting together a simpler webrev with just these changes,
> for ease of review.
Well, I also removed the filtering options I added :) I kept a lot of
the refactoring though, especially since the separation between library
symbols (i.e. functions and vars) and macros is needed. I'll remove the
other filters for now.
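(By that separation I roughly mean a split like this; the names are
made up, just for illustration:)

```java
class Declarations {
    // Only functions and global variables have a symbol that can be checked
    // against the shared libraries; macros (and typedefs, enums, ...) don't,
    // so they have to be filtered by other means.
    enum Kind { FUNCTION, GLOBAL_VAR, MACRO, TYPEDEF, ENUM, STRUCT }

    static boolean isLibrarySymbol(Kind kind) {
        return kind == Kind.FUNCTION || kind == Kind.GLOBAL_VAR;
    }
}
```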
Jorn
> Thanks
> Maurizio
>
>>
>> Jorn
>>
>> Maurizio Cimadamore wrote on 2019-03-12 18:34:
>>> <snip>
>>>> I think it's also important to keep the heuristic simple, which
>>>> hopefully also makes it easy to manipulate. What I like about the
>>>> current approach is that it's so straightforward: you give jextract
>>>> a header file, and you get back a complete binding. If you want to
>>>> make it smaller, you can add filters.
>>>>
>>>> I think some sort of automatic filtering (guess) would also need the
>>>> ability to be turned off.
>>> Yes on both points. On simplicity, I think it's important not just
>>> in terms of implementation, but also in pedagogical terms (how hard
>>> is it to explain what jextract does?)
>>>> <snip>
>>>
>>>> Well, you need more than just functions and global variables to use
>>>> a library. In practice the header file gives what you need. We just
>>>> need to find a good heuristic for filtering out the noise that comes
>>>> with it. (I guess that problem also exists in the C/C++ world, maybe
>>>> it's interesting to look at some solutions there?)
>>>>
>>>> Starting from the list of library symbols seems like an interesting
>>>> idea to minimize the output, but imho the header file is the more
>>>> trustworthy source to draw information from.
>>> The problem with headers is that they include other headers and so
>>> all
>>> header-based approaches will have, at some point, to ask: when do I
>>> stop following dependencies?
>>>>
>>>> But, I think what we can definitely agree on is that a jextract
>>>> transitively including a bunch of system headers in the output is
>>>> undesirable.
>>> Right - and again, while it might be simple to explain _why_ a
>>> certain header has been pulled in, it could be totally surprising
>>> for a user to see so many symbols being pulled in for even
>>> relatively simple libraries.
>>> <snip>
>>>> I don't feel so strongly about the dependency analysis. I think it
>>>> falls short in too many cases, especially when we already have a
>>>> hand-crafted set of dependencies, i.e. header files. I think the
>>>> dependency analysis should really only be used to emit warnings or
>>>> errors when things that are needed to generate a well-formed
>>>> artifact are missing, and let the user decide how to deal with the
>>>> problem. Though, before we have a good mechanism for including
>>>> dependencies for jextract runs, it seems fine to automatically
>>>> include the dependencies of the root set as well.
>>>>
>>>> The path-based filtering to determine a root set seems interesting
>>>> to explore. It should be an overridable default imho. I'll continue
>>>> exploring that.
>>>
>>> Maybe we're using different terms - by dependency analysis I mean
>>> finding some root set of symbols to extract, and then using some
>>> analysis to pull in the symbols that will be required at runtime.
>>>
>>> Here's a possible sketch:
>>>
>>> 1) collect all function symbols in a given shared library
>>>
>>> 2) collect the set of headers H defining the functions in (1)
>>>
>>> 3) for each header file h in H, repeat these steps until the set H
>>> is stable
>>>
>>> b) for all function symbols, scan the signature of the function and
>>> pull extra headers into H
>>> c) for all struct symbols, scan the struct field signatures and pull
>>> extra headers into H
>>>
>>> 4) add all symbols in H to the result set (including macros, enums,
>>> typedefs, ...)
>>>
>>>
>>> What do you think?
>>>
>>> Maurizio
>>>
>>>
>>>> Jorn
>>>>
>>>>>
>>>>> Maurizio
>>>>>
>>>>>
>>>>>>
>>>>>> Jorn
>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>>>
>>>>>>>> On the other hand, not everything makes sense to use from a
>>>>>>>> Panama perspective, so we still need some escape hatch to filter
>>>>>>>> out stuff we can't use, or that breaks the binder. But we'd like
>>>>>>>> to go about that in a disciplined way, and make sure we don't
>>>>>>>> filter out things that are required by other things, so we use a
>>>>>>>> dependency set.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Maurizio Cimadamore wrote on 2019-03-11 16:12:
>>>>>>>>> On 11/03/2019 13:45, Jorn Vernee wrote:
>>>>>>>>>> I can separate the parts of the patch a little bit into the
>>>>>>>>>> filter refactor + root set computation, and leave the option
>>>>>>>>>> changes out of it. But those two alone do not affect the
>>>>>>>>>> filtering, since the root set is only used when filtering
>>>>>>>>>> non-symbol/macro elements.
>>>>>>>>>
>>>>>>>>> I guess then what I'm suggesting is to automatically filter out
>>>>>>>>> elements not in the root set, and see how that works out.
>>>>>>>>>
>>>>>>>>> Maurizio