[foreign] RFR 8220544: Jextract; library dependent header filtering
Jorn Vernee
jbvernee at xs4all.nl
Tue Mar 19 13:50:35 UTC 2019
Suggestions sound good.
Here's the updated webrev:
http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/
* Split library based filtering code off from SymbolFilter into
LibraryLookupFilter.
* Merged DependencyFinder & ElementFilter into DependencyFilter (as
nested classes).
* Also added 2 more test cases to check that symbols being filtered out
don't have their dependencies added.
Jorn
Maurizio Cimadamore schreef op 2019-03-19 13:40:
> Looks good, I did some tests and it seems to behave according to the
> expectations - some comments below:
>
> On 18/03/2019 10:58, Jorn Vernee wrote:
>> Hi,
>>
>> I have implemented this:
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
>>
>> This adds 2 options --include-headers and --exclude-headers that take
>> regular expression patterns as an argument. The patterns are matched
>> against a cursor's full header file path to determine if it appears in
>> the header root set.
>>
>> For example, I can use the following to include headers in the Python
>> include director;
>>
>> --include-headers "J:\\Python27\\include\\.*"
>>
>> ---
>>
>> To recap, we now:
>> 1.) Filter functions, vars, and macros by whether they appear in a
>> root header.
>> 2.) Filter functions, vars, and macros by the
>> --includes/exclude-symbols patterns.
>> 3.) Filter functions and vars, by whether they appear in any shared
>> library.
>
> I think this is ok, but the way this happens in the code seems to be
> sort of ad hoc, and probably the result of how the code use to be in
> previous rounds.
>
> For instance, SymbolFilter does both (1), (2) and (3).
>
> Then we have DependencyFinder which also does a bit of (1).
>
> Finally we have ElementFilter, which drops everything that is not
> required (as per 4, 5).
>
>
> I believe the code could be improved by:
>
> * separating the library-based filtering from the option-driven
> filtering. These are two separate mechanism (the latter being optional
> on -L being used at all). Of course we should also decide which is
> used first.
>
> * bring together DependencyFinder and ElementFilter into
> DependencyFilter. You can have a two pass visitor - collect deps on a
> first pass, then prune on the second pass. This way the state remains
> 'internal' to the DependencyFilter visitor, no need to pass magic
> state between two passes.
>
> Maurizio
>
>
>> 4.) For the remaining trees in the root headers do dependency
>> analysis, and find the required structs, enums, and typedefs.
>> 5.) For structs, enums, and typedefs include them if they appear in
>> a root header OR are required as dependency.
>>
>> Cheers,
>> Jorn
>>
>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>>> Heh - was writing an email pretty much with this suggestion :P
>>>>
>>>> I think this is a good idea! If you don't think this adds to much
>>>> complexity to the options?
>>>>
>>>> So we'd basically have:
>>>>
>>>> Without --include/exclude-header -> same behavior we have right
>>>> now.
>>>> With --include/exclude-header -> Add included headers to root set
>>>> + find any dependencies, and put both in the generated artifact.
>>>>
>>>> I like that it does no filtering by default, so ad-hoc users don't
>>>> have to figure out which header files define which functions.
>>>
>>> Yep - I like this too. It goes towards the 'no magic' goal, and it
>>> adds the filter option that makes most sense when you want to chop
>>> down an API for good. This is a filtering mechanism after all, the
>>> fact that it's coming back as a filtering option is IMHO, a plus, no
>>> a
>>> minus.
>>>
>>> Btw, slight revision to what I said previously - let's make the
>>> argument of include/exclude regexes too (as for others). This way it
>>> can be even easier to use, most headers do have some commonalities in
>>> them - and I like that, with that, we can basically do the
>>> path-based
>>> heuristics w/o hardwiring that into jextract.
>>>
>>> Maurizio
>>>
>>>>
>>>> Jorn
>>>>
>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>>
>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>>> I've implemented this, and now doing a clean build based on the
>>>>>>> latest jdk/jdk merge before submitting the next webrev.
>>>>>>>
>>>>>>> It's working nicely, with one caveat; Some headers rely on
>>>>>>> pre-processor code defined in a parent header. For instance,
>>>>>>> Python.h defines some pre-processor code used in pythonrun.h .
>>>>>>> The example we have only uses functions defined in pythonrun.h,
>>>>>>> but if we just pass that header to jextract Clang throws an error
>>>>>>> because of the missing pre-processor code from Python.h
>>>>>>>
>>>>>>> In this case we have to pass Python.h first, and then pythonrun.h
>>>>>>> to get everything to work. This also relies on the existence of
>>>>>>> header include guards in the pythonrun.h header (since we're
>>>>>>> basically including it twice). A similar caveat exists with the
>>>>>>> Windows API.
>>>>>>
>>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>>
>>>>>> What if the root set was a 'first class' concept in the extraction
>>>>>> run, rather than something we infer from this or than command line
>>>>>> option?
>>>>>>
>>>>>> That way we could point jextract at python.h, but then say "hey,
>>>>>> I'm only really interested at stuff that comes from pythonrun.h".
>>>>>
>>>>> In other words, right now we have
>>>>>
>>>>> --include-symbol
>>>>>
>>>>> and
>>>>>
>>>>> --exclude-symbol
>>>>>
>>>>> If not set, everything is included.
>>>>>
>>>>> Maybe all we need is:
>>>>>
>>>>> --include-header <header>
>>>>>
>>>>> --exclude-header <header>
>>>>>
>>>>> And, again, if none is specified, all headers are part of the 'root
>>>>> set'.
>>>>>
>>>>> And, to address a concern you had - yes, I'd consider using the
>>>>> 'include path' for header names in --include/exclude-header
>>>>>
>>>>> Maurizio
>>>>>
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>>>
>>>>>>> So, the guide-line is: Pass the main header first, then internal
>>>>>>> headers. e.g. If we have a main header A.h which includes
>>>>>>> a_impl.h, and another main header B.h which includes b_impl.h,
>>>>>>> headers should be passed to jextract in the order: A.h a_impl.h
>>>>>>> B.h b_impl.h
>>>>>>>
>>>>>>> I think the behavior being dependent on the ordering of the
>>>>>>> headers could be fixed by sorting the headers in topological
>>>>>>> order. But, there's still the requirement to pass all of them, or
>>>>>>> things break.
>>>>>>>
>>>>>>> Any thoughs about that?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jorn
>>>>>>>
>>>>>>>
>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>>> I've already been using shell scripts mostly when running
>>>>>>>> jextract
>>>>>>>> (except for simple examples). I find it very useful to be able
>>>>>>>> to
>>>>>>>> split the command over multiple lines, especially long file
>>>>>>>> paths
>>>>>>>> become much more readable.
>>>>>>>>
>>>>>>>> I'll start working on this then.
>>>>>>>>
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>>> I still like this approach, and I think adding support for
>>>>>>>>>> wildcard patterns and/or header filters would make it better.
>>>>>>>>>>
>>>>>>>>>> Like you said: It's dead simple. What you pass to jextract is
>>>>>>>>>> what you get. Though, we could still apply dependency
>>>>>>>>>> analysis, which would make sure nothing that's needed gets
>>>>>>>>>> dropped.
>>>>>>>>>
>>>>>>>>> Why don't we try this then?
>>>>>>>>>
>>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>>
>>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>>
>>>>>>>>> which works
>>>>>>>>>
>>>>>>>>> also, note that jextract also accepts a file with the @ syntax,
>>>>>>>>> etc.
>>>>>>>>>
>>>>>>>>> jextract @args.txt
>>>>>>>>>
>>>>>>>>> where args.txt is the command line (which could list all the
>>>>>>>>> headers you want!).
>>>>>>>>>
>>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>>
>>>>>>>>> Maurizio
More information about the panama-dev
mailing list