[foreign] RFR 8220544: Jextract; library dependent header filtering
Jorn Vernee
jbvernee at xs4all.nl
Tue Mar 19 19:58:31 UTC 2019
Thanks for the reviews. I've updated the comment and pushed.
Jorn
Sundararajan Athijegannathan schreef op 2019-03-19 18:07:
> Tests passed. Please go ahead with push.
>
> -Sundar
>
> On 19/03/19, 7:46 PM, Sundararajan Athijegannathan wrote:
>> * 47 * A merge between {@link DependencyFinder} and {@link
>> ElementFilter}
>>
>> in DependencyFilter. Could be removed?
>>
>> No comments other than that. +1
>>
>> I'm going to submit an internal mach5 build/request for this patch.
>> I'll notify you as soon as it is all green.
>>
>> -Sundar
>>
>> On 19/03/19, 7:34 PM, Maurizio Cimadamore wrote:
>>> This looks good to me, I have no further comments.
>>>
>>> Great work!
>>>
>>> (I'll leave it to Sundar for further comments/suggestions)
>>>
>>> Maurizio
>>>
>>> On 19/03/2019 13:50, Jorn Vernee wrote:
>>>> Suggestions sound good.
>>>>
>>>> Here's the updated webrev:
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/
>>>>
>>>> * Split library based filtering code off from SymbolFilter into
>>>> LibraryLookupFilter.
>>>> * Merged DependencyFinder & ElementFilter into DependencyFilter (as
>>>> nested classes).
>>>> * Also added 2 more test cases to check that symbols being filtered
>>>> out don't have their dependencies added.
>>>>
>>>> Jorn
>>>>
>>>> Maurizio Cimadamore schreef op 2019-03-19 13:40:
>>>>> Looks good, I did some tests and it seems to behave according to
>>>>> the
>>>>> expectations - some comments below:
>>>>>
>>>>> On 18/03/2019 10:58, Jorn Vernee wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have implemented this:
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
>>>>>>
>>>>>> This adds 2 options --include-headers and --exclude-headers that
>>>>>> take regular expression patterns as an argument. The patterns are
>>>>>> matched against a cursor's full header file path to determine if
>>>>>> it appears in the header root set.
>>>>>>
>>>>>> For example, I can use the following to include headers in the
>>>>>> Python include director;
>>>>>>
>>>>>> --include-headers "J:\\Python27\\include\\.*"
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> To recap, we now:
>>>>>> 1.) Filter functions, vars, and macros by whether they appear in
>>>>>> a root header.
>>>>>> 2.) Filter functions, vars, and macros by the
>>>>>> --includes/exclude-symbols patterns.
>>>>>> 3.) Filter functions and vars, by whether they appear in any
>>>>>> shared library.
>>>>>
>>>>> I think this is ok, but the way this happens in the code seems to
>>>>> be
>>>>> sort of ad hoc, and probably the result of how the code use to be
>>>>> in
>>>>> previous rounds.
>>>>>
>>>>> For instance, SymbolFilter does both (1), (2) and (3).
>>>>>
>>>>> Then we have DependencyFinder which also does a bit of (1).
>>>>>
>>>>> Finally we have ElementFilter, which drops everything that is not
>>>>> required (as per 4, 5).
>>>>>
>>>>>
>>>>> I believe the code could be improved by:
>>>>>
>>>>> * separating the library-based filtering from the option-driven
>>>>> filtering. These are two separate mechanism (the latter being
>>>>> optional
>>>>> on -L being used at all). Of course we should also decide which is
>>>>> used first.
>>>>>
>>>>> * bring together DependencyFinder and ElementFilter into
>>>>> DependencyFilter. You can have a two pass visitor - collect deps on
>>>>> a
>>>>> first pass, then prune on the second pass. This way the state
>>>>> remains
>>>>> 'internal' to the DependencyFilter visitor, no need to pass magic
>>>>> state between two passes.
>>>>>
>>>>> Maurizio
>>>>>
>>>>>
>>>>>> 4.) For the remaining trees in the root headers do dependency
>>>>>> analysis, and find the required structs, enums, and typedefs.
>>>>>> 5.) For structs, enums, and typedefs include them if they appear
>>>>>> in a root header OR are required as dependency.
>>>>>>
>>>>>> Cheers,
>>>>>> Jorn
>>>>>>
>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>>>>>>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>>>>>>> Heh - was writing an email pretty much with this suggestion :P
>>>>>>>>
>>>>>>>> I think this is a good idea! If you don't think this adds to
>>>>>>>> much complexity to the options?
>>>>>>>>
>>>>>>>> So we'd basically have:
>>>>>>>>
>>>>>>>> Without --include/exclude-header -> same behavior we have
>>>>>>>> right now.
>>>>>>>> With --include/exclude-header -> Add included headers to root
>>>>>>>> set + find any dependencies, and put both in the generated
>>>>>>>> artifact.
>>>>>>>>
>>>>>>>> I like that it does no filtering by default, so ad-hoc users
>>>>>>>> don't have to figure out which header files define which
>>>>>>>> functions.
>>>>>>>
>>>>>>> Yep - I like this too. It goes towards the 'no magic' goal, and
>>>>>>> it
>>>>>>> adds the filter option that makes most sense when you want to
>>>>>>> chop
>>>>>>> down an API for good. This is a filtering mechanism after all,
>>>>>>> the
>>>>>>> fact that it's coming back as a filtering option is IMHO, a plus,
>>>>>>> no a
>>>>>>> minus.
>>>>>>>
>>>>>>> Btw, slight revision to what I said previously - let's make the
>>>>>>> argument of include/exclude regexes too (as for others). This way
>>>>>>> it
>>>>>>> can be even easier to use, most headers do have some
>>>>>>> commonalities in
>>>>>>> them - and I like that, with that, we can basically do the
>>>>>>> path-based
>>>>>>> heuristics w/o hardwiring that into jextract.
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>>>
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>>>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>>>>>>
>>>>>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>>>>>>> I've implemented this, and now doing a clean build based on
>>>>>>>>>>> the latest jdk/jdk merge before submitting the next webrev.
>>>>>>>>>>>
>>>>>>>>>>> It's working nicely, with one caveat; Some headers rely on
>>>>>>>>>>> pre-processor code defined in a parent header. For instance,
>>>>>>>>>>> Python.h defines some pre-processor code used in pythonrun.h
>>>>>>>>>>> . The example we have only uses functions defined in
>>>>>>>>>>> pythonrun.h, but if we just pass that header to jextract
>>>>>>>>>>> Clang throws an error because of the missing pre-processor
>>>>>>>>>>> code from Python.h
>>>>>>>>>>>
>>>>>>>>>>> In this case we have to pass Python.h first, and then
>>>>>>>>>>> pythonrun.h to get everything to work. This also relies on
>>>>>>>>>>> the existence of header include guards in the pythonrun.h
>>>>>>>>>>> header (since we're basically including it twice). A similar
>>>>>>>>>>> caveat exists with the Windows API.
>>>>>>>>>>
>>>>>>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>>>>>>
>>>>>>>>>> What if the root set was a 'first class' concept in the
>>>>>>>>>> extraction run, rather than something we infer from this or
>>>>>>>>>> than command line option?
>>>>>>>>>>
>>>>>>>>>> That way we could point jextract at python.h, but then say
>>>>>>>>>> "hey, I'm only really interested at stuff that comes from
>>>>>>>>>> pythonrun.h".
>>>>>>>>>
>>>>>>>>> In other words, right now we have
>>>>>>>>>
>>>>>>>>> --include-symbol
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>> --exclude-symbol
>>>>>>>>>
>>>>>>>>> If not set, everything is included.
>>>>>>>>>
>>>>>>>>> Maybe all we need is:
>>>>>>>>>
>>>>>>>>> --include-header <header>
>>>>>>>>>
>>>>>>>>> --exclude-header <header>
>>>>>>>>>
>>>>>>>>> And, again, if none is specified, all headers are part of the
>>>>>>>>> 'root set'.
>>>>>>>>>
>>>>>>>>> And, to address a concern you had - yes, I'd consider using the
>>>>>>>>> 'include path' for header names in --include/exclude-header
>>>>>>>>>
>>>>>>>>> Maurizio
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Maurizio
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So, the guide-line is: Pass the main header first, then
>>>>>>>>>>> internal headers. e.g. If we have a main header A.h which
>>>>>>>>>>> includes a_impl.h, and another main header B.h which includes
>>>>>>>>>>> b_impl.h, headers should be passed to jextract in the order:
>>>>>>>>>>> A.h a_impl.h B.h b_impl.h
>>>>>>>>>>>
>>>>>>>>>>> I think the behavior being dependent on the ordering of the
>>>>>>>>>>> headers could be fixed by sorting the headers in topological
>>>>>>>>>>> order. But, there's still the requirement to pass all of
>>>>>>>>>>> them, or things break.
>>>>>>>>>>>
>>>>>>>>>>> Any thoughs about that?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jorn
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>>>>>>> I've already been using shell scripts mostly when running
>>>>>>>>>>>> jextract
>>>>>>>>>>>> (except for simple examples). I find it very useful to be
>>>>>>>>>>>> able to
>>>>>>>>>>>> split the command over multiple lines, especially long file
>>>>>>>>>>>> paths
>>>>>>>>>>>> become much more readable.
>>>>>>>>>>>>
>>>>>>>>>>>> I'll start working on this then.
>>>>>>>>>>>>
>>>>>>>>>>>> Jorn
>>>>>>>>>>>>
>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>>>>>>> I still like this approach, and I think adding support for
>>>>>>>>>>>>>> wildcard patterns and/or header filters would make it
>>>>>>>>>>>>>> better.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Like you said: It's dead simple. What you pass to jextract
>>>>>>>>>>>>>> is what you get. Though, we could still apply dependency
>>>>>>>>>>>>>> analysis, which would make sure nothing that's needed gets
>>>>>>>>>>>>>> dropped.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why don't we try this then?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>>>>>>
>>>>>>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>>>>>>
>>>>>>>>>>>>> which works
>>>>>>>>>>>>>
>>>>>>>>>>>>> also, note that jextract also accepts a file with the @
>>>>>>>>>>>>> syntax, etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> jextract @args.txt
>>>>>>>>>>>>>
>>>>>>>>>>>>> where args.txt is the command line (which could list all
>>>>>>>>>>>>> the headers you want!).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maurizio
More information about the panama-dev
mailing list