[foreign] RFR 8220544: Jextract; library dependent header filtering
Sundararajan Athijegannathan
sundararajan.athijegannathan at oracle.com
Tue Mar 19 14:16:26 UTC 2019
* 47 * A merge between {@link DependencyFinder} and {@link ElementFilter}
in DependencyFilter. Could be removed?
No comments other than that. +1
I'm going to submit an internal mach5 build/request for this patch. I'll
notify you as soon as it is all green.
-Sundar
On 19/03/19, 7:34 PM, Maurizio Cimadamore wrote:
> This looks good to me, I have no further comments.
>
> Great work!
>
> (I'll leave it to Sundar for further comments/suggestions)
>
> Maurizio
>
> On 19/03/2019 13:50, Jorn Vernee wrote:
>> Suggestions sound good.
>>
>> Here's the updated webrev:
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/
>>
>> * Split library based filtering code off from SymbolFilter into
>> LibraryLookupFilter.
>> * Merged DependencyFinder & ElementFilter into DependencyFilter (as
>> nested classes).
>> * Also added 2 more test cases to check that symbols being filtered
>> out don't have their dependencies added.
>>
>> Jorn
>>
>> Maurizio Cimadamore schreef op 2019-03-19 13:40:
>>> Looks good, I did some tests and it seems to behave according to the
>>> expectations - some comments below:
>>>
>>> On 18/03/2019 10:58, Jorn Vernee wrote:
>>>> Hi,
>>>>
>>>> I have implemented this:
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
>>>>
>>>> This adds 2 options --include-headers and --exclude-headers that
>>>> take regular expression patterns as an argument. The patterns are
>>>> matched against a cursor's full header file path to determine if it
>>>> appears in the header root set.
>>>>
>>>> For example, I can use the following to include headers in the
>>>> Python include director;
>>>>
>>>> --include-headers "J:\\Python27\\include\\.*"
>>>>
>>>> ---
>>>>
>>>> To recap, we now:
>>>> 1.) Filter functions, vars, and macros by whether they appear in
>>>> a root header.
>>>> 2.) Filter functions, vars, and macros by the
>>>> --includes/exclude-symbols patterns.
>>>> 3.) Filter functions and vars, by whether they appear in any
>>>> shared library.
>>>
>>> I think this is ok, but the way this happens in the code seems to be
>>> sort of ad hoc, and probably the result of how the code use to be in
>>> previous rounds.
>>>
>>> For instance, SymbolFilter does both (1), (2) and (3).
>>>
>>> Then we have DependencyFinder which also does a bit of (1).
>>>
>>> Finally we have ElementFilter, which drops everything that is not
>>> required (as per 4, 5).
>>>
>>>
>>> I believe the code could be improved by:
>>>
>>> * separating the library-based filtering from the option-driven
>>> filtering. These are two separate mechanism (the latter being optional
>>> on -L being used at all). Of course we should also decide which is
>>> used first.
>>>
>>> * bring together DependencyFinder and ElementFilter into
>>> DependencyFilter. You can have a two pass visitor - collect deps on a
>>> first pass, then prune on the second pass. This way the state remains
>>> 'internal' to the DependencyFilter visitor, no need to pass magic
>>> state between two passes.
>>>
>>> Maurizio
>>>
>>>
>>>> 4.) For the remaining trees in the root headers do dependency
>>>> analysis, and find the required structs, enums, and typedefs.
>>>> 5.) For structs, enums, and typedefs include them if they appear
>>>> in a root header OR are required as dependency.
>>>>
>>>> Cheers,
>>>> Jorn
>>>>
>>>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>>>>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>>>>> Heh - was writing an email pretty much with this suggestion :P
>>>>>>
>>>>>> I think this is a good idea! If you don't think this adds to much
>>>>>> complexity to the options?
>>>>>>
>>>>>> So we'd basically have:
>>>>>>
>>>>>> Without --include/exclude-header -> same behavior we have right
>>>>>> now.
>>>>>> With --include/exclude-header -> Add included headers to root
>>>>>> set + find any dependencies, and put both in the generated artifact.
>>>>>>
>>>>>> I like that it does no filtering by default, so ad-hoc users
>>>>>> don't have to figure out which header files define which functions.
>>>>>
>>>>> Yep - I like this too. It goes towards the 'no magic' goal, and it
>>>>> adds the filter option that makes most sense when you want to chop
>>>>> down an API for good. This is a filtering mechanism after all, the
>>>>> fact that it's coming back as a filtering option is IMHO, a plus,
>>>>> no a
>>>>> minus.
>>>>>
>>>>> Btw, slight revision to what I said previously - let's make the
>>>>> argument of include/exclude regexes too (as for others). This way it
>>>>> can be even easier to use, most headers do have some commonalities in
>>>>> them - and I like that, with that, we can basically do the
>>>>> path-based
>>>>> heuristics w/o hardwiring that into jextract.
>>>>>
>>>>> Maurizio
>>>>>
>>>>>>
>>>>>> Jorn
>>>>>>
>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>>>>
>>>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>>>>> I've implemented this, and now doing a clean build based on
>>>>>>>>> the latest jdk/jdk merge before submitting the next webrev.
>>>>>>>>>
>>>>>>>>> It's working nicely, with one caveat; Some headers rely on
>>>>>>>>> pre-processor code defined in a parent header. For instance,
>>>>>>>>> Python.h defines some pre-processor code used in pythonrun.h .
>>>>>>>>> The example we have only uses functions defined in
>>>>>>>>> pythonrun.h, but if we just pass that header to jextract Clang
>>>>>>>>> throws an error because of the missing pre-processor code from
>>>>>>>>> Python.h
>>>>>>>>>
>>>>>>>>> In this case we have to pass Python.h first, and then
>>>>>>>>> pythonrun.h to get everything to work. This also relies on the
>>>>>>>>> existence of header include guards in the pythonrun.h header
>>>>>>>>> (since we're basically including it twice). A similar caveat
>>>>>>>>> exists with the Windows API.
>>>>>>>>
>>>>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>>>>
>>>>>>>> What if the root set was a 'first class' concept in the
>>>>>>>> extraction run, rather than something we infer from this or
>>>>>>>> than command line option?
>>>>>>>>
>>>>>>>> That way we could point jextract at python.h, but then say
>>>>>>>> "hey, I'm only really interested at stuff that comes from
>>>>>>>> pythonrun.h".
>>>>>>>
>>>>>>> In other words, right now we have
>>>>>>>
>>>>>>> --include-symbol
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>> --exclude-symbol
>>>>>>>
>>>>>>> If not set, everything is included.
>>>>>>>
>>>>>>> Maybe all we need is:
>>>>>>>
>>>>>>> --include-header <header>
>>>>>>>
>>>>>>> --exclude-header <header>
>>>>>>>
>>>>>>> And, again, if none is specified, all headers are part of the
>>>>>>> 'root set'.
>>>>>>>
>>>>>>> And, to address a concern you had - yes, I'd consider using the
>>>>>>> 'include path' for header names in --include/exclude-header
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>>>
>>>>>>>> Maurizio
>>>>>>>>
>>>>>>>>>
>>>>>>>>> So, the guide-line is: Pass the main header first, then
>>>>>>>>> internal headers. e.g. If we have a main header A.h which
>>>>>>>>> includes a_impl.h, and another main header B.h which includes
>>>>>>>>> b_impl.h, headers should be passed to jextract in the order:
>>>>>>>>> A.h a_impl.h B.h b_impl.h
>>>>>>>>>
>>>>>>>>> I think the behavior being dependent on the ordering of the
>>>>>>>>> headers could be fixed by sorting the headers in topological
>>>>>>>>> order. But, there's still the requirement to pass all of them,
>>>>>>>>> or things break.
>>>>>>>>>
>>>>>>>>> Any thoughs about that?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jorn
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>>>>> I've already been using shell scripts mostly when running
>>>>>>>>>> jextract
>>>>>>>>>> (except for simple examples). I find it very useful to be
>>>>>>>>>> able to
>>>>>>>>>> split the command over multiple lines, especially long file
>>>>>>>>>> paths
>>>>>>>>>> become much more readable.
>>>>>>>>>>
>>>>>>>>>> I'll start working on this then.
>>>>>>>>>>
>>>>>>>>>> Jorn
>>>>>>>>>>
>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>>>>> I still like this approach, and I think adding support for
>>>>>>>>>>>> wildcard patterns and/or header filters would make it better.
>>>>>>>>>>>>
>>>>>>>>>>>> Like you said: It's dead simple. What you pass to jextract
>>>>>>>>>>>> is what you get. Though, we could still apply dependency
>>>>>>>>>>>> analysis, which would make sure nothing that's needed gets
>>>>>>>>>>>> dropped.
>>>>>>>>>>>
>>>>>>>>>>> Why don't we try this then?
>>>>>>>>>>>
>>>>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>>>>
>>>>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>>>>
>>>>>>>>>>> which works
>>>>>>>>>>>
>>>>>>>>>>> also, note that jextract also accepts a file with the @
>>>>>>>>>>> syntax, etc.
>>>>>>>>>>>
>>>>>>>>>>> jextract @args.txt
>>>>>>>>>>>
>>>>>>>>>>> where args.txt is the command line (which could list all the
>>>>>>>>>>> headers you want!).
>>>>>>>>>>>
>>>>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>>>>
>>>>>>>>>>> Maurizio
More information about the panama-dev
mailing list