[foreign] RFR 8220544: Jextract; library dependent header filtering
Sundararajan Athijegannathan
sundararajan.athijegannathan at oracle.com
Tue Mar 19 17:07:01 UTC 2019
Tests passed. Please go ahead with push.
-Sundar
On 19/03/19, 7:46 PM, Sundararajan Athijegannathan wrote:
> * 47 * A merge between {@link DependencyFinder} and {@link ElementFilter}
>
> in DependencyFilter. Could be removed?
>
> No comments other than that. +1
>
> I'm going to submit an internal mach5 build/request for this patch.
> I'll notify you as soon as it is all green.
>
> -Sundar
>
> On 19/03/19, 7:34 PM, Maurizio Cimadamore wrote:
>> This looks good to me, I have no further comments.
>>
>> Great work!
>>
>> (I'll leave it to Sundar for further comments/suggestions)
>>
>> Maurizio
>>
>> On 19/03/2019 13:50, Jorn Vernee wrote:
>>> Suggestions sound good.
>>>
>>> Here's the updated webrev:
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/
>>>
>>> * Split library based filtering code off from SymbolFilter into
>>> LibraryLookupFilter.
>>> * Merged DependencyFinder & ElementFilter into DependencyFilter (as
>>> nested classes).
>>> * Also added 2 more test cases to check that symbols being filtered
>>> out don't have their dependencies added.
>>>
>>> Jorn
>>>
>>> Maurizio Cimadamore schreef op 2019-03-19 13:40:
>>>> Looks good, I did some tests and it seems to behave according to the
>>>> expectations - some comments below:
>>>>
>>>> On 18/03/2019 10:58, Jorn Vernee wrote:
>>>>> Hi,
>>>>>
>>>>> I have implemented this:
>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
>>>>>
>>>>> This adds 2 options --include-headers and --exclude-headers that
>>>>> take regular expression patterns as an argument. The patterns are
>>>>> matched against a cursor's full header file path to determine if
>>>>> it appears in the header root set.
>>>>>
>>>>> For example, I can use the following to include headers in the
>>>>> Python include director;
>>>>>
>>>>> --include-headers "J:\\Python27\\include\\.*"
>>>>>
>>>>> ---
>>>>>
>>>>> To recap, we now:
>>>>> 1.) Filter functions, vars, and macros by whether they appear in
>>>>> a root header.
>>>>> 2.) Filter functions, vars, and macros by the
>>>>> --includes/exclude-symbols patterns.
>>>>> 3.) Filter functions and vars, by whether they appear in any
>>>>> shared library.
>>>>
>>>> I think this is ok, but the way this happens in the code seems to be
>>>> sort of ad hoc, and probably the result of how the code use to be in
>>>> previous rounds.
>>>>
>>>> For instance, SymbolFilter does both (1), (2) and (3).
>>>>
>>>> Then we have DependencyFinder which also does a bit of (1).
>>>>
>>>> Finally we have ElementFilter, which drops everything that is not
>>>> required (as per 4, 5).
>>>>
>>>>
>>>> I believe the code could be improved by:
>>>>
>>>> * separating the library-based filtering from the option-driven
>>>> filtering. These are two separate mechanism (the latter being optional
>>>> on -L being used at all). Of course we should also decide which is
>>>> used first.
>>>>
>>>> * bring together DependencyFinder and ElementFilter into
>>>> DependencyFilter. You can have a two pass visitor - collect deps on a
>>>> first pass, then prune on the second pass. This way the state remains
>>>> 'internal' to the DependencyFilter visitor, no need to pass magic
>>>> state between two passes.
>>>>
>>>> Maurizio
>>>>
>>>>
>>>>> 4.) For the remaining trees in the root headers do dependency
>>>>> analysis, and find the required structs, enums, and typedefs.
>>>>> 5.) For structs, enums, and typedefs include them if they appear
>>>>> in a root header OR are required as dependency.
>>>>>
>>>>> Cheers,
>>>>> Jorn
>>>>>
>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>>>>>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>>>>>> Heh - was writing an email pretty much with this suggestion :P
>>>>>>>
>>>>>>> I think this is a good idea! If you don't think this adds to
>>>>>>> much complexity to the options?
>>>>>>>
>>>>>>> So we'd basically have:
>>>>>>>
>>>>>>> Without --include/exclude-header -> same behavior we have
>>>>>>> right now.
>>>>>>> With --include/exclude-header -> Add included headers to root
>>>>>>> set + find any dependencies, and put both in the generated
>>>>>>> artifact.
>>>>>>>
>>>>>>> I like that it does no filtering by default, so ad-hoc users
>>>>>>> don't have to figure out which header files define which functions.
>>>>>>
>>>>>> Yep - I like this too. It goes towards the 'no magic' goal, and it
>>>>>> adds the filter option that makes most sense when you want to chop
>>>>>> down an API for good. This is a filtering mechanism after all, the
>>>>>> fact that it's coming back as a filtering option is IMHO, a plus,
>>>>>> no a
>>>>>> minus.
>>>>>>
>>>>>> Btw, slight revision to what I said previously - let's make the
>>>>>> argument of include/exclude regexes too (as for others). This way it
>>>>>> can be even easier to use, most headers do have some
>>>>>> commonalities in
>>>>>> them - and I like that, with that, we can basically do the
>>>>>> path-based
>>>>>> heuristics w/o hardwiring that into jextract.
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>>>
>>>>>>> Jorn
>>>>>>>
>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>>>>>
>>>>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>>>>>> I've implemented this, and now doing a clean build based on
>>>>>>>>>> the latest jdk/jdk merge before submitting the next webrev.
>>>>>>>>>>
>>>>>>>>>> It's working nicely, with one caveat; Some headers rely on
>>>>>>>>>> pre-processor code defined in a parent header. For instance,
>>>>>>>>>> Python.h defines some pre-processor code used in pythonrun.h
>>>>>>>>>> . The example we have only uses functions defined in
>>>>>>>>>> pythonrun.h, but if we just pass that header to jextract
>>>>>>>>>> Clang throws an error because of the missing pre-processor
>>>>>>>>>> code from Python.h
>>>>>>>>>>
>>>>>>>>>> In this case we have to pass Python.h first, and then
>>>>>>>>>> pythonrun.h to get everything to work. This also relies on
>>>>>>>>>> the existence of header include guards in the pythonrun.h
>>>>>>>>>> header (since we're basically including it twice). A similar
>>>>>>>>>> caveat exists with the Windows API.
>>>>>>>>>
>>>>>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>>>>>
>>>>>>>>> What if the root set was a 'first class' concept in the
>>>>>>>>> extraction run, rather than something we infer from this or
>>>>>>>>> than command line option?
>>>>>>>>>
>>>>>>>>> That way we could point jextract at python.h, but then say
>>>>>>>>> "hey, I'm only really interested at stuff that comes from
>>>>>>>>> pythonrun.h".
>>>>>>>>
>>>>>>>> In other words, right now we have
>>>>>>>>
>>>>>>>> --include-symbol
>>>>>>>>
>>>>>>>> and
>>>>>>>>
>>>>>>>> --exclude-symbol
>>>>>>>>
>>>>>>>> If not set, everything is included.
>>>>>>>>
>>>>>>>> Maybe all we need is:
>>>>>>>>
>>>>>>>> --include-header <header>
>>>>>>>>
>>>>>>>> --exclude-header <header>
>>>>>>>>
>>>>>>>> And, again, if none is specified, all headers are part of the
>>>>>>>> 'root set'.
>>>>>>>>
>>>>>>>> And, to address a concern you had - yes, I'd consider using the
>>>>>>>> 'include path' for header names in --include/exclude-header
>>>>>>>>
>>>>>>>> Maurizio
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Maurizio
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So, the guide-line is: Pass the main header first, then
>>>>>>>>>> internal headers. e.g. If we have a main header A.h which
>>>>>>>>>> includes a_impl.h, and another main header B.h which includes
>>>>>>>>>> b_impl.h, headers should be passed to jextract in the order:
>>>>>>>>>> A.h a_impl.h B.h b_impl.h
>>>>>>>>>>
>>>>>>>>>> I think the behavior being dependent on the ordering of the
>>>>>>>>>> headers could be fixed by sorting the headers in topological
>>>>>>>>>> order. But, there's still the requirement to pass all of
>>>>>>>>>> them, or things break.
>>>>>>>>>>
>>>>>>>>>> Any thoughs about that?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Jorn
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>>>>>> I've already been using shell scripts mostly when running
>>>>>>>>>>> jextract
>>>>>>>>>>> (except for simple examples). I find it very useful to be
>>>>>>>>>>> able to
>>>>>>>>>>> split the command over multiple lines, especially long file
>>>>>>>>>>> paths
>>>>>>>>>>> become much more readable.
>>>>>>>>>>>
>>>>>>>>>>> I'll start working on this then.
>>>>>>>>>>>
>>>>>>>>>>> Jorn
>>>>>>>>>>>
>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>>>>>> I still like this approach, and I think adding support for
>>>>>>>>>>>>> wildcard patterns and/or header filters would make it better.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Like you said: It's dead simple. What you pass to jextract
>>>>>>>>>>>>> is what you get. Though, we could still apply dependency
>>>>>>>>>>>>> analysis, which would make sure nothing that's needed gets
>>>>>>>>>>>>> dropped.
>>>>>>>>>>>>
>>>>>>>>>>>> Why don't we try this then?
>>>>>>>>>>>>
>>>>>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>>>>>
>>>>>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>>>>>
>>>>>>>>>>>> which works
>>>>>>>>>>>>
>>>>>>>>>>>> also, note that jextract also accepts a file with the @
>>>>>>>>>>>> syntax, etc.
>>>>>>>>>>>>
>>>>>>>>>>>> jextract @args.txt
>>>>>>>>>>>>
>>>>>>>>>>>> where args.txt is the command line (which could list all
>>>>>>>>>>>> the headers you want!).
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>>>>>
>>>>>>>>>>>> Maurizio
More information about the panama-dev
mailing list