[foreign] RFR 8220544: Jextract; library dependent header filtering

Jorn Vernee jbvernee at xs4all.nl
Tue Mar 19 19:58:31 UTC 2019


Thanks for the reviews. I've updated the comment and pushed.

Jorn

Sundararajan Athijegannathan schreef op 2019-03-19 18:07:
> Tests passed. Please go ahead with push.
> 
> -Sundar
> 
> On 19/03/19, 7:46 PM, Sundararajan Athijegannathan wrote:
>> * 47 * A merge between {@link DependencyFinder} and {@link 
>> ElementFilter}
>> 
>> in DependencyFilter. Could be removed?
>> 
>> No comments other than that. +1
>> 
>> I'm going to submit an internal mach5 build/request for this patch. 
>> I'll notify you as soon as it is all green.
>> 
>> -Sundar
>> 
>> On 19/03/19, 7:34 PM, Maurizio Cimadamore wrote:
>>> This looks good to me, I have no further comments.
>>> 
>>> Great work!
>>> 
>>> (I'll leave it to Sundar for further comments/suggestions)
>>> 
>>> Maurizio
>>> 
>>> On 19/03/2019 13:50, Jorn Vernee wrote:
>>>> Suggestions sound good.
>>>> 
>>>> Here's the updated webrev: 
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/
>>>> 
>>>> * Split library based filtering code off from SymbolFilter into 
>>>> LibraryLookupFilter.
>>>> * Merged DependencyFinder & ElementFilter into DependencyFilter (as 
>>>> nested classes).
>>>> * Also added 2 more test cases to check that symbols being filtered 
>>>> out don't have their dependencies added.
>>>> 
>>>> Jorn
>>>> 
>>>> Maurizio Cimadamore schreef op 2019-03-19 13:40:
>>>>> Looks good, I did some tests and it seems to behave according to 
>>>>> the
>>>>> expectations - some comments below:
>>>>> 
>>>>> On 18/03/2019 10:58, Jorn Vernee wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I have implemented this: 
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
>>>>>> 
>>>>>> This adds 2 options --include-headers and --exclude-headers that 
>>>>>> take regular expression patterns as an argument. The patterns are 
>>>>>> matched against a cursor's full header file path to determine if 
>>>>>> it appears in the header root set.
>>>>>> 
>>>>>> For example, I can use the following to include headers in the 
>>>>>> Python include director;
>>>>>> 
>>>>>>     --include-headers "J:\\Python27\\include\\.*"
>>>>>> 
>>>>>> ---
>>>>>> 
>>>>>> To recap, we now:
>>>>>>   1.) Filter functions, vars, and macros by whether they appear in 
>>>>>> a root header.
>>>>>>   2.) Filter functions, vars, and macros by the 
>>>>>> --includes/exclude-symbols patterns.
>>>>>>   3.) Filter functions and vars, by whether they appear in any 
>>>>>> shared library.
>>>>> 
>>>>> I think this is ok, but the way this happens in the code seems to 
>>>>> be
>>>>> sort of ad hoc, and probably the result of how the code use to be 
>>>>> in
>>>>> previous rounds.
>>>>> 
>>>>> For instance, SymbolFilter does both (1), (2) and (3).
>>>>> 
>>>>> Then we have DependencyFinder which also does a bit of (1).
>>>>> 
>>>>> Finally we have ElementFilter, which drops everything that is not
>>>>> required (as per 4, 5).
>>>>> 
>>>>> 
>>>>> I believe the code could be improved by:
>>>>> 
>>>>> * separating the library-based filtering from the option-driven
>>>>> filtering. These are two separate mechanism (the latter being 
>>>>> optional
>>>>> on -L being used at all). Of course we should also decide which is
>>>>> used first.
>>>>> 
>>>>> * bring together DependencyFinder and ElementFilter into
>>>>> DependencyFilter. You can have a two pass visitor - collect deps on 
>>>>> a
>>>>> first pass, then prune on the second pass. This way the state 
>>>>> remains
>>>>> 'internal' to the DependencyFilter visitor, no need to pass magic
>>>>> state between two passes.
>>>>> 
>>>>> Maurizio
>>>>> 
>>>>> 
>>>>>>   4.) For the remaining trees in the root headers do dependency 
>>>>>> analysis, and find the required structs, enums, and typedefs.
>>>>>>   5.) For structs, enums, and typedefs include them if they appear 
>>>>>> in a root header OR are required as dependency.
>>>>>> 
>>>>>> Cheers,
>>>>>> Jorn
>>>>>> 
>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>>>>>>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>>>>>>> Heh - was writing an email pretty much with this suggestion :P
>>>>>>>> 
>>>>>>>> I think this is a good idea! If you don't think this adds to 
>>>>>>>> much complexity to the options?
>>>>>>>> 
>>>>>>>> So we'd basically have:
>>>>>>>> 
>>>>>>>>   Without --include/exclude-header -> same behavior we have 
>>>>>>>> right now.
>>>>>>>>   With --include/exclude-header -> Add included headers to root 
>>>>>>>> set + find any dependencies, and put both in the generated 
>>>>>>>> artifact.
>>>>>>>> 
>>>>>>>> I like that it does no filtering by default, so ad-hoc users 
>>>>>>>> don't have to figure out which header files define which 
>>>>>>>> functions.
>>>>>>> 
>>>>>>> Yep - I like this too. It goes towards the 'no magic' goal, and 
>>>>>>> it
>>>>>>> adds the filter option that makes most sense when you want to 
>>>>>>> chop
>>>>>>> down an API for good. This is a filtering mechanism after all, 
>>>>>>> the
>>>>>>> fact that it's coming back as a filtering option is IMHO, a plus, 
>>>>>>> no a
>>>>>>> minus.
>>>>>>> 
>>>>>>> Btw, slight revision to what I said previously - let's make the
>>>>>>> argument of include/exclude regexes too (as for others). This way 
>>>>>>> it
>>>>>>> can be even easier to use, most headers do have some 
>>>>>>> commonalities in
>>>>>>> them - and I like that, with that,  we can basically do the 
>>>>>>> path-based
>>>>>>> heuristics w/o hardwiring that into jextract.
>>>>>>> 
>>>>>>> Maurizio
>>>>>>> 
>>>>>>>> 
>>>>>>>> Jorn
>>>>>>>> 
>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>>>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>>>>>> 
>>>>>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>>>>>>> I've implemented this, and now doing a clean build based on 
>>>>>>>>>>> the latest jdk/jdk merge before submitting the next webrev.
>>>>>>>>>>> 
>>>>>>>>>>> It's working nicely, with one caveat; Some headers rely on 
>>>>>>>>>>> pre-processor code defined in a parent header. For instance, 
>>>>>>>>>>> Python.h defines some pre-processor code used in pythonrun.h 
>>>>>>>>>>> . The example we have only uses functions defined in 
>>>>>>>>>>> pythonrun.h, but if we just pass that header to jextract 
>>>>>>>>>>> Clang throws an error because of the missing pre-processor 
>>>>>>>>>>> code from Python.h
>>>>>>>>>>> 
>>>>>>>>>>> In this case we have to pass Python.h first, and then 
>>>>>>>>>>> pythonrun.h to get everything to work. This also relies on 
>>>>>>>>>>> the existence of header include guards in the pythonrun.h 
>>>>>>>>>>> header (since we're basically including it twice). A similar 
>>>>>>>>>>> caveat exists with the Windows API.
>>>>>>>>>> 
>>>>>>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>>>>>> 
>>>>>>>>>> What if the root set was a 'first class' concept in the 
>>>>>>>>>> extraction run, rather than something we infer from this or 
>>>>>>>>>> than command line option?
>>>>>>>>>> 
>>>>>>>>>> That way we could point jextract at python.h, but then say 
>>>>>>>>>> "hey, I'm only really interested at stuff that comes from 
>>>>>>>>>> pythonrun.h".
>>>>>>>>> 
>>>>>>>>> In other words, right now we have
>>>>>>>>> 
>>>>>>>>> --include-symbol
>>>>>>>>> 
>>>>>>>>> and
>>>>>>>>> 
>>>>>>>>> --exclude-symbol
>>>>>>>>> 
>>>>>>>>> If not set, everything is included.
>>>>>>>>> 
>>>>>>>>> Maybe all we need is:
>>>>>>>>> 
>>>>>>>>> --include-header <header>
>>>>>>>>> 
>>>>>>>>> --exclude-header <header>
>>>>>>>>> 
>>>>>>>>> And, again, if none is specified, all headers are part of the 
>>>>>>>>> 'root set'.
>>>>>>>>> 
>>>>>>>>> And, to address a concern you had - yes, I'd consider using the
>>>>>>>>> 'include path' for header names in --include/exclude-header
>>>>>>>>> 
>>>>>>>>> Maurizio
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Maurizio
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> So, the guide-line is: Pass the main header first, then 
>>>>>>>>>>> internal headers. e.g. If we have a main header A.h which 
>>>>>>>>>>> includes a_impl.h, and another main header B.h which includes 
>>>>>>>>>>> b_impl.h, headers should be passed to jextract in the order: 
>>>>>>>>>>> A.h a_impl.h B.h b_impl.h
>>>>>>>>>>> 
>>>>>>>>>>> I think the behavior being dependent on the ordering of the 
>>>>>>>>>>> headers could be fixed by sorting the headers in topological 
>>>>>>>>>>> order. But, there's still the requirement to pass all of 
>>>>>>>>>>> them, or things break.
>>>>>>>>>>> 
>>>>>>>>>>> Any thoughs about that?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jorn
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>>>>>>> I've already been using shell scripts mostly when running 
>>>>>>>>>>>> jextract
>>>>>>>>>>>> (except for simple examples). I find it very useful to be 
>>>>>>>>>>>> able to
>>>>>>>>>>>> split the command over multiple lines, especially long file 
>>>>>>>>>>>> paths
>>>>>>>>>>>> become much more readable.
>>>>>>>>>>>> 
>>>>>>>>>>>> I'll start working on this then.
>>>>>>>>>>>> 
>>>>>>>>>>>> Jorn
>>>>>>>>>>>> 
>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>>>>>>> I still like this approach, and I think adding support for 
>>>>>>>>>>>>>> wildcard patterns and/or header filters would make it 
>>>>>>>>>>>>>> better.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Like you said: It's dead simple. What you pass to jextract 
>>>>>>>>>>>>>> is what you get. Though, we could still apply dependency 
>>>>>>>>>>>>>> analysis, which would make sure nothing that's needed gets 
>>>>>>>>>>>>>> dropped.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Why don't we try this then?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>>>>>> 
>>>>>>>>>>>>> which works
>>>>>>>>>>>>> 
>>>>>>>>>>>>> also, note that jextract also accepts a file with the @ 
>>>>>>>>>>>>> syntax, etc.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> jextract @args.txt
>>>>>>>>>>>>> 
>>>>>>>>>>>>> where args.txt is the command line (which could list all 
>>>>>>>>>>>>> the headers you want!).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maurizio


More information about the panama-dev mailing list