[foreign] RFR 8220544: Jextract; library dependent header filtering

Tue Mar 19 13:50:35 UTC 2019

Suggestions sound good.

Here's the updated webrev: 
http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/

* Split library based filtering code off from SymbolFilter into 
LibraryLookupFilter.
* Merged DependencyFinder & ElementFilter into DependencyFilter (as 
nested classes).
* Also added 2 more test cases to check that symbols being filtered out 
don't have their dependencies added.

Jorn

Maurizio Cimadamore schreef op 2019-03-19 13:40:
> Looks good, I did some tests and it seems to behave according to the
> expectations - some comments below:
> 
> On 18/03/2019 10:58, Jorn Vernee wrote:
>> Hi,
>> 
>> I have implemented this: 
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
>> 
>> This adds 2 options --include-headers and --exclude-headers that take 
>> regular expression patterns as an argument. The patterns are matched 
>> against a cursor's full header file path to determine if it appears in 
>> the header root set.
>> 
>> For example, I can use the following to include headers in the Python 
>> include director;
>> 
>>     --include-headers "J:\\Python27\\include\\.*"
>> 
>> ---
>> 
>> To recap, we now:
>>   1.) Filter functions, vars, and macros by whether they appear in a 
>> root header.
>>   2.) Filter functions, vars, and macros by the 
>> --includes/exclude-symbols patterns.
>>   3.) Filter functions and vars, by whether they appear in any shared 
>> library.
> 
> I think this is ok, but the way this happens in the code seems to be
> sort of ad hoc, and probably the result of how the code use to be in
> previous rounds.
> 
> For instance, SymbolFilter does both (1), (2) and (3).
> 
> Then we have DependencyFinder which also does a bit of (1).
> 
> Finally we have ElementFilter, which drops everything that is not
> required (as per 4, 5).
> 
> 
> I believe the code could be improved by:
> 
> * separating the library-based filtering from the option-driven
> filtering. These are two separate mechanism (the latter being optional
> on -L being used at all). Of course we should also decide which is
> used first.
> 
> * bring together DependencyFinder and ElementFilter into
> DependencyFilter. You can have a two pass visitor - collect deps on a
> first pass, then prune on the second pass. This way the state remains
> 'internal' to the DependencyFilter visitor, no need to pass magic
> state between two passes.
> 
> Maurizio
> 
> 
>>   4.) For the remaining trees in the root headers do dependency 
>> analysis, and find the required structs, enums, and typedefs.
>>   5.) For structs, enums, and typedefs include them if they appear in 
>> a root header OR are required as dependency.
>> 
>> Cheers,
>> Jorn
>> 
>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>>> Heh - was writing an email pretty much with this suggestion :P
>>>> 
>>>> I think this is a good idea! If you don't think this adds to much 
>>>> complexity to the options?
>>>> 
>>>> So we'd basically have:
>>>> 
>>>>   Without --include/exclude-header -> same behavior we have right 
>>>> now.
>>>>   With --include/exclude-header -> Add included headers to root set 
>>>> + find any dependencies, and put both in the generated artifact.
>>>> 
>>>> I like that it does no filtering by default, so ad-hoc users don't 
>>>> have to figure out which header files define which functions.
>>> 
>>> Yep - I like this too. It goes towards the 'no magic' goal, and it
>>> adds the filter option that makes most sense when you want to chop
>>> down an API for good. This is a filtering mechanism after all, the
>>> fact that it's coming back as a filtering option is IMHO, a plus, no 
>>> a
>>> minus.
>>> 
>>> Btw, slight revision to what I said previously - let's make the
>>> argument of include/exclude regexes too (as for others). This way it
>>> can be even easier to use, most headers do have some commonalities in
>>> them - and I like that, with that,  we can basically do the 
>>> path-based
>>> heuristics w/o hardwiring that into jextract.
>>> 
>>> Maurizio
>>> 
>>>> 
>>>> Jorn
>>>> 
>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>> 
>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>>> I've implemented this, and now doing a clean build based on the 
>>>>>>> latest jdk/jdk merge before submitting the next webrev.
>>>>>>> 
>>>>>>> It's working nicely, with one caveat; Some headers rely on 
>>>>>>> pre-processor code defined in a parent header. For instance, 
>>>>>>> Python.h defines some pre-processor code used in pythonrun.h . 
>>>>>>> The example we have only uses functions defined in pythonrun.h, 
>>>>>>> but if we just pass that header to jextract Clang throws an error 
>>>>>>> because of the missing pre-processor code from Python.h
>>>>>>> 
>>>>>>> In this case we have to pass Python.h first, and then pythonrun.h 
>>>>>>> to get everything to work. This also relies on the existence of 
>>>>>>> header include guards in the pythonrun.h header (since we're 
>>>>>>> basically including it twice). A similar caveat exists with the 
>>>>>>> Windows API.
>>>>>> 
>>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>> 
>>>>>> What if the root set was a 'first class' concept in the extraction 
>>>>>> run, rather than something we infer from this or than command line 
>>>>>> option?
>>>>>> 
>>>>>> That way we could point jextract at python.h, but then say "hey, 
>>>>>> I'm only really interested at stuff that comes from pythonrun.h".
>>>>> 
>>>>> In other words, right now we have
>>>>> 
>>>>> --include-symbol
>>>>> 
>>>>> and
>>>>> 
>>>>> --exclude-symbol
>>>>> 
>>>>> If not set, everything is included.
>>>>> 
>>>>> Maybe all we need is:
>>>>> 
>>>>> --include-header <header>
>>>>> 
>>>>> --exclude-header <header>
>>>>> 
>>>>> And, again, if none is specified, all headers are part of the 'root 
>>>>> set'.
>>>>> 
>>>>> And, to address a concern you had - yes, I'd consider using the
>>>>> 'include path' for header names in --include/exclude-header
>>>>> 
>>>>> Maurizio
>>>>> 
>>>>>> 
>>>>>> Maurizio
>>>>>> 
>>>>>>> 
>>>>>>> So, the guide-line is: Pass the main header first, then internal 
>>>>>>> headers. e.g. If we have a main header A.h which includes 
>>>>>>> a_impl.h, and another main header B.h which includes b_impl.h, 
>>>>>>> headers should be passed to jextract in the order: A.h a_impl.h 
>>>>>>> B.h b_impl.h
>>>>>>> 
>>>>>>> I think the behavior being dependent on the ordering of the 
>>>>>>> headers could be fixed by sorting the headers in topological 
>>>>>>> order. But, there's still the requirement to pass all of them, or 
>>>>>>> things break.
>>>>>>> 
>>>>>>> Any thoughs about that?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Jorn
>>>>>>> 
>>>>>>> 
>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>>> I've already been using shell scripts mostly when running 
>>>>>>>> jextract
>>>>>>>> (except for simple examples). I find it very useful to be able 
>>>>>>>> to
>>>>>>>> split the command over multiple lines, especially long file 
>>>>>>>> paths
>>>>>>>> become much more readable.
>>>>>>>> 
>>>>>>>> I'll start working on this then.
>>>>>>>> 
>>>>>>>> Jorn
>>>>>>>> 
>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>>> I still like this approach, and I think adding support for 
>>>>>>>>>> wildcard patterns and/or header filters would make it better.
>>>>>>>>>> 
>>>>>>>>>> Like you said: It's dead simple. What you pass to jextract is 
>>>>>>>>>> what you get. Though, we could still apply dependency 
>>>>>>>>>> analysis, which would make sure nothing that's needed gets 
>>>>>>>>>> dropped.
>>>>>>>>> 
>>>>>>>>> Why don't we try this then?
>>>>>>>>> 
>>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>> 
>>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>> 
>>>>>>>>> which works
>>>>>>>>> 
>>>>>>>>> also, note that jextract also accepts a file with the @ syntax, 
>>>>>>>>> etc.
>>>>>>>>> 
>>>>>>>>> jextract @args.txt
>>>>>>>>> 
>>>>>>>>> where args.txt is the command line (which could list all the 
>>>>>>>>> headers you want!).
>>>>>>>>> 
>>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>> 
>>>>>>>>> Maurizio