[foreign] RFR 8220544: Jextract; library dependent header filtering

Tue Mar 19 12:40:51 UTC 2019

Looks good, I did some tests and it seems to behave according to the 
expectations - some comments below:

On 18/03/2019 10:58, Jorn Vernee wrote:
> Hi,
>
> I have implemented this: 
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
>
> This adds 2 options --include-headers and --exclude-headers that take 
> regular expression patterns as an argument. The patterns are matched 
> against a cursor's full header file path to determine if it appears in 
> the header root set.
>
> For example, I can use the following to include headers in the Python 
> include director;
>
>     --include-headers "J:\\Python27\\include\\.*"
>
> ---
>
> To recap, we now:
>   1.) Filter functions, vars, and macros by whether they appear in a 
> root header.
>   2.) Filter functions, vars, and macros by the 
> --includes/exclude-symbols patterns.
>   3.) Filter functions and vars, by whether they appear in any shared 
> library.

I think this is ok, but the way this happens in the code seems to be 
sort of ad hoc, and probably the result of how the code use to be in 
previous rounds.

For instance, SymbolFilter does both (1), (2) and (3).

Then we have DependencyFinder which also does a bit of (1).

Finally we have ElementFilter, which drops everything that is not 
required (as per 4, 5).

I believe the code could be improved by:

* separating the library-based filtering from the option-driven 
filtering. These are two separate mechanism (the latter being optional 
on -L being used at all). Of course we should also decide which is used 
first.

* bring together DependencyFinder and ElementFilter into 
DependencyFilter. You can have a two pass visitor - collect deps on a 
first pass, then prune on the second pass. This way the state remains 
'internal' to the DependencyFilter visitor, no need to pass magic state 
between two passes.

Maurizio

>   4.) For the remaining trees in the root headers do dependency 
> analysis, and find the required structs, enums, and typedefs.
>   5.) For structs, enums, and typedefs include them if they appear in 
> a root header OR are required as dependency.
>
> Cheers,
> Jorn
>
> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>> Heh - was writing an email pretty much with this suggestion :P
>>>
>>> I think this is a good idea! If you don't think this adds to much 
>>> complexity to the options?
>>>
>>> So we'd basically have:
>>>
>>>   Without --include/exclude-header -> same behavior we have right now.
>>>   With --include/exclude-header -> Add included headers to root set 
>>> + find any dependencies, and put both in the generated artifact.
>>>
>>> I like that it does no filtering by default, so ad-hoc users don't 
>>> have to figure out which header files define which functions.
>>
>> Yep - I like this too. It goes towards the 'no magic' goal, and it
>> adds the filter option that makes most sense when you want to chop
>> down an API for good. This is a filtering mechanism after all, the
>> fact that it's coming back as a filtering option is IMHO, a plus, no a
>> minus.
>>
>> Btw, slight revision to what I said previously - let's make the
>> argument of include/exclude regexes too (as for others). This way it
>> can be even easier to use, most headers do have some commonalities in
>> them - and I like that, with that,  we can basically do the path-based
>> heuristics w/o hardwiring that into jextract.
>>
>> Maurizio
>>
>>>
>>> Jorn
>>>
>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>
>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>> I've implemented this, and now doing a clean build based on the 
>>>>>> latest jdk/jdk merge before submitting the next webrev.
>>>>>>
>>>>>> It's working nicely, with one caveat; Some headers rely on 
>>>>>> pre-processor code defined in a parent header. For instance, 
>>>>>> Python.h defines some pre-processor code used in pythonrun.h . 
>>>>>> The example we have only uses functions defined in pythonrun.h, 
>>>>>> but if we just pass that header to jextract Clang throws an error 
>>>>>> because of the missing pre-processor code from Python.h
>>>>>>
>>>>>> In this case we have to pass Python.h first, and then pythonrun.h 
>>>>>> to get everything to work. This also relies on the existence of 
>>>>>> header include guards in the pythonrun.h header (since we're 
>>>>>> basically including it twice). A similar caveat exists with the 
>>>>>> Windows API.
>>>>>
>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>
>>>>> What if the root set was a 'first class' concept in the extraction 
>>>>> run, rather than something we infer from this or than command line 
>>>>> option?
>>>>>
>>>>> That way we could point jextract at python.h, but then say "hey, 
>>>>> I'm only really interested at stuff that comes from pythonrun.h".
>>>>
>>>> In other words, right now we have
>>>>
>>>> --include-symbol
>>>>
>>>> and
>>>>
>>>> --exclude-symbol
>>>>
>>>> If not set, everything is included.
>>>>
>>>> Maybe all we need is:
>>>>
>>>> --include-header <header>
>>>>
>>>> --exclude-header <header>
>>>>
>>>> And, again, if none is specified, all headers are part of the 'root 
>>>> set'.
>>>>
>>>> And, to address a concern you had - yes, I'd consider using the
>>>> 'include path' for header names in --include/exclude-header
>>>>
>>>> Maurizio
>>>>
>>>>>
>>>>> Maurizio
>>>>>
>>>>>>
>>>>>> So, the guide-line is: Pass the main header first, then internal 
>>>>>> headers. e.g. If we have a main header A.h which includes 
>>>>>> a_impl.h, and another main header B.h which includes b_impl.h, 
>>>>>> headers should be passed to jextract in the order: A.h a_impl.h 
>>>>>> B.h b_impl.h
>>>>>>
>>>>>> I think the behavior being dependent on the ordering of the 
>>>>>> headers could be fixed by sorting the headers in topological 
>>>>>> order. But, there's still the requirement to pass all of them, or 
>>>>>> things break.
>>>>>>
>>>>>> Any thoughs about that?
>>>>>>
>>>>>> Thanks,
>>>>>> Jorn
>>>>>>
>>>>>>
>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>> I've already been using shell scripts mostly when running jextract
>>>>>>> (except for simple examples). I find it very useful to be able to
>>>>>>> split the command over multiple lines, especially long file paths
>>>>>>> become much more readable.
>>>>>>>
>>>>>>> I'll start working on this then.
>>>>>>>
>>>>>>> Jorn
>>>>>>>
>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>> I still like this approach, and I think adding support for 
>>>>>>>>> wildcard patterns and/or header filters would make it better.
>>>>>>>>>
>>>>>>>>> Like you said: It's dead simple. What you pass to jextract is 
>>>>>>>>> what you get. Though, we could still apply dependency 
>>>>>>>>> analysis, which would make sure nothing that's needed gets 
>>>>>>>>> dropped.
>>>>>>>>
>>>>>>>> Why don't we try this then?
>>>>>>>>
>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>
>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>
>>>>>>>> which works
>>>>>>>>
>>>>>>>> also, note that jextract also accepts a file with the @ syntax, 
>>>>>>>> etc.
>>>>>>>>
>>>>>>>> jextract @args.txt
>>>>>>>>
>>>>>>>> where args.txt is the command line (which could list all the 
>>>>>>>> headers you want!).
>>>>>>>>
>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>
>>>>>>>> Maurizio