[foreign] RFR 8220544: Jextract; library dependent header filtering

Tue Mar 19 14:04:10 UTC 2019

This looks good to me, I have no further comments.

Great work!

(I'll leave it to Sundar for further comments/suggestions)

Maurizio

On 19/03/2019 13:50, Jorn Vernee wrote:
> Suggestions sound good.
>
> Here's the updated webrev: 
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/
>
> * Split library based filtering code off from SymbolFilter into 
> LibraryLookupFilter.
> * Merged DependencyFinder & ElementFilter into DependencyFilter (as 
> nested classes).
> * Also added 2 more test cases to check that symbols being filtered 
> out don't have their dependencies added.
>
> Jorn
>
> Maurizio Cimadamore schreef op 2019-03-19 13:40:
>> Looks good, I did some tests and it seems to behave according to the
>> expectations - some comments below:
>>
>> On 18/03/2019 10:58, Jorn Vernee wrote:
>>> Hi,
>>>
>>> I have implemented this: 
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
>>>
>>> This adds 2 options --include-headers and --exclude-headers that 
>>> take regular expression patterns as an argument. The patterns are 
>>> matched against a cursor's full header file path to determine if it 
>>> appears in the header root set.
>>>
>>> For example, I can use the following to include headers in the 
>>> Python include director;
>>>
>>>     --include-headers "J:\\Python27\\include\\.*"
>>>
>>> ---
>>>
>>> To recap, we now:
>>>   1.) Filter functions, vars, and macros by whether they appear in a 
>>> root header.
>>>   2.) Filter functions, vars, and macros by the 
>>> --includes/exclude-symbols patterns.
>>>   3.) Filter functions and vars, by whether they appear in any 
>>> shared library.
>>
>> I think this is ok, but the way this happens in the code seems to be
>> sort of ad hoc, and probably the result of how the code use to be in
>> previous rounds.
>>
>> For instance, SymbolFilter does both (1), (2) and (3).
>>
>> Then we have DependencyFinder which also does a bit of (1).
>>
>> Finally we have ElementFilter, which drops everything that is not
>> required (as per 4, 5).
>>
>>
>> I believe the code could be improved by:
>>
>> * separating the library-based filtering from the option-driven
>> filtering. These are two separate mechanism (the latter being optional
>> on -L being used at all). Of course we should also decide which is
>> used first.
>>
>> * bring together DependencyFinder and ElementFilter into
>> DependencyFilter. You can have a two pass visitor - collect deps on a
>> first pass, then prune on the second pass. This way the state remains
>> 'internal' to the DependencyFilter visitor, no need to pass magic
>> state between two passes.
>>
>> Maurizio
>>
>>
>>>   4.) For the remaining trees in the root headers do dependency 
>>> analysis, and find the required structs, enums, and typedefs.
>>>   5.) For structs, enums, and typedefs include them if they appear 
>>> in a root header OR are required as dependency.
>>>
>>> Cheers,
>>> Jorn
>>>
>>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>>>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>>>> Heh - was writing an email pretty much with this suggestion :P
>>>>>
>>>>> I think this is a good idea! If you don't think this adds to much 
>>>>> complexity to the options?
>>>>>
>>>>> So we'd basically have:
>>>>>
>>>>>   Without --include/exclude-header -> same behavior we have right 
>>>>> now.
>>>>>   With --include/exclude-header -> Add included headers to root 
>>>>> set + find any dependencies, and put both in the generated artifact.
>>>>>
>>>>> I like that it does no filtering by default, so ad-hoc users don't 
>>>>> have to figure out which header files define which functions.
>>>>
>>>> Yep - I like this too. It goes towards the 'no magic' goal, and it
>>>> adds the filter option that makes most sense when you want to chop
>>>> down an API for good. This is a filtering mechanism after all, the
>>>> fact that it's coming back as a filtering option is IMHO, a plus, no a
>>>> minus.
>>>>
>>>> Btw, slight revision to what I said previously - let's make the
>>>> argument of include/exclude regexes too (as for others). This way it
>>>> can be even easier to use, most headers do have some commonalities in
>>>> them - and I like that, with that,  we can basically do the path-based
>>>> heuristics w/o hardwiring that into jextract.
>>>>
>>>> Maurizio
>>>>
>>>>>
>>>>> Jorn
>>>>>
>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>>>
>>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>>>> I've implemented this, and now doing a clean build based on the 
>>>>>>>> latest jdk/jdk merge before submitting the next webrev.
>>>>>>>>
>>>>>>>> It's working nicely, with one caveat; Some headers rely on 
>>>>>>>> pre-processor code defined in a parent header. For instance, 
>>>>>>>> Python.h defines some pre-processor code used in pythonrun.h . 
>>>>>>>> The example we have only uses functions defined in pythonrun.h, 
>>>>>>>> but if we just pass that header to jextract Clang throws an 
>>>>>>>> error because of the missing pre-processor code from Python.h
>>>>>>>>
>>>>>>>> In this case we have to pass Python.h first, and then 
>>>>>>>> pythonrun.h to get everything to work. This also relies on the 
>>>>>>>> existence of header include guards in the pythonrun.h header 
>>>>>>>> (since we're basically including it twice). A similar caveat 
>>>>>>>> exists with the Windows API.
>>>>>>>
>>>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>>>
>>>>>>> What if the root set was a 'first class' concept in the 
>>>>>>> extraction run, rather than something we infer from this or than 
>>>>>>> command line option?
>>>>>>>
>>>>>>> That way we could point jextract at python.h, but then say "hey, 
>>>>>>> I'm only really interested at stuff that comes from pythonrun.h".
>>>>>>
>>>>>> In other words, right now we have
>>>>>>
>>>>>> --include-symbol
>>>>>>
>>>>>> and
>>>>>>
>>>>>> --exclude-symbol
>>>>>>
>>>>>> If not set, everything is included.
>>>>>>
>>>>>> Maybe all we need is:
>>>>>>
>>>>>> --include-header <header>
>>>>>>
>>>>>> --exclude-header <header>
>>>>>>
>>>>>> And, again, if none is specified, all headers are part of the 
>>>>>> 'root set'.
>>>>>>
>>>>>> And, to address a concern you had - yes, I'd consider using the
>>>>>> 'include path' for header names in --include/exclude-header
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>>>
>>>>>>>> So, the guide-line is: Pass the main header first, then 
>>>>>>>> internal headers. e.g. If we have a main header A.h which 
>>>>>>>> includes a_impl.h, and another main header B.h which includes 
>>>>>>>> b_impl.h, headers should be passed to jextract in the order: 
>>>>>>>> A.h a_impl.h B.h b_impl.h
>>>>>>>>
>>>>>>>> I think the behavior being dependent on the ordering of the 
>>>>>>>> headers could be fixed by sorting the headers in topological 
>>>>>>>> order. But, there's still the requirement to pass all of them, 
>>>>>>>> or things break.
>>>>>>>>
>>>>>>>> Any thoughs about that?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>>
>>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>>>> I've already been using shell scripts mostly when running 
>>>>>>>>> jextract
>>>>>>>>> (except for simple examples). I find it very useful to be able to
>>>>>>>>> split the command over multiple lines, especially long file paths
>>>>>>>>> become much more readable.
>>>>>>>>>
>>>>>>>>> I'll start working on this then.
>>>>>>>>>
>>>>>>>>> Jorn
>>>>>>>>>
>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>>>> I still like this approach, and I think adding support for 
>>>>>>>>>>> wildcard patterns and/or header filters would make it better.
>>>>>>>>>>>
>>>>>>>>>>> Like you said: It's dead simple. What you pass to jextract 
>>>>>>>>>>> is what you get. Though, we could still apply dependency 
>>>>>>>>>>> analysis, which would make sure nothing that's needed gets 
>>>>>>>>>>> dropped.
>>>>>>>>>>
>>>>>>>>>> Why don't we try this then?
>>>>>>>>>>
>>>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>>>
>>>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>>>
>>>>>>>>>> which works
>>>>>>>>>>
>>>>>>>>>> also, note that jextract also accepts a file with the @ 
>>>>>>>>>> syntax, etc.
>>>>>>>>>>
>>>>>>>>>> jextract @args.txt
>>>>>>>>>>
>>>>>>>>>> where args.txt is the command line (which could list all the 
>>>>>>>>>> headers you want!).
>>>>>>>>>>
>>>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>>>
>>>>>>>>>> Maurizio