[foreign] RFR 8220544: Jextract; library dependent header filtering

Wed Mar 20 00:48:04 UTC 2019

+1. Very Nice!

-Sundar

On 20/03/19, 2:36 AM, Maurizio Cimadamore wrote:
> Cool! This is very nice milestone for jextract :-)
>
> When you have time (and if you can) it would be nice to get some stats 
> on extracting a library (e.g. Python) on Windows before/after, to get 
> a sense of what degree of improvement in terms of static footprint we 
> can get. I tested on Linux and I can get smaller jars (up to 2x) but I 
> sense that the improvement is much more dramatic on Windows-land.
>
> Maurizio
>
> On 19/03/2019 19:58, Jorn Vernee wrote:
>> Thanks for the reviews. I've updated the comment and pushed.
>>
>> Jorn
>>
>> Sundararajan Athijegannathan schreef op 2019-03-19 18:07:
>>> Tests passed. Please go ahead with push.
>>>
>>> -Sundar
>>>
>>> On 19/03/19, 7:46 PM, Sundararajan Athijegannathan wrote:
>>>> * 47 * A merge between {@link DependencyFinder} and {@link 
>>>> ElementFilter}
>>>>
>>>> in DependencyFilter. Could be removed?
>>>>
>>>> No comments other than that. +1
>>>>
>>>> I'm going to submit an internal mach5 build/request for this patch. 
>>>> I'll notify you as soon as it is all green.
>>>>
>>>> -Sundar
>>>>
>>>> On 19/03/19, 7:34 PM, Maurizio Cimadamore wrote:
>>>>> This looks good to me, I have no further comments.
>>>>>
>>>>> Great work!
>>>>>
>>>>> (I'll leave it to Sundar for further comments/suggestions)
>>>>>
>>>>> Maurizio
>>>>>
>>>>> On 19/03/2019 13:50, Jorn Vernee wrote:
>>>>>> Suggestions sound good.
>>>>>>
>>>>>> Here's the updated webrev: 
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/ 
>>>>>>
>>>>>>
>>>>>> * Split library based filtering code off from SymbolFilter into 
>>>>>> LibraryLookupFilter.
>>>>>> * Merged DependencyFinder & ElementFilter into DependencyFilter 
>>>>>> (as nested classes).
>>>>>> * Also added 2 more test cases to check that symbols being 
>>>>>> filtered out don't have their dependencies added.
>>>>>>
>>>>>> Jorn
>>>>>>
>>>>>> Maurizio Cimadamore schreef op 2019-03-19 13:40:
>>>>>>> Looks good, I did some tests and it seems to behave according to 
>>>>>>> the
>>>>>>> expectations - some comments below:
>>>>>>>
>>>>>>> On 18/03/2019 10:58, Jorn Vernee wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have implemented this: 
>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> This adds 2 options --include-headers and --exclude-headers 
>>>>>>>> that take regular expression patterns as an argument. The 
>>>>>>>> patterns are matched against a cursor's full header file path 
>>>>>>>> to determine if it appears in the header root set.
>>>>>>>>
>>>>>>>> For example, I can use the following to include headers in the 
>>>>>>>> Python include director;
>>>>>>>>
>>>>>>>>     --include-headers "J:\\Python27\\include\\.*"
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> To recap, we now:
>>>>>>>>   1.) Filter functions, vars, and macros by whether they appear 
>>>>>>>> in a root header.
>>>>>>>>   2.) Filter functions, vars, and macros by the 
>>>>>>>> --includes/exclude-symbols patterns.
>>>>>>>>   3.) Filter functions and vars, by whether they appear in any 
>>>>>>>> shared library.
>>>>>>>
>>>>>>> I think this is ok, but the way this happens in the code seems 
>>>>>>> to be
>>>>>>> sort of ad hoc, and probably the result of how the code use to 
>>>>>>> be in
>>>>>>> previous rounds.
>>>>>>>
>>>>>>> For instance, SymbolFilter does both (1), (2) and (3).
>>>>>>>
>>>>>>> Then we have DependencyFinder which also does a bit of (1).
>>>>>>>
>>>>>>> Finally we have ElementFilter, which drops everything that is not
>>>>>>> required (as per 4, 5).
>>>>>>>
>>>>>>>
>>>>>>> I believe the code could be improved by:
>>>>>>>
>>>>>>> * separating the library-based filtering from the option-driven
>>>>>>> filtering. These are two separate mechanism (the latter being 
>>>>>>> optional
>>>>>>> on -L being used at all). Of course we should also decide which is
>>>>>>> used first.
>>>>>>>
>>>>>>> * bring together DependencyFinder and ElementFilter into
>>>>>>> DependencyFilter. You can have a two pass visitor - collect deps 
>>>>>>> on a
>>>>>>> first pass, then prune on the second pass. This way the state 
>>>>>>> remains
>>>>>>> 'internal' to the DependencyFilter visitor, no need to pass magic
>>>>>>> state between two passes.
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>>
>>>>>>>>   4.) For the remaining trees in the root headers do dependency 
>>>>>>>> analysis, and find the required structs, enums, and typedefs.
>>>>>>>>   5.) For structs, enums, and typedefs include them if they 
>>>>>>>> appear in a root header OR are required as dependency.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>>>>>>>>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>>>>>>>>> Heh - was writing an email pretty much with this suggestion :P
>>>>>>>>>>
>>>>>>>>>> I think this is a good idea! If you don't think this adds to 
>>>>>>>>>> much complexity to the options?
>>>>>>>>>>
>>>>>>>>>> So we'd basically have:
>>>>>>>>>>
>>>>>>>>>>   Without --include/exclude-header -> same behavior we have 
>>>>>>>>>> right now.
>>>>>>>>>>   With --include/exclude-header -> Add included headers to 
>>>>>>>>>> root set + find any dependencies, and put both in the 
>>>>>>>>>> generated artifact.
>>>>>>>>>>
>>>>>>>>>> I like that it does no filtering by default, so ad-hoc users 
>>>>>>>>>> don't have to figure out which header files define which 
>>>>>>>>>> functions.
>>>>>>>>>
>>>>>>>>> Yep - I like this too. It goes towards the 'no magic' goal, 
>>>>>>>>> and it
>>>>>>>>> adds the filter option that makes most sense when you want to 
>>>>>>>>> chop
>>>>>>>>> down an API for good. This is a filtering mechanism after all, 
>>>>>>>>> the
>>>>>>>>> fact that it's coming back as a filtering option is IMHO, a 
>>>>>>>>> plus, no a
>>>>>>>>> minus.
>>>>>>>>>
>>>>>>>>> Btw, slight revision to what I said previously - let's make the
>>>>>>>>> argument of include/exclude regexes too (as for others). This 
>>>>>>>>> way it
>>>>>>>>> can be even easier to use, most headers do have some 
>>>>>>>>> commonalities in
>>>>>>>>> them - and I like that, with that,  we can basically do the 
>>>>>>>>> path-based
>>>>>>>>> heuristics w/o hardwiring that into jextract.
>>>>>>>>>
>>>>>>>>> Maurizio
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Jorn
>>>>>>>>>>
>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>>>>>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>>>>>>>>> I've implemented this, and now doing a clean build based 
>>>>>>>>>>>>> on the latest jdk/jdk merge before submitting the next 
>>>>>>>>>>>>> webrev.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's working nicely, with one caveat; Some headers rely on 
>>>>>>>>>>>>> pre-processor code defined in a parent header. For 
>>>>>>>>>>>>> instance, Python.h defines some pre-processor code used in 
>>>>>>>>>>>>> pythonrun.h . The example we have only uses functions 
>>>>>>>>>>>>> defined in pythonrun.h, but if we just pass that header to 
>>>>>>>>>>>>> jextract Clang throws an error because of the missing 
>>>>>>>>>>>>> pre-processor code from Python.h
>>>>>>>>>>>>>
>>>>>>>>>>>>> In this case we have to pass Python.h first, and then 
>>>>>>>>>>>>> pythonrun.h to get everything to work. This also relies on 
>>>>>>>>>>>>> the existence of header include guards in the pythonrun.h 
>>>>>>>>>>>>> header (since we're basically including it twice). A 
>>>>>>>>>>>>> similar caveat exists with the Windows API.
>>>>>>>>>>>>
>>>>>>>>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>>>>>>>>
>>>>>>>>>>>> What if the root set was a 'first class' concept in the 
>>>>>>>>>>>> extraction run, rather than something we infer from this or 
>>>>>>>>>>>> than command line option?
>>>>>>>>>>>>
>>>>>>>>>>>> That way we could point jextract at python.h, but then say 
>>>>>>>>>>>> "hey, I'm only really interested at stuff that comes from 
>>>>>>>>>>>> pythonrun.h".
>>>>>>>>>>>
>>>>>>>>>>> In other words, right now we have
>>>>>>>>>>>
>>>>>>>>>>> --include-symbol
>>>>>>>>>>>
>>>>>>>>>>> and
>>>>>>>>>>>
>>>>>>>>>>> --exclude-symbol
>>>>>>>>>>>
>>>>>>>>>>> If not set, everything is included.
>>>>>>>>>>>
>>>>>>>>>>> Maybe all we need is:
>>>>>>>>>>>
>>>>>>>>>>> --include-header <header>
>>>>>>>>>>>
>>>>>>>>>>> --exclude-header <header>
>>>>>>>>>>>
>>>>>>>>>>> And, again, if none is specified, all headers are part of 
>>>>>>>>>>> the 'root set'.
>>>>>>>>>>>
>>>>>>>>>>> And, to address a concern you had - yes, I'd consider using the
>>>>>>>>>>> 'include path' for header names in --include/exclude-header
>>>>>>>>>>>
>>>>>>>>>>> Maurizio
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, the guide-line is: Pass the main header first, then 
>>>>>>>>>>>>> internal headers. e.g. If we have a main header A.h which 
>>>>>>>>>>>>> includes a_impl.h, and another main header B.h which 
>>>>>>>>>>>>> includes b_impl.h, headers should be passed to jextract in 
>>>>>>>>>>>>> the order: A.h a_impl.h B.h b_impl.h
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the behavior being dependent on the ordering of 
>>>>>>>>>>>>> the headers could be fixed by sorting the headers in 
>>>>>>>>>>>>> topological order. But, there's still the requirement to 
>>>>>>>>>>>>> pass all of them, or things break.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any thoughs about that?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>>>>>>>>> I've already been using shell scripts mostly when running 
>>>>>>>>>>>>>> jextract
>>>>>>>>>>>>>> (except for simple examples). I find it very useful to be 
>>>>>>>>>>>>>> able to
>>>>>>>>>>>>>> split the command over multiple lines, especially long 
>>>>>>>>>>>>>> file paths
>>>>>>>>>>>>>> become much more readable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'll start working on this then.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>>>>>>>>> I still like this approach, and I think adding support 
>>>>>>>>>>>>>>>> for wildcard patterns and/or header filters would make 
>>>>>>>>>>>>>>>> it better.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Like you said: It's dead simple. What you pass to 
>>>>>>>>>>>>>>>> jextract is what you get. Though, we could still apply 
>>>>>>>>>>>>>>>> dependency analysis, which would make sure nothing 
>>>>>>>>>>>>>>>> that's needed gets dropped.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why don't we try this then?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> which works
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> also, note that jextract also accepts a file with the @ 
>>>>>>>>>>>>>>> syntax, etc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> jextract @args.txt
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> where args.txt is the command line (which could list all 
>>>>>>>>>>>>>>> the headers you want!).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maurizio