[foreign] RFR 8220544: Jextract; library dependent header filtering
Sundararajan Athijegannathan
sundararajan.athijegannathan at oracle.com
Wed Mar 20 00:48:04 UTC 2019
+1. Very Nice!
-Sundar
On 20/03/19, 2:36 AM, Maurizio Cimadamore wrote:
> Cool! This is very nice milestone for jextract :-)
>
> When you have time (and if you can) it would be nice to get some stats
> on extracting a library (e.g. Python) on Windows before/after, to get
> a sense of what degree of improvement in terms of static footprint we
> can get. I tested on Linux and I can get smaller jars (up to 2x) but I
> sense that the improvement is much more dramatic on Windows-land.
>
> Maurizio
>
> On 19/03/2019 19:58, Jorn Vernee wrote:
>> Thanks for the reviews. I've updated the comment and pushed.
>>
>> Jorn
>>
>> Sundararajan Athijegannathan schreef op 2019-03-19 18:07:
>>> Tests passed. Please go ahead with push.
>>>
>>> -Sundar
>>>
>>> On 19/03/19, 7:46 PM, Sundararajan Athijegannathan wrote:
>>>> * 47 * A merge between {@link DependencyFinder} and {@link
>>>> ElementFilter}
>>>>
>>>> in DependencyFilter. Could be removed?
>>>>
>>>> No comments other than that. +1
>>>>
>>>> I'm going to submit an internal mach5 build/request for this patch.
>>>> I'll notify you as soon as it is all green.
>>>>
>>>> -Sundar
>>>>
>>>> On 19/03/19, 7:34 PM, Maurizio Cimadamore wrote:
>>>>> This looks good to me, I have no further comments.
>>>>>
>>>>> Great work!
>>>>>
>>>>> (I'll leave it to Sundar for further comments/suggestions)
>>>>>
>>>>> Maurizio
>>>>>
>>>>> On 19/03/2019 13:50, Jorn Vernee wrote:
>>>>>> Suggestions sound good.
>>>>>>
>>>>>> Here's the updated webrev:
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/
>>>>>>
>>>>>>
>>>>>> * Split library based filtering code off from SymbolFilter into
>>>>>> LibraryLookupFilter.
>>>>>> * Merged DependencyFinder & ElementFilter into DependencyFilter
>>>>>> (as nested classes).
>>>>>> * Also added 2 more test cases to check that symbols being
>>>>>> filtered out don't have their dependencies added.
>>>>>>
>>>>>> Jorn
>>>>>>
>>>>>> Maurizio Cimadamore schreef op 2019-03-19 13:40:
>>>>>>> Looks good, I did some tests and it seems to behave according to
>>>>>>> the
>>>>>>> expectations - some comments below:
>>>>>>>
>>>>>>> On 18/03/2019 10:58, Jorn Vernee wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have implemented this:
>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
>>>>>>>>
>>>>>>>>
>>>>>>>> This adds 2 options --include-headers and --exclude-headers
>>>>>>>> that take regular expression patterns as an argument. The
>>>>>>>> patterns are matched against a cursor's full header file path
>>>>>>>> to determine if it appears in the header root set.
>>>>>>>>
>>>>>>>> For example, I can use the following to include headers in the
>>>>>>>> Python include director;
>>>>>>>>
>>>>>>>> --include-headers "J:\\Python27\\include\\.*"
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> To recap, we now:
>>>>>>>> 1.) Filter functions, vars, and macros by whether they appear
>>>>>>>> in a root header.
>>>>>>>> 2.) Filter functions, vars, and macros by the
>>>>>>>> --includes/exclude-symbols patterns.
>>>>>>>> 3.) Filter functions and vars, by whether they appear in any
>>>>>>>> shared library.
>>>>>>>
>>>>>>> I think this is ok, but the way this happens in the code seems
>>>>>>> to be
>>>>>>> sort of ad hoc, and probably the result of how the code use to
>>>>>>> be in
>>>>>>> previous rounds.
>>>>>>>
>>>>>>> For instance, SymbolFilter does both (1), (2) and (3).
>>>>>>>
>>>>>>> Then we have DependencyFinder which also does a bit of (1).
>>>>>>>
>>>>>>> Finally we have ElementFilter, which drops everything that is not
>>>>>>> required (as per 4, 5).
>>>>>>>
>>>>>>>
>>>>>>> I believe the code could be improved by:
>>>>>>>
>>>>>>> * separating the library-based filtering from the option-driven
>>>>>>> filtering. These are two separate mechanism (the latter being
>>>>>>> optional
>>>>>>> on -L being used at all). Of course we should also decide which is
>>>>>>> used first.
>>>>>>>
>>>>>>> * bring together DependencyFinder and ElementFilter into
>>>>>>> DependencyFilter. You can have a two pass visitor - collect deps
>>>>>>> on a
>>>>>>> first pass, then prune on the second pass. This way the state
>>>>>>> remains
>>>>>>> 'internal' to the DependencyFilter visitor, no need to pass magic
>>>>>>> state between two passes.
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>>
>>>>>>>> 4.) For the remaining trees in the root headers do dependency
>>>>>>>> analysis, and find the required structs, enums, and typedefs.
>>>>>>>> 5.) For structs, enums, and typedefs include them if they
>>>>>>>> appear in a root header OR are required as dependency.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>>>>>>>>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>>>>>>>>> Heh - was writing an email pretty much with this suggestion :P
>>>>>>>>>>
>>>>>>>>>> I think this is a good idea! If you don't think this adds to
>>>>>>>>>> much complexity to the options?
>>>>>>>>>>
>>>>>>>>>> So we'd basically have:
>>>>>>>>>>
>>>>>>>>>> Without --include/exclude-header -> same behavior we have
>>>>>>>>>> right now.
>>>>>>>>>> With --include/exclude-header -> Add included headers to
>>>>>>>>>> root set + find any dependencies, and put both in the
>>>>>>>>>> generated artifact.
>>>>>>>>>>
>>>>>>>>>> I like that it does no filtering by default, so ad-hoc users
>>>>>>>>>> don't have to figure out which header files define which
>>>>>>>>>> functions.
>>>>>>>>>
>>>>>>>>> Yep - I like this too. It goes towards the 'no magic' goal,
>>>>>>>>> and it
>>>>>>>>> adds the filter option that makes most sense when you want to
>>>>>>>>> chop
>>>>>>>>> down an API for good. This is a filtering mechanism after all,
>>>>>>>>> the
>>>>>>>>> fact that it's coming back as a filtering option is IMHO, a
>>>>>>>>> plus, no a
>>>>>>>>> minus.
>>>>>>>>>
>>>>>>>>> Btw, slight revision to what I said previously - let's make the
>>>>>>>>> argument of include/exclude regexes too (as for others). This
>>>>>>>>> way it
>>>>>>>>> can be even easier to use, most headers do have some
>>>>>>>>> commonalities in
>>>>>>>>> them - and I like that, with that, we can basically do the
>>>>>>>>> path-based
>>>>>>>>> heuristics w/o hardwiring that into jextract.
>>>>>>>>>
>>>>>>>>> Maurizio
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Jorn
>>>>>>>>>>
>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>>>>>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>>>>>>>>> I've implemented this, and now doing a clean build based
>>>>>>>>>>>>> on the latest jdk/jdk merge before submitting the next
>>>>>>>>>>>>> webrev.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's working nicely, with one caveat; Some headers rely on
>>>>>>>>>>>>> pre-processor code defined in a parent header. For
>>>>>>>>>>>>> instance, Python.h defines some pre-processor code used in
>>>>>>>>>>>>> pythonrun.h . The example we have only uses functions
>>>>>>>>>>>>> defined in pythonrun.h, but if we just pass that header to
>>>>>>>>>>>>> jextract Clang throws an error because of the missing
>>>>>>>>>>>>> pre-processor code from Python.h
>>>>>>>>>>>>>
>>>>>>>>>>>>> In this case we have to pass Python.h first, and then
>>>>>>>>>>>>> pythonrun.h to get everything to work. This also relies on
>>>>>>>>>>>>> the existence of header include guards in the pythonrun.h
>>>>>>>>>>>>> header (since we're basically including it twice). A
>>>>>>>>>>>>> similar caveat exists with the Windows API.
>>>>>>>>>>>>
>>>>>>>>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>>>>>>>>
>>>>>>>>>>>> What if the root set was a 'first class' concept in the
>>>>>>>>>>>> extraction run, rather than something we infer from this or
>>>>>>>>>>>> than command line option?
>>>>>>>>>>>>
>>>>>>>>>>>> That way we could point jextract at python.h, but then say
>>>>>>>>>>>> "hey, I'm only really interested at stuff that comes from
>>>>>>>>>>>> pythonrun.h".
>>>>>>>>>>>
>>>>>>>>>>> In other words, right now we have
>>>>>>>>>>>
>>>>>>>>>>> --include-symbol
>>>>>>>>>>>
>>>>>>>>>>> and
>>>>>>>>>>>
>>>>>>>>>>> --exclude-symbol
>>>>>>>>>>>
>>>>>>>>>>> If not set, everything is included.
>>>>>>>>>>>
>>>>>>>>>>> Maybe all we need is:
>>>>>>>>>>>
>>>>>>>>>>> --include-header <header>
>>>>>>>>>>>
>>>>>>>>>>> --exclude-header <header>
>>>>>>>>>>>
>>>>>>>>>>> And, again, if none is specified, all headers are part of
>>>>>>>>>>> the 'root set'.
>>>>>>>>>>>
>>>>>>>>>>> And, to address a concern you had - yes, I'd consider using the
>>>>>>>>>>> 'include path' for header names in --include/exclude-header
>>>>>>>>>>>
>>>>>>>>>>> Maurizio
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, the guide-line is: Pass the main header first, then
>>>>>>>>>>>>> internal headers. e.g. If we have a main header A.h which
>>>>>>>>>>>>> includes a_impl.h, and another main header B.h which
>>>>>>>>>>>>> includes b_impl.h, headers should be passed to jextract in
>>>>>>>>>>>>> the order: A.h a_impl.h B.h b_impl.h
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the behavior being dependent on the ordering of
>>>>>>>>>>>>> the headers could be fixed by sorting the headers in
>>>>>>>>>>>>> topological order. But, there's still the requirement to
>>>>>>>>>>>>> pass all of them, or things break.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any thoughs about that?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>>>>>>>>> I've already been using shell scripts mostly when running
>>>>>>>>>>>>>> jextract
>>>>>>>>>>>>>> (except for simple examples). I find it very useful to be
>>>>>>>>>>>>>> able to
>>>>>>>>>>>>>> split the command over multiple lines, especially long
>>>>>>>>>>>>>> file paths
>>>>>>>>>>>>>> become much more readable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'll start working on this then.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>>>>>>>>> I still like this approach, and I think adding support
>>>>>>>>>>>>>>>> for wildcard patterns and/or header filters would make
>>>>>>>>>>>>>>>> it better.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Like you said: It's dead simple. What you pass to
>>>>>>>>>>>>>>>> jextract is what you get. Though, we could still apply
>>>>>>>>>>>>>>>> dependency analysis, which would make sure nothing
>>>>>>>>>>>>>>>> that's needed gets dropped.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why don't we try this then?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> which works
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> also, note that jextract also accepts a file with the @
>>>>>>>>>>>>>>> syntax, etc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> jextract @args.txt
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> where args.txt is the command line (which could list all
>>>>>>>>>>>>>>> the headers you want!).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maurizio
More information about the panama-dev
mailing list