[foreign] RFR 8220544: Jextract; library dependent header filtering
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Mar 19 21:06:29 UTC 2019
Cool! This is very nice milestone for jextract :-)
When you have time (and if you can) it would be nice to get some stats
on extracting a library (e.g. Python) on Windows before/after, to get a
sense of what degree of improvement in terms of static footprint we can
get. I tested on Linux and I can get smaller jars (up to 2x) but I sense
that the improvement is much more dramatic on Windows-land.
Maurizio
On 19/03/2019 19:58, Jorn Vernee wrote:
> Thanks for the reviews. I've updated the comment and pushed.
>
> Jorn
>
> Sundararajan Athijegannathan schreef op 2019-03-19 18:07:
>> Tests passed. Please go ahead with push.
>>
>> -Sundar
>>
>> On 19/03/19, 7:46 PM, Sundararajan Athijegannathan wrote:
>>> * 47 * A merge between {@link DependencyFinder} and {@link
>>> ElementFilter}
>>>
>>> in DependencyFilter. Could be removed?
>>>
>>> No comments other than that. +1
>>>
>>> I'm going to submit an internal mach5 build/request for this patch.
>>> I'll notify you as soon as it is all green.
>>>
>>> -Sundar
>>>
>>> On 19/03/19, 7:34 PM, Maurizio Cimadamore wrote:
>>>> This looks good to me, I have no further comments.
>>>>
>>>> Great work!
>>>>
>>>> (I'll leave it to Sundar for further comments/suggestions)
>>>>
>>>> Maurizio
>>>>
>>>> On 19/03/2019 13:50, Jorn Vernee wrote:
>>>>> Suggestions sound good.
>>>>>
>>>>> Here's the updated webrev:
>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/
>>>>>
>>>>> * Split library based filtering code off from SymbolFilter into
>>>>> LibraryLookupFilter.
>>>>> * Merged DependencyFinder & ElementFilter into DependencyFilter
>>>>> (as nested classes).
>>>>> * Also added 2 more test cases to check that symbols being
>>>>> filtered out don't have their dependencies added.
>>>>>
>>>>> Jorn
>>>>>
>>>>> Maurizio Cimadamore schreef op 2019-03-19 13:40:
>>>>>> Looks good, I did some tests and it seems to behave according to the
>>>>>> expectations - some comments below:
>>>>>>
>>>>>> On 18/03/2019 10:58, Jorn Vernee wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have implemented this:
>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
>>>>>>>
>>>>>>>
>>>>>>> This adds 2 options --include-headers and --exclude-headers that
>>>>>>> take regular expression patterns as an argument. The patterns
>>>>>>> are matched against a cursor's full header file path to
>>>>>>> determine if it appears in the header root set.
>>>>>>>
>>>>>>> For example, I can use the following to include headers in the
>>>>>>> Python include director;
>>>>>>>
>>>>>>> --include-headers "J:\\Python27\\include\\.*"
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> To recap, we now:
>>>>>>> 1.) Filter functions, vars, and macros by whether they appear
>>>>>>> in a root header.
>>>>>>> 2.) Filter functions, vars, and macros by the
>>>>>>> --includes/exclude-symbols patterns.
>>>>>>> 3.) Filter functions and vars, by whether they appear in any
>>>>>>> shared library.
>>>>>>
>>>>>> I think this is ok, but the way this happens in the code seems to be
>>>>>> sort of ad hoc, and probably the result of how the code use to be in
>>>>>> previous rounds.
>>>>>>
>>>>>> For instance, SymbolFilter does both (1), (2) and (3).
>>>>>>
>>>>>> Then we have DependencyFinder which also does a bit of (1).
>>>>>>
>>>>>> Finally we have ElementFilter, which drops everything that is not
>>>>>> required (as per 4, 5).
>>>>>>
>>>>>>
>>>>>> I believe the code could be improved by:
>>>>>>
>>>>>> * separating the library-based filtering from the option-driven
>>>>>> filtering. These are two separate mechanism (the latter being
>>>>>> optional
>>>>>> on -L being used at all). Of course we should also decide which is
>>>>>> used first.
>>>>>>
>>>>>> * bring together DependencyFinder and ElementFilter into
>>>>>> DependencyFilter. You can have a two pass visitor - collect deps
>>>>>> on a
>>>>>> first pass, then prune on the second pass. This way the state
>>>>>> remains
>>>>>> 'internal' to the DependencyFilter visitor, no need to pass magic
>>>>>> state between two passes.
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>>
>>>>>>> 4.) For the remaining trees in the root headers do dependency
>>>>>>> analysis, and find the required structs, enums, and typedefs.
>>>>>>> 5.) For structs, enums, and typedefs include them if they
>>>>>>> appear in a root header OR are required as dependency.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jorn
>>>>>>>
>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
>>>>>>>> On 15/03/2019 18:21, Jorn Vernee wrote:
>>>>>>>>> Heh - was writing an email pretty much with this suggestion :P
>>>>>>>>>
>>>>>>>>> I think this is a good idea! If you don't think this adds to
>>>>>>>>> much complexity to the options?
>>>>>>>>>
>>>>>>>>> So we'd basically have:
>>>>>>>>>
>>>>>>>>> Without --include/exclude-header -> same behavior we have
>>>>>>>>> right now.
>>>>>>>>> With --include/exclude-header -> Add included headers to
>>>>>>>>> root set + find any dependencies, and put both in the
>>>>>>>>> generated artifact.
>>>>>>>>>
>>>>>>>>> I like that it does no filtering by default, so ad-hoc users
>>>>>>>>> don't have to figure out which header files define which
>>>>>>>>> functions.
>>>>>>>>
>>>>>>>> Yep - I like this too. It goes towards the 'no magic' goal, and it
>>>>>>>> adds the filter option that makes most sense when you want to chop
>>>>>>>> down an API for good. This is a filtering mechanism after all, the
>>>>>>>> fact that it's coming back as a filtering option is IMHO, a
>>>>>>>> plus, no a
>>>>>>>> minus.
>>>>>>>>
>>>>>>>> Btw, slight revision to what I said previously - let's make the
>>>>>>>> argument of include/exclude regexes too (as for others). This
>>>>>>>> way it
>>>>>>>> can be even easier to use, most headers do have some
>>>>>>>> commonalities in
>>>>>>>> them - and I like that, with that, we can basically do the
>>>>>>>> path-based
>>>>>>>> heuristics w/o hardwiring that into jextract.
>>>>>>>>
>>>>>>>> Maurizio
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jorn
>>>>>>>>>
>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
>>>>>>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
>>>>>>>>>>>> I've implemented this, and now doing a clean build based on
>>>>>>>>>>>> the latest jdk/jdk merge before submitting the next webrev.
>>>>>>>>>>>>
>>>>>>>>>>>> It's working nicely, with one caveat; Some headers rely on
>>>>>>>>>>>> pre-processor code defined in a parent header. For
>>>>>>>>>>>> instance, Python.h defines some pre-processor code used in
>>>>>>>>>>>> pythonrun.h . The example we have only uses functions
>>>>>>>>>>>> defined in pythonrun.h, but if we just pass that header to
>>>>>>>>>>>> jextract Clang throws an error because of the missing
>>>>>>>>>>>> pre-processor code from Python.h
>>>>>>>>>>>>
>>>>>>>>>>>> In this case we have to pass Python.h first, and then
>>>>>>>>>>>> pythonrun.h to get everything to work. This also relies on
>>>>>>>>>>>> the existence of header include guards in the pythonrun.h
>>>>>>>>>>>> header (since we're basically including it twice). A
>>>>>>>>>>>> similar caveat exists with the Windows API.
>>>>>>>>>>>
>>>>>>>>>>> Doh! Maybe I've missed the simplest option after all.
>>>>>>>>>>>
>>>>>>>>>>> What if the root set was a 'first class' concept in the
>>>>>>>>>>> extraction run, rather than something we infer from this or
>>>>>>>>>>> than command line option?
>>>>>>>>>>>
>>>>>>>>>>> That way we could point jextract at python.h, but then say
>>>>>>>>>>> "hey, I'm only really interested at stuff that comes from
>>>>>>>>>>> pythonrun.h".
>>>>>>>>>>
>>>>>>>>>> In other words, right now we have
>>>>>>>>>>
>>>>>>>>>> --include-symbol
>>>>>>>>>>
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>>> --exclude-symbol
>>>>>>>>>>
>>>>>>>>>> If not set, everything is included.
>>>>>>>>>>
>>>>>>>>>> Maybe all we need is:
>>>>>>>>>>
>>>>>>>>>> --include-header <header>
>>>>>>>>>>
>>>>>>>>>> --exclude-header <header>
>>>>>>>>>>
>>>>>>>>>> And, again, if none is specified, all headers are part of the
>>>>>>>>>> 'root set'.
>>>>>>>>>>
>>>>>>>>>> And, to address a concern you had - yes, I'd consider using the
>>>>>>>>>> 'include path' for header names in --include/exclude-header
>>>>>>>>>>
>>>>>>>>>> Maurizio
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Maurizio
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> So, the guide-line is: Pass the main header first, then
>>>>>>>>>>>> internal headers. e.g. If we have a main header A.h which
>>>>>>>>>>>> includes a_impl.h, and another main header B.h which
>>>>>>>>>>>> includes b_impl.h, headers should be passed to jextract in
>>>>>>>>>>>> the order: A.h a_impl.h B.h b_impl.h
>>>>>>>>>>>>
>>>>>>>>>>>> I think the behavior being dependent on the ordering of the
>>>>>>>>>>>> headers could be fixed by sorting the headers in
>>>>>>>>>>>> topological order. But, there's still the requirement to
>>>>>>>>>>>> pass all of them, or things break.
>>>>>>>>>>>>
>>>>>>>>>>>> Any thoughs about that?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Jorn
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
>>>>>>>>>>>>> I've already been using shell scripts mostly when running
>>>>>>>>>>>>> jextract
>>>>>>>>>>>>> (except for simple examples). I find it very useful to be
>>>>>>>>>>>>> able to
>>>>>>>>>>>>> split the command over multiple lines, especially long
>>>>>>>>>>>>> file paths
>>>>>>>>>>>>> become much more readable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll start working on this then.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
>>>>>>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
>>>>>>>>>>>>>>> I still like this approach, and I think adding support
>>>>>>>>>>>>>>> for wildcard patterns and/or header filters would make
>>>>>>>>>>>>>>> it better.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Like you said: It's dead simple. What you pass to
>>>>>>>>>>>>>>> jextract is what you get. Though, we could still apply
>>>>>>>>>>>>>>> dependency analysis, which would make sure nothing
>>>>>>>>>>>>>>> that's needed gets dropped.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why don't we try this then?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Note that in javac we often use this trick:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> javac `find <path> -name *.java`
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> which works
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> also, note that jextract also accepts a file with the @
>>>>>>>>>>>>>> syntax, etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> jextract @args.txt
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> where args.txt is the command line (which could list all
>>>>>>>>>>>>>> the headers you want!).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe this is indeed the simpler approach.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maurizio
More information about the panama-dev
mailing list