[foreign] RFR 8220544: Jextract; library dependent header filtering
Samuel Audet
samuel.audet at gmail.com
Tue Mar 19 22:37:25 UTC 2019
Hi,
It would be great if we could also think about what to do for C++. Is the
plan to hope that Clang is going to support C++ name mangling properly for
all possible C++ compiler out there including MSVC?
Samuel
2019年3月20日(水) 6:07、Maurizio Cimadamore さん(maurizio.cimadamore at oracle.com
)のメッセージ:
> Cool! This is very nice milestone for jextract :-)
>
> When you have time (and if you can) it would be nice to get some stats
> on extracting a library (e.g. Python) on Windows before/after, to get a
> sense of what degree of improvement in terms of static footprint we can
> get. I tested on Linux and I can get smaller jars (up to 2x) but I sense
> that the improvement is much more dramatic on Windows-land.
>
> Maurizio
>
> On 19/03/2019 19:58, Jorn Vernee wrote:
> > Thanks for the reviews. I've updated the comment and pushed.
> >
> > Jorn
> >
> > Sundararajan Athijegannathan schreef op 2019-03-19 18:07:
> >> Tests passed. Please go ahead with push.
> >>
> >> -Sundar
> >>
> >> On 19/03/19, 7:46 PM, Sundararajan Athijegannathan wrote:
> >>> * 47 * A merge between {@link DependencyFinder} and {@link
> >>> ElementFilter}
> >>>
> >>> in DependencyFilter. Could be removed?
> >>>
> >>> No comments other than that. +1
> >>>
> >>> I'm going to submit an internal mach5 build/request for this patch.
> >>> I'll notify you as soon as it is all green.
> >>>
> >>> -Sundar
> >>>
> >>> On 19/03/19, 7:34 PM, Maurizio Cimadamore wrote:
> >>>> This looks good to me, I have no further comments.
> >>>>
> >>>> Great work!
> >>>>
> >>>> (I'll leave it to Sundar for further comments/suggestions)
> >>>>
> >>>> Maurizio
> >>>>
> >>>> On 19/03/2019 13:50, Jorn Vernee wrote:
> >>>>> Suggestions sound good.
> >>>>>
> >>>>> Here's the updated webrev:
> >>>>>
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.07/
> >>>>>
> >>>>> * Split library based filtering code off from SymbolFilter into
> >>>>> LibraryLookupFilter.
> >>>>> * Merged DependencyFinder & ElementFilter into DependencyFilter
> >>>>> (as nested classes).
> >>>>> * Also added 2 more test cases to check that symbols being
> >>>>> filtered out don't have their dependencies added.
> >>>>>
> >>>>> Jorn
> >>>>>
> >>>>> Maurizio Cimadamore schreef op 2019-03-19 13:40:
> >>>>>> Looks good, I did some tests and it seems to behave according to the
> >>>>>> expectations - some comments below:
> >>>>>>
> >>>>>> On 18/03/2019 10:58, Jorn Vernee wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I have implemented this:
> >>>>>>>
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.06/
> >>>>>>>
> >>>>>>>
> >>>>>>> This adds 2 options --include-headers and --exclude-headers that
> >>>>>>> take regular expression patterns as an argument. The patterns
> >>>>>>> are matched against a cursor's full header file path to
> >>>>>>> determine if it appears in the header root set.
> >>>>>>>
> >>>>>>> For example, I can use the following to include headers in the
> >>>>>>> Python include director;
> >>>>>>>
> >>>>>>> --include-headers "J:\\Python27\\include\\.*"
> >>>>>>>
> >>>>>>> ---
> >>>>>>>
> >>>>>>> To recap, we now:
> >>>>>>> 1.) Filter functions, vars, and macros by whether they appear
> >>>>>>> in a root header.
> >>>>>>> 2.) Filter functions, vars, and macros by the
> >>>>>>> --includes/exclude-symbols patterns.
> >>>>>>> 3.) Filter functions and vars, by whether they appear in any
> >>>>>>> shared library.
> >>>>>>
> >>>>>> I think this is ok, but the way this happens in the code seems to be
> >>>>>> sort of ad hoc, and probably the result of how the code use to be in
> >>>>>> previous rounds.
> >>>>>>
> >>>>>> For instance, SymbolFilter does both (1), (2) and (3).
> >>>>>>
> >>>>>> Then we have DependencyFinder which also does a bit of (1).
> >>>>>>
> >>>>>> Finally we have ElementFilter, which drops everything that is not
> >>>>>> required (as per 4, 5).
> >>>>>>
> >>>>>>
> >>>>>> I believe the code could be improved by:
> >>>>>>
> >>>>>> * separating the library-based filtering from the option-driven
> >>>>>> filtering. These are two separate mechanism (the latter being
> >>>>>> optional
> >>>>>> on -L being used at all). Of course we should also decide which is
> >>>>>> used first.
> >>>>>>
> >>>>>> * bring together DependencyFinder and ElementFilter into
> >>>>>> DependencyFilter. You can have a two pass visitor - collect deps
> >>>>>> on a
> >>>>>> first pass, then prune on the second pass. This way the state
> >>>>>> remains
> >>>>>> 'internal' to the DependencyFilter visitor, no need to pass magic
> >>>>>> state between two passes.
> >>>>>>
> >>>>>> Maurizio
> >>>>>>
> >>>>>>
> >>>>>>> 4.) For the remaining trees in the root headers do dependency
> >>>>>>> analysis, and find the required structs, enums, and typedefs.
> >>>>>>> 5.) For structs, enums, and typedefs include them if they
> >>>>>>> appear in a root header OR are required as dependency.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Jorn
> >>>>>>>
> >>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:29:
> >>>>>>>> On 15/03/2019 18:21, Jorn Vernee wrote:
> >>>>>>>>> Heh - was writing an email pretty much with this suggestion :P
> >>>>>>>>>
> >>>>>>>>> I think this is a good idea! If you don't think this adds to
> >>>>>>>>> much complexity to the options?
> >>>>>>>>>
> >>>>>>>>> So we'd basically have:
> >>>>>>>>>
> >>>>>>>>> Without --include/exclude-header -> same behavior we have
> >>>>>>>>> right now.
> >>>>>>>>> With --include/exclude-header -> Add included headers to
> >>>>>>>>> root set + find any dependencies, and put both in the
> >>>>>>>>> generated artifact.
> >>>>>>>>>
> >>>>>>>>> I like that it does no filtering by default, so ad-hoc users
> >>>>>>>>> don't have to figure out which header files define which
> >>>>>>>>> functions.
> >>>>>>>>
> >>>>>>>> Yep - I like this too. It goes towards the 'no magic' goal, and it
> >>>>>>>> adds the filter option that makes most sense when you want to chop
> >>>>>>>> down an API for good. This is a filtering mechanism after all, the
> >>>>>>>> fact that it's coming back as a filtering option is IMHO, a
> >>>>>>>> plus, no a
> >>>>>>>> minus.
> >>>>>>>>
> >>>>>>>> Btw, slight revision to what I said previously - let's make the
> >>>>>>>> argument of include/exclude regexes too (as for others). This
> >>>>>>>> way it
> >>>>>>>> can be even easier to use, most headers do have some
> >>>>>>>> commonalities in
> >>>>>>>> them - and I like that, with that, we can basically do the
> >>>>>>>> path-based
> >>>>>>>> heuristics w/o hardwiring that into jextract.
> >>>>>>>>
> >>>>>>>> Maurizio
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Jorn
> >>>>>>>>>
> >>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 19:07:
> >>>>>>>>>> On 15/03/2019 18:01, Maurizio Cimadamore wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On 15/03/2019 17:48, Jorn Vernee wrote:
> >>>>>>>>>>>> I've implemented this, and now doing a clean build based on
> >>>>>>>>>>>> the latest jdk/jdk merge before submitting the next webrev.
> >>>>>>>>>>>>
> >>>>>>>>>>>> It's working nicely, with one caveat; Some headers rely on
> >>>>>>>>>>>> pre-processor code defined in a parent header. For
> >>>>>>>>>>>> instance, Python.h defines some pre-processor code used in
> >>>>>>>>>>>> pythonrun.h . The example we have only uses functions
> >>>>>>>>>>>> defined in pythonrun.h, but if we just pass that header to
> >>>>>>>>>>>> jextract Clang throws an error because of the missing
> >>>>>>>>>>>> pre-processor code from Python.h
> >>>>>>>>>>>>
> >>>>>>>>>>>> In this case we have to pass Python.h first, and then
> >>>>>>>>>>>> pythonrun.h to get everything to work. This also relies on
> >>>>>>>>>>>> the existence of header include guards in the pythonrun.h
> >>>>>>>>>>>> header (since we're basically including it twice). A
> >>>>>>>>>>>> similar caveat exists with the Windows API.
> >>>>>>>>>>>
> >>>>>>>>>>> Doh! Maybe I've missed the simplest option after all.
> >>>>>>>>>>>
> >>>>>>>>>>> What if the root set was a 'first class' concept in the
> >>>>>>>>>>> extraction run, rather than something we infer from this or
> >>>>>>>>>>> than command line option?
> >>>>>>>>>>>
> >>>>>>>>>>> That way we could point jextract at python.h, but then say
> >>>>>>>>>>> "hey, I'm only really interested at stuff that comes from
> >>>>>>>>>>> pythonrun.h".
> >>>>>>>>>>
> >>>>>>>>>> In other words, right now we have
> >>>>>>>>>>
> >>>>>>>>>> --include-symbol
> >>>>>>>>>>
> >>>>>>>>>> and
> >>>>>>>>>>
> >>>>>>>>>> --exclude-symbol
> >>>>>>>>>>
> >>>>>>>>>> If not set, everything is included.
> >>>>>>>>>>
> >>>>>>>>>> Maybe all we need is:
> >>>>>>>>>>
> >>>>>>>>>> --include-header <header>
> >>>>>>>>>>
> >>>>>>>>>> --exclude-header <header>
> >>>>>>>>>>
> >>>>>>>>>> And, again, if none is specified, all headers are part of the
> >>>>>>>>>> 'root set'.
> >>>>>>>>>>
> >>>>>>>>>> And, to address a concern you had - yes, I'd consider using the
> >>>>>>>>>> 'include path' for header names in --include/exclude-header
> >>>>>>>>>>
> >>>>>>>>>> Maurizio
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Maurizio
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> So, the guide-line is: Pass the main header first, then
> >>>>>>>>>>>> internal headers. e.g. If we have a main header A.h which
> >>>>>>>>>>>> includes a_impl.h, and another main header B.h which
> >>>>>>>>>>>> includes b_impl.h, headers should be passed to jextract in
> >>>>>>>>>>>> the order: A.h a_impl.h B.h b_impl.h
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think the behavior being dependent on the ordering of the
> >>>>>>>>>>>> headers could be fixed by sorting the headers in
> >>>>>>>>>>>> topological order. But, there's still the requirement to
> >>>>>>>>>>>> pass all of them, or things break.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Any thoughs about that?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Jorn
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Jorn Vernee schreef op 2019-03-15 14:40:
> >>>>>>>>>>>>> I've already been using shell scripts mostly when running
> >>>>>>>>>>>>> jextract
> >>>>>>>>>>>>> (except for simple examples). I find it very useful to be
> >>>>>>>>>>>>> able to
> >>>>>>>>>>>>> split the command over multiple lines, especially long
> >>>>>>>>>>>>> file paths
> >>>>>>>>>>>>> become much more readable.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'll start working on this then.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jorn
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-15 14:21:
> >>>>>>>>>>>>>> On 15/03/2019 13:15, Jorn Vernee wrote:
> >>>>>>>>>>>>>>> I still like this approach, and I think adding support
> >>>>>>>>>>>>>>> for wildcard patterns and/or header filters would make
> >>>>>>>>>>>>>>> it better.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Like you said: It's dead simple. What you pass to
> >>>>>>>>>>>>>>> jextract is what you get. Though, we could still apply
> >>>>>>>>>>>>>>> dependency analysis, which would make sure nothing
> >>>>>>>>>>>>>>> that's needed gets dropped.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Why don't we try this then?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Note that in javac we often use this trick:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> javac `find <path> -name *.java`
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> which works
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> also, note that jextract also accepts a file with the @
> >>>>>>>>>>>>>> syntax, etc.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> jextract @args.txt
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> where args.txt is the command line (which could list all
> >>>>>>>>>>>>>> the headers you want!).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Maybe this is indeed the simpler approach.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Maurizio
>
More information about the panama-dev
mailing list