[foreign] RFR 8215502: Jextract output should be self-contained

Tue Dec 18 16:30:39 UTC 2018

Looks like com/sun/tools/jextract/missing/MissingSymbolTest.java is 
failing after this patch?

It looks like it no longer generates the class that the test is testing 
for: 
http://cr.openjdk.java.net/~jvernee/panama/test-results/MissingSymbolTest.jtr

Jorn

Maurizio Cimadamore schreef op 2018-12-17 19:44:
> Hi,
> following external and internal feedback on jextract usability, we
> have been working hard on a partial jextract rewrite which
> significantly simplifies how jextract works and how the various pieces
> are connected together. The idea is that each extraction run should
> generate a self-contained artifact; this is _not_ what is happening
> now, as jextract will only extract things that belong to the header
> being extracted. This means that if the header depends on symbols
> defined in some other included header, in a different folder, then the
> resulting jar artifact will contain dandling refs and will fail to
> bind. To overcome this issues user have in the past been forced to
> 'chase headers' around, and add them to the jextract command line,
> which is a very frustrating (and system-dependent) process.
> 
> Also, few months ago we embarked on a project to parse info obtained
> from clang API into an internal representation (see [1]). But we've
> only gone half-way with this - and the code is in a place where
> sometimes the tree is used, sometimes the clang types are used, which
> can be very unpredictable and hard to follow. This patch fixes all
> that and makes trees the single source of truth; this means that some
> issues in tree parsing had to be rectified, to avoid some common
> issues (see end of this email for a list).
> 
> Here's the webrev:
> 
> https://cr.openjdk.java.net/~mcimadamore/panama/8215502/
> 
> (Not sure if it's better to look at the diffs or to just look at the 
> new code)
> 
> As it can be seen, classes are much smaller now, each doing a fairly
> simple, focussed task:
> 
> * Main: this is where we process command line args; we check them and
> we set up a Context
> 
> * Context: this is where we hold the shared state - mostly holder for
> command line args (after validation)
> 
> * JextractTool: this is the main jextract tool frontend - it has a
> method called 'processHeaders' which does all the job. This work is
> divided in the following steps:
> 
>    - parse all headers into trees
>    - run a bunch of visitors onto the trees (symbol filter, typedef
> handler, etc.)
>    - extract all decls from header trees
>    - get rid of duplicate decls
>    - group decls by header file
>    - for each headerfile
>           + enter types into dictionary (TypeEnter)
>           + generate bytecode using appropriate code factory
>    - return a Writer object
> 
> * Writer: helper object used to write classes/jars
> 
> * HeaderResolver: helper object which keeps track of path/package
> mappings, and which, given a path, returns the appropriate header file
> object. Note: before this patch, all paths were flattened - e.g. if a
> header was found under /usr/include/linux/foo.h, jextract would emit a
> classfile called usr_include_foo.class. The new patch changes this
> behavior and generates proper nested package names, so that the header
> class will have now name "foo" and package "usr.include.linux".
> 
> * TypeDictionary: helper object which, with the help of the
> HeaderResolver, allows to go from clang types to Java types
> 
> 
> Overall, this feels like a big improvement in terms of how easy it is
> to tweak the behavior of jextract - each of the main abstraction is
> simple enough that it is relatively easy to tweak things to get a
> different result, which is a huge step forward compared to where the
> code is ATM (at least in my opinion, which could be biased of course
> :-)). Following this change, we will probably do two followup changes:
> 
> I think the new logic is very deterministic, and easy to follow; there
> are two items where this is still suboptimal:
> 
> 1) we allow multiple headerfiles as input, and this sometimes doesn't
> play well with the parser; we have discussed many times how clang API
> behaves better with single input which includes everything. It would
> probably be better to restrict jextract to work on a single header
> input file only (which, thanks to the new resolution
> logic/self-contained model, now works all the times).
> 
> 2) related to the above, as we discussed, the -t option doesn't make
> a  lot of sense when multiple input headers are involved. I think it
> would be best again to restrict input to be just one headerfile (as
> per 1), and also transform -t into a proper 'package prefix' option
> instead, which adds given prefix to all package names generated by
> jextract. This is very useful to package up self-contained artifacts
> in way that do no clash when loaded by same classloader.
> 
> As for the issues in tree parsing that were addressed as part of this
> patch, here's a list:
> 
> * StructTree::nestedTypes is generating a transitive closure of all
> nested types; this doesn't play well with recursive visitors, as it
> leads trees to be visited multiple times; I changed the code to expand
> only the innermost level of nested types (this affects the
> EmptyNameHandler visitor)
> 
> * Tree::equals/hashcode - this code is doing the comparison using
> Cursor::equals, which is good, but doesn't work at all across
> compilation units (e.g. when multiple headers are specified). For this
> reason I changed the comparison code to rely on source location info.
> 
> * TypedefHandler has some logic to exclude multiple declaration w/o
> definition, where the definition occurs elsewhere in the tree. But
> this logic doesn't visit structs recursively and, as a result, it
> fails to prune duplicates nested within structs.
> 
> P.S.
> A big thank to Sundar for all the help troubleshooting failing tests
> on Mac, and for updating some of the samples instruction in our
> markdown file :-)
> 
> Cheers
> Maurizio
> 
> [1] - 
> http://mail.openjdk.java.net/pipermail/panama-dev/2018-August/002564.html