[foreign] RFR 8215502: Jextract output should be self-contained
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Dec 17 18:44:36 UTC 2018
Hi,
following external and internal feedback on jextract usability, we have
been working hard on a partial jextract rewrite which significantly
simplifies how jextract works and how the various pieces are connected
together. The idea is that each extraction run should generate a
self-contained artifact; this is _not_ what is happening now, as
jextract will only extract things that belong to the header being
extracted. This means that if the header depends on symbols defined in
some other included header, in a different folder, then the resulting
jar artifact will contain dandling refs and will fail to bind. To
overcome this issues user have in the past been forced to 'chase
headers' around, and add them to the jextract command line, which is a
very frustrating (and system-dependent) process.
Also, few months ago we embarked on a project to parse info obtained
from clang API into an internal representation (see [1]). But we've only
gone half-way with this - and the code is in a place where sometimes the
tree is used, sometimes the clang types are used, which can be very
unpredictable and hard to follow. This patch fixes all that and makes
trees the single source of truth; this means that some issues in tree
parsing had to be rectified, to avoid some common issues (see end of
this email for a list).
Here's the webrev:
https://cr.openjdk.java.net/~mcimadamore/panama/8215502/
(Not sure if it's better to look at the diffs or to just look at the new
code)
As it can be seen, classes are much smaller now, each doing a fairly
simple, focussed task:
* Main: this is where we process command line args; we check them and we
set up a Context
* Context: this is where we hold the shared state - mostly holder for
command line args (after validation)
* JextractTool: this is the main jextract tool frontend - it has a
method called 'processHeaders' which does all the job. This work is
divided in the following steps:
- parse all headers into trees
- run a bunch of visitors onto the trees (symbol filter, typedef
handler, etc.)
- extract all decls from header trees
- get rid of duplicate decls
- group decls by header file
- for each headerfile
+ enter types into dictionary (TypeEnter)
+ generate bytecode using appropriate code factory
- return a Writer object
* Writer: helper object used to write classes/jars
* HeaderResolver: helper object which keeps track of path/package
mappings, and which, given a path, returns the appropriate header file
object. Note: before this patch, all paths were flattened - e.g. if a
header was found under /usr/include/linux/foo.h, jextract would emit a
classfile called usr_include_foo.class. The new patch changes this
behavior and generates proper nested package names, so that the header
class will have now name "foo" and package "usr.include.linux".
* TypeDictionary: helper object which, with the help of the
HeaderResolver, allows to go from clang types to Java types
Overall, this feels like a big improvement in terms of how easy it is to
tweak the behavior of jextract - each of the main abstraction is simple
enough that it is relatively easy to tweak things to get a different
result, which is a huge step forward compared to where the code is ATM
(at least in my opinion, which could be biased of course :-)). Following
this change, we will probably do two followup changes:
I think the new logic is very deterministic, and easy to follow; there
are two items where this is still suboptimal:
1) we allow multiple headerfiles as input, and this sometimes doesn't
play well with the parser; we have discussed many times how clang API
behaves better with single input which includes everything. It would
probably be better to restrict jextract to work on a single header input
file only (which, thanks to the new resolution logic/self-contained
model, now works all the times).
2) related to the above, as we discussed, the -t option doesn't make a
lot of sense when multiple input headers are involved. I think it would
be best again to restrict input to be just one headerfile (as per 1),
and also transform -t into a proper 'package prefix' option instead,
which adds given prefix to all package names generated by jextract. This
is very useful to package up self-contained artifacts in way that do no
clash when loaded by same classloader.
As for the issues in tree parsing that were addressed as part of this
patch, here's a list:
* StructTree::nestedTypes is generating a transitive closure of all
nested types; this doesn't play well with recursive visitors, as it
leads trees to be visited multiple times; I changed the code to expand
only the innermost level of nested types (this affects the
EmptyNameHandler visitor)
* Tree::equals/hashcode - this code is doing the comparison using
Cursor::equals, which is good, but doesn't work at all across
compilation units (e.g. when multiple headers are specified). For this
reason I changed the comparison code to rely on source location info.
* TypedefHandler has some logic to exclude multiple declaration w/o
definition, where the definition occurs elsewhere in the tree. But this
logic doesn't visit structs recursively and, as a result, it fails to
prune duplicates nested within structs.
P.S.
A big thank to Sundar for all the help troubleshooting failing tests on
Mac, and for updating some of the samples instruction in our markdown
file :-)
Cheers
Maurizio
[1] -
http://mail.openjdk.java.net/pipermail/panama-dev/2018-August/002564.html
More information about the panama-dev
mailing list