[foreign] RFR 8215502: Jextract output should be self-contained

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Tue Dec 18 11:55:46 UTC 2018


Thanks, fixed (locally) and pushed.

Maurizio

On 18/12/2018 10:55, Sundararajan Athijegannathan wrote:
> As I've seen this (and previous variants of) changeset already, +1 
> from me.
>
> Minor:
>
> forwardRef.h misses copyright.
>
> -Sundar
>
> On 18/12/18, 12:14 AM, Maurizio Cimadamore wrote:
>> Hi,
>> following external and internal feedback on jextract usability, we 
>> have been working hard on a partial jextract rewrite which 
>> significantly simplifies how jextract works and how the various 
>> pieces are connected together. The idea is that each extraction run 
>> should generate a self-contained artifact; this is _not_ what is 
>> happening now, as jextract will only extract things that belong to 
>> the header being extracted. This means that if the header depends on 
>> symbols defined in some other included header, in a different folder, 
>> then the resulting jar artifact will contain dandling refs and will 
>> fail to bind. To overcome this issues user have in the past been 
>> forced to 'chase headers' around, and add them to the jextract 
>> command line, which is a very frustrating (and system-dependent) 
>> process.
>>
>> Also, few months ago we embarked on a project to parse info obtained 
>> from clang API into an internal representation (see [1]). But we've 
>> only gone half-way with this - and the code is in a place where 
>> sometimes the tree is used, sometimes the clang types are used, which 
>> can be very unpredictable and hard to follow. This patch fixes all 
>> that and makes trees the single source of truth; this means that some 
>> issues in tree parsing had to be rectified, to avoid some common 
>> issues (see end of this email for a list).
>>
>> Here's the webrev:
>>
>> https://cr.openjdk.java.net/~mcimadamore/panama/8215502/
>>
>> (Not sure if it's better to look at the diffs or to just look at the 
>> new code)
>>
>> As it can be seen, classes are much smaller now, each doing a fairly 
>> simple, focussed task:
>>
>> * Main: this is where we process command line args; we check them and 
>> we set up a Context
>>
>> * Context: this is where we hold the shared state - mostly holder for 
>> command line args (after validation)
>>
>> * JextractTool: this is the main jextract tool frontend - it has a 
>> method called 'processHeaders' which does all the job. This work is 
>> divided in the following steps:
>>
>>    - parse all headers into trees
>>    - run a bunch of visitors onto the trees (symbol filter, typedef 
>> handler, etc.)
>>    - extract all decls from header trees
>>    - get rid of duplicate decls
>>    - group decls by header file
>>    - for each headerfile
>>           + enter types into dictionary (TypeEnter)
>>           + generate bytecode using appropriate code factory
>>    - return a Writer object
>>
>> * Writer: helper object used to write classes/jars
>>
>> * HeaderResolver: helper object which keeps track of path/package 
>> mappings, and which, given a path, returns the appropriate header 
>> file object. Note: before this patch, all paths were flattened - e.g. 
>> if a header was found under /usr/include/linux/foo.h, jextract would 
>> emit a classfile called usr_include_foo.class. The new patch changes 
>> this behavior and generates proper nested package names, so that the 
>> header class will have now name "foo" and package "usr.include.linux".
>>
>> * TypeDictionary: helper object which, with the help of the 
>> HeaderResolver, allows to go from clang types to Java types
>>
>>
>> Overall, this feels like a big improvement in terms of how easy it is 
>> to tweak the behavior of jextract - each of the main abstraction is 
>> simple enough that it is relatively easy to tweak things to get a 
>> different result, which is a huge step forward compared to where the 
>> code is ATM (at least in my opinion, which could be biased of course 
>> :-)). Following this change, we will probably do two followup changes:
>>
>> I think the new logic is very deterministic, and easy to follow; 
>> there are two items where this is still suboptimal:
>>
>> 1) we allow multiple headerfiles as input, and this sometimes doesn't 
>> play well with the parser; we have discussed many times how clang API 
>> behaves better with single input which includes everything. It would 
>> probably be better to restrict jextract to work on a single header 
>> input file only (which, thanks to the new resolution 
>> logic/self-contained model, now works all the times).
>>
>> 2) related to the above, as we discussed, the -t option doesn't make 
>> a  lot of sense when multiple input headers are involved. I think it 
>> would be best again to restrict input to be just one headerfile (as 
>> per 1), and also transform -t into a proper 'package prefix' option 
>> instead, which adds given prefix to all package names generated by 
>> jextract. This is very useful to package up self-contained artifacts 
>> in way that do no clash when loaded by same classloader.
>>
>> As for the issues in tree parsing that were addressed as part of this 
>> patch, here's a list:
>>
>> * StructTree::nestedTypes is generating a transitive closure of all 
>> nested types; this doesn't play well with recursive visitors, as it 
>> leads trees to be visited multiple times; I changed the code to 
>> expand only the innermost level of nested types (this affects the 
>> EmptyNameHandler visitor)
>>
>> * Tree::equals/hashcode - this code is doing the comparison using 
>> Cursor::equals, which is good, but doesn't work at all across 
>> compilation units (e.g. when multiple headers are specified). For 
>> this reason I changed the comparison code to rely on source location 
>> info.
>>
>> * TypedefHandler has some logic to exclude multiple declaration w/o 
>> definition, where the definition occurs elsewhere in the tree. But 
>> this logic doesn't visit structs recursively and, as a result, it 
>> fails to prune duplicates nested within structs.
>>
>> P.S.
>> A big thank to Sundar for all the help troubleshooting failing tests 
>> on Mac, and for updating some of the samples instruction in our 
>> markdown file :-)
>>
>> Cheers
>> Maurizio
>>
>> [1] - 
>> http://mail.openjdk.java.net/pipermail/panama-dev/2018-August/002564.html
>>


More information about the panama-dev mailing list