[foreign] RFR 8215502: Jextract output should be self-contained

Tue Dec 18 17:35:48 UTC 2018

Never mind, in that case actual file paths seem to be required, so the 
usage of File.separatorChar is correct.

I'll try and find out what's happening that's making the test fail.

Jorn

Jorn Vernee schreef op 2018-12-18 18:24:
> No, but running from a clean build yields the same result.
> 
> If this is only on my end I think Maurizio is right that this is a
> problem with platform/cygwin.
> 
> I ran into a problem with the Runner test as well, since it was using
> File.separatorChar to go from binary to internal class names and back.
> It looks like the new Writer class is doing this as well, but this
> gives incorrect behaviour on Windows, since the separator char there
> is '\', but afaik binary class names should always use '/'.
> 
> I'll try and see if fixing this will get the test to pass.
> 
> Jorn
> 
> Sundararajan Athijegannathan schreef op 2018-12-18 17:48:
>> Was that from a clean build?
>> 
>> -Sundar
>> 
>> On 18/12/18, 10:00 PM, Jorn Vernee wrote:
>>> Looks like com/sun/tools/jextract/missing/MissingSymbolTest.java is 
>>> failing after this patch?
>>> 
>>> It looks like it no longer generates the class that the test is 
>>> testing for: 
>>> http://cr.openjdk.java.net/~jvernee/panama/test-results/MissingSymbolTest.jtr
>>> 
>>> Jorn
>>> 
>>> Maurizio Cimadamore schreef op 2018-12-17 19:44:
>>>> Hi,
>>>> following external and internal feedback on jextract usability, we
>>>> have been working hard on a partial jextract rewrite which
>>>> significantly simplifies how jextract works and how the various 
>>>> pieces
>>>> are connected together. The idea is that each extraction run should
>>>> generate a self-contained artifact; this is _not_ what is happening
>>>> now, as jextract will only extract things that belong to the header
>>>> being extracted. This means that if the header depends on symbols
>>>> defined in some other included header, in a different folder, then 
>>>> the
>>>> resulting jar artifact will contain dandling refs and will fail to
>>>> bind. To overcome this issues user have in the past been forced to
>>>> 'chase headers' around, and add them to the jextract command line,
>>>> which is a very frustrating (and system-dependent) process.
>>>> 
>>>> Also, few months ago we embarked on a project to parse info obtained
>>>> from clang API into an internal representation (see [1]). But we've
>>>> only gone half-way with this - and the code is in a place where
>>>> sometimes the tree is used, sometimes the clang types are used, 
>>>> which
>>>> can be very unpredictable and hard to follow. This patch fixes all
>>>> that and makes trees the single source of truth; this means that 
>>>> some
>>>> issues in tree parsing had to be rectified, to avoid some common
>>>> issues (see end of this email for a list).
>>>> 
>>>> Here's the webrev:
>>>> 
>>>> https://cr.openjdk.java.net/~mcimadamore/panama/8215502/
>>>> 
>>>> (Not sure if it's better to look at the diffs or to just look at the 
>>>> new code)
>>>> 
>>>> As it can be seen, classes are much smaller now, each doing a fairly
>>>> simple, focussed task:
>>>> 
>>>> * Main: this is where we process command line args; we check them 
>>>> and
>>>> we set up a Context
>>>> 
>>>> * Context: this is where we hold the shared state - mostly holder 
>>>> for
>>>> command line args (after validation)
>>>> 
>>>> * JextractTool: this is the main jextract tool frontend - it has a
>>>> method called 'processHeaders' which does all the job. This work is
>>>> divided in the following steps:
>>>> 
>>>>    - parse all headers into trees
>>>>    - run a bunch of visitors onto the trees (symbol filter, typedef
>>>> handler, etc.)
>>>>    - extract all decls from header trees
>>>>    - get rid of duplicate decls
>>>>    - group decls by header file
>>>>    - for each headerfile
>>>>           + enter types into dictionary (TypeEnter)
>>>>           + generate bytecode using appropriate code factory
>>>>    - return a Writer object
>>>> 
>>>> * Writer: helper object used to write classes/jars
>>>> 
>>>> * HeaderResolver: helper object which keeps track of path/package
>>>> mappings, and which, given a path, returns the appropriate header 
>>>> file
>>>> object. Note: before this patch, all paths were flattened - e.g. if 
>>>> a
>>>> header was found under /usr/include/linux/foo.h, jextract would emit 
>>>> a
>>>> classfile called usr_include_foo.class. The new patch changes this
>>>> behavior and generates proper nested package names, so that the 
>>>> header
>>>> class will have now name "foo" and package "usr.include.linux".
>>>> 
>>>> * TypeDictionary: helper object which, with the help of the
>>>> HeaderResolver, allows to go from clang types to Java types
>>>> 
>>>> 
>>>> Overall, this feels like a big improvement in terms of how easy it 
>>>> is
>>>> to tweak the behavior of jextract - each of the main abstraction is
>>>> simple enough that it is relatively easy to tweak things to get a
>>>> different result, which is a huge step forward compared to where the
>>>> code is ATM (at least in my opinion, which could be biased of course
>>>> :-)). Following this change, we will probably do two followup 
>>>> changes:
>>>> 
>>>> I think the new logic is very deterministic, and easy to follow; 
>>>> there
>>>> are two items where this is still suboptimal:
>>>> 
>>>> 1) we allow multiple headerfiles as input, and this sometimes 
>>>> doesn't
>>>> play well with the parser; we have discussed many times how clang 
>>>> API
>>>> behaves better with single input which includes everything. It would
>>>> probably be better to restrict jextract to work on a single header
>>>> input file only (which, thanks to the new resolution
>>>> logic/self-contained model, now works all the times).
>>>> 
>>>> 2) related to the above, as we discussed, the -t option doesn't make
>>>> a  lot of sense when multiple input headers are involved. I think it
>>>> would be best again to restrict input to be just one headerfile (as
>>>> per 1), and also transform -t into a proper 'package prefix' option
>>>> instead, which adds given prefix to all package names generated by
>>>> jextract. This is very useful to package up self-contained artifacts
>>>> in way that do no clash when loaded by same classloader.
>>>> 
>>>> As for the issues in tree parsing that were addressed as part of 
>>>> this
>>>> patch, here's a list:
>>>> 
>>>> * StructTree::nestedTypes is generating a transitive closure of all
>>>> nested types; this doesn't play well with recursive visitors, as it
>>>> leads trees to be visited multiple times; I changed the code to 
>>>> expand
>>>> only the innermost level of nested types (this affects the
>>>> EmptyNameHandler visitor)
>>>> 
>>>> * Tree::equals/hashcode - this code is doing the comparison using
>>>> Cursor::equals, which is good, but doesn't work at all across
>>>> compilation units (e.g. when multiple headers are specified). For 
>>>> this
>>>> reason I changed the comparison code to rely on source location 
>>>> info.
>>>> 
>>>> * TypedefHandler has some logic to exclude multiple declaration w/o
>>>> definition, where the definition occurs elsewhere in the tree. But
>>>> this logic doesn't visit structs recursively and, as a result, it
>>>> fails to prune duplicates nested within structs.
>>>> 
>>>> P.S.
>>>> A big thank to Sundar for all the help troubleshooting failing tests
>>>> on Mac, and for updating some of the samples instruction in our
>>>> markdown file :-)
>>>> 
>>>> Cheers
>>>> Maurizio
>>>> 
>>>> [1] - 
>>>> http://mail.openjdk.java.net/pipermail/panama-dev/2018-August/002564.html