[foreign] RFR 8220544: Jextract; library dependent header filtering

Jorn Vernee jbvernee at xs4all.nl
Thu Mar 14 16:08:59 UTC 2019


This produces test.class and test_h.class as expected, but nothing 
else...

Used command is: `jextract -L . -l test test.h`

You would get stuff from string.h when the -l is omitted.

Jorn

Maurizio Cimadamore schreef op 2019-03-14 17:00:
> I think you can reproduce simply by having an header like this:
> 
> #include <string.h>
> 
> void foo(int i);
> 
> 
> if you extract that you would expect one symbol, but in reality you
> get also all the stuff from string.h.
> 
> Maurizio
> 
> On 14/03/2019 15:49, Jorn Vernee wrote:
>> The Windows jextract artifact for the OpenGL sample is a different 
>> beast, because it relies on the standard Windows OpenGL libraries. So, 
>> seeing some system headers is to be expected :) In the case of python 
>> I'm also seeing some system headers, because python depends on a hand 
>> full of corecrt types, so that's also correct. But, neither one seems 
>> to be referencing imaxabs, so I'm not sure what's going on...
>> 
>> I'll try to see if I can find a repro example on Windows to see about 
>> a fix. I'm not sure I see how the  SymbolFilter could get to the VM 
>> libraries? Though I remember seeing a similar case in the past...
>> 
>> Jorn
>> 
>> Maurizio Cimadamore schreef op 2019-03-14 15:55:
>>> I think I got at the bottom of this. I tried debugging an extraction
>>> of opengl, and it eventually ends up adding inttypes.h to the rootset
>>> because of this symbol:
>>> 
>>> imaxabs
>>> 
>>> The library filter doesn't work here, because this is a function
>>> defined in the set of default libraries that comes with the VM, so
>>> jextract 'thinks' this symbol is genuinely part of one of the
>>> libraries specified in the command line - except is not.
>>> 
>>> I tried an hacky fix which checks for absence of symbol in
>>> Libraries.getDefaultLibrary - and that seems to give the expected
>>> results - but I leave this to your consideration.
>>> 
>>> Maurizio
>>> 
>>> On 14/03/2019 14:27, Maurizio Cimadamore wrote:
>>>> Hi Jorn,
>>>> I gave this a try with our OpenGL and Python examples. The 
>>>> extraction process works fine, and the generated jars are a bit 
>>>> smaller - but I still see a lot of system-related headers. Python 
>>>> has still lots of them, OpenGL less so, but even in the GL case, I 
>>>> think it's interesting, because I can't see references to types 
>>>> mentioned in these headers (nor in the resulting classfiles). Do you 
>>>> know, on top of your head, as to why this could be happening?
>>>> 
>>>> Maurizio
>>>> 
>>>> 
>>>> On 14/03/2019 12:45, Jorn Vernee wrote:
>>>>> Updated webrev: 
>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.04/
>>>>> 
>>>>> Changed Treemaker implementation to recurse to the 'root' type when 
>>>>> creating TypdefTree::def, and using that in DependencyFinder.
>>>>> 
>>>>> Jorn
>>>>> 
>>>>> Jorn Vernee schreef op 2019-03-14 12:49:
>>>>>> Oh ok, sorry. I'll try that out.
>>>>>> 
>>>>>> Jorn
>>>>>> 
>>>>>> Maurizio Cimadamore schreef op 2019-03-14 12:43:
>>>>>>> On 14/03/2019 11:40, Jorn Vernee wrote:
>>>>>>>> I see what you mean.
>>>>>>>> 
>>>>>>>> Well, `struct BAR` has a definition cursor, but `struct BAR*` 
>>>>>>>> does not have a definition/declaration, it's a type that's 
>>>>>>>> automatically derived from `struct BAR`. So for `struct BAR*` 
>>>>>>>> there is no good value for TypedefTree::def.
>>>>>>> 
>>>>>>> I know that there's no good value for struct BAR* - I was 
>>>>>>> wondering if
>>>>>>> the parser should detect the pointer, and recurse down, then 
>>>>>>> store
>>>>>>> 'struct BAR' into 'def'.
>>>>>>> 
>>>>>>> Maurizio
>>>>>>> 
>>>>>>>> 
>>>>>>>> TypedefTree::def seems to be a helper method mostly targeted at 
>>>>>>>> inline type definitions (looking at TypedefHandler [1]).
>>>>>>>> 
>>>>>>>> I think for my use case using TypedefTree.type().canonicalType() 
>>>>>>>> is good enough, but I guess we could query that once in the 
>>>>>>>> parser and then store it in a field in TypedefTree?
>>>>>>>> 
>>>>>>>> Jorn
>>>>>>>> 
>>>>>>>> [1] : 
>>>>>>>> http://hg.openjdk.java.net/panama/dev/file/43ed53f40957/src/jdk.jextract/share/classes/com/sun/tools/jextract/TypedefHandler.java#l107
>>>>>>>> 
>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 11:57:
>>>>>>>>> On 14/03/2019 00:23, Jorn Vernee wrote:
>>>>>>>>>> Sorry, bad example. The actual failure case is:
>>>>>>>>>> 
>>>>>>>>>>      struct BAR { int x; }
>>>>>>>>>> 
>>>>>>>>>>      typedef struct BAR* FOO;
>>>>>>>>>> 
>>>>>>>>>> The TreeMaker code looks at the canonical type of the typedef 
>>>>>>>>>> `struct BAR*` which is neither a type definition or 
>>>>>>>>>> declaration, so TypedefTree::def is empty. But, in the case of 
>>>>>>>>>> dependecy analysis, we still want to inspect that type, get 
>>>>>>>>>> the pointee type and add it to the root set.
>>>>>>>>> 
>>>>>>>>> Right - and I think what I'm claiming is that this is probably 
>>>>>>>>> some
>>>>>>>>> kind of a parser bug? That is, if Typedef::def (we should 
>>>>>>>>> probably
>>>>>>>>> find another name, 'def' is not good enough IMHO) is set 
>>>>>>>>> correctly for
>>>>>>>>> 
>>>>>>>>> typedef struct BAR FOO;
>>>>>>>>> 
>>>>>>>>> it should also be set for
>>>>>>>>> 
>>>>>>>>> typedef struct BAR* FOO;
>>>>>>>>> 
>>>>>>>>> That is, I claim that from the typedef tree you should always 
>>>>>>>>> see a
>>>>>>>>> referenced type declaration w/o using the clang API.
>>>>>>>>> 
>>>>>>>>> Maurizio
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Jorn
>>>>>>>>>> 
>>>>>>>>>> Jorn Vernee schreef op 2019-03-14 01:14:
>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 01:04:
>>>>>>>>>>>> So, if I understand correctly, the issue is that if you 
>>>>>>>>>>>> have:
>>>>>>>>>>>> 
>>>>>>>>>>>> typedef struct { int x; } FOO;
>>>>>>>>>>>> 
>>>>>>>>>>>> you want the struct to be added to the root set. Which make 
>>>>>>>>>>>> sense.
>>>>>>>>>>>> 
>>>>>>>>>>>> Now, with respect to getUnderlyingType - I'm a bit at odds; 
>>>>>>>>>>>> it seems
>>>>>>>>>>>> to me that the jextract tree should have the right info, 
>>>>>>>>>>>> under
>>>>>>>>>>>> TypedefTree::def, at least that is set by the parser for
>>>>>>>>>>>> declarations/definitions.
>>>>>>>>>>> 
>>>>>>>>>>> That exact case is covered by TypedefTree::def, but it 
>>>>>>>>>>> doesn't cover a
>>>>>>>>>>> case like this:
>>>>>>>>>>> 
>>>>>>>>>>>     struct BAR { int x; }
>>>>>>>>>>> 
>>>>>>>>>>>     typedef struct BAR FOO;
>>>>>>>>>>> 
>>>>>>>>>>> i.e. TypedefTree::def is only set when the typedef contains 
>>>>>>>>>>> an inline
>>>>>>>>>>> definition or a prototype. But, if the type is just 
>>>>>>>>>>> referenced,
>>>>>>>>>>> TypedefTree::def is an empty Optional.
>>>>>>>>>>> 
>>>>>>>>>>>> Wouldn't visiting TypedefTree::def (if present) be enough 
>>>>>>>>>>>> (and more in
>>>>>>>>>>>> sync with the rest of jextract) ? After all, that's the kind 
>>>>>>>>>>>> of thing
>>>>>>>>>>>> you wanna be sure not to miss by the dependency analysis -
>>>>>>>>>>>> getUnderlyingType seems a tad too noisy?
>>>>>>>>>>> 
>>>>>>>>>>> This is what I had before, but it fell short in some cases.
>>>>>>>>>>> 
>>>>>>>>>>>> In other words, let's say that there's a typedef in an 
>>>>>>>>>>>> header of a
>>>>>>>>>>>> library you want to jextract which is related, via a chain 
>>>>>>>>>>>> of other 42
>>>>>>>>>>>> typedefs to something in some obscure system library. Are 
>>>>>>>>>>>> you sure you
>>>>>>>>>>>> want the resulting artifact to have all the 42 intermediate 
>>>>>>>>>>>> typedefs?
>>>>>>>>>>>> After all, the library you wanna extract has just defined 
>>>>>>>>>>>> 'its
>>>>>>>>>>>> version' of the name for that particular construct... so 
>>>>>>>>>>>> it's very
>>>>>>>>>>>> likely that users of that library will prefer that name to 
>>>>>>>>>>>> the other
>>>>>>>>>>>> 42 ones?
>>>>>>>>>>> 
>>>>>>>>>>> 42 intermediate typedefs seems a bit extreme...
>>>>>>>>>>> 
>>>>>>>>>>> But, I think you have a point. Putting the typedef and the 
>>>>>>>>>>> canonical
>>>>>>>>>>> type into the root set is probably enough.
>>>>>>>>>>> 
>>>>>>>>>>> Jorn
>>>>>>>>>>> 
>>>>>>>>>>>> Maurizio
>>>>>>>>>>>> 
>>>>>>>>>>>> On 13/03/2019 23:31, Jorn Vernee wrote:
>>>>>>>>>>>>> Update webrev: 
>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.02/ 
>>>>>>>>>>>>> This fixes an issue where typedef underlying types were not 
>>>>>>>>>>>>> being added to the root set.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It turns out I was using the wrong API to get the 
>>>>>>>>>>>>> underlying type, and the needed API didn't have a binding 
>>>>>>>>>>>>> yet :)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The changes with respect to the previous webrev;
>>>>>>>>>>>>>   * added Cursor.getTypedefDeclUnderlyingType() binding
>>>>>>>>>>>>>   * added TypdefTree.underlyingType() method
>>>>>>>>>>>>>   * changed DependencyFinder to use this method instead of 
>>>>>>>>>>>>> typeDefinition()
>>>>>>>>>>>>>   * added test case for the failure case (also simplified 
>>>>>>>>>>>>> test class to only run jextract once and reuse the result)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:43:
>>>>>>>>>>>>>> On 13/03/2019 21:36, Jorn Vernee wrote:
>>>>>>>>>>>>>>>> Good questions. When no filter patterns are specified 
>>>>>>>>>>>>>>>> that is the case
>>>>>>>>>>>>>>>> everything is outputted, but when filter patterns are 
>>>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>>> headers are only added to the root set if they contain a 
>>>>>>>>>>>>>>>> symbol that
>>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FWIW, basically headers with symbols that pass both types 
>>>>>>>>>>>>>>> of filter are added to the root set. There's 1.) 
>>>>>>>>>>>>>>> Filtereing based on whether a symbol is found in a 
>>>>>>>>>>>>>>> library, and 2.) based on the used filter patterns 
>>>>>>>>>>>>>>> (--include-symbols & --exclude-symbols). I thought it 
>>>>>>>>>>>>>>> would be better to exclude a header from the root set if 
>>>>>>>>>>>>>>> all the symbols in it were filtered out by the filter 
>>>>>>>>>>>>>>> patterns.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yep - I think that sounds sensible
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 22:30:
>>>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:08:
>>>>>>>>>>>>>>>>> Hi Jorn,
>>>>>>>>>>>>>>>>> this looks very good indeed. I'd like to play with it a 
>>>>>>>>>>>>>>>>> bit before we
>>>>>>>>>>>>>>>>> go ahead with this, if that's ok.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Please do!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Also, one question: if the user does not specify "-l" 
>>>>>>>>>>>>>>>>> what happens? I
>>>>>>>>>>>>>>>>> guess the root set will then contain all the headers 
>>>>>>>>>>>>>>>>> and we just get
>>>>>>>>>>>>>>>>> back the full output (which would be ok) ?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Good questions. When no filter patterns are specified 
>>>>>>>>>>>>>>>> that is the case
>>>>>>>>>>>>>>>> everything is outputted, but when filter patterns are 
>>>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>>> headers are only added to the root set if they contain a 
>>>>>>>>>>>>>>>> symbol that
>>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Maybe we should just fall back to 'include-everything' 
>>>>>>>>>>>>>>>> when no "-l"
>>>>>>>>>>>>>>>> options are specified, regardless of filter?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 13/03/2019 18:06, Jorn Vernee wrote:
>>>>>>>>>>>>>>>>>> Updated webrev: 
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.01/ 
>>>>>>>>>>>>>>>>>> Fixed the bug, also added tests & ran all the examples 
>>>>>>>>>>>>>>>>>> again to try and find other bugs (nothing found).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:50:
>>>>>>>>>>>>>>>>>>> Discovered a bug in this after posting :(
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> DependencyFinder also needs to look for pointee and 
>>>>>>>>>>>>>>>>>>> array element type
>>>>>>>>>>>>>>>>>>> cursors (which it currently doesn't). I've tested a 
>>>>>>>>>>>>>>>>>>> working fix, but
>>>>>>>>>>>>>>>>>>> will also add some tests for the particular cases. 
>>>>>>>>>>>>>>>>>>> Updated webrev
>>>>>>>>>>>>>>>>>>> coming soon™
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:21:
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> After some discussion:
>>>>>>>>>>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-March/004828.html 
>>>>>>>>>>>>>>>>>>>> I've implemented the discussed approach to header & 
>>>>>>>>>>>>>>>>>>>> declaration filtering.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Bug: 
>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8220544
>>>>>>>>>>>>>>>>>>>> Webrev: 
>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.00/ 
>>>>>>>>>>>>>>>>>>>> There are a few new implementation classes:
>>>>>>>>>>>>>>>>>>>>   * PattertnFilter; a helper class that encapsulates 
>>>>>>>>>>>>>>>>>>>> the Pattern based
>>>>>>>>>>>>>>>>>>>> filtering code previously found in SymbolFilter. 
>>>>>>>>>>>>>>>>>>>> This code had to be
>>>>>>>>>>>>>>>>>>>> reused.
>>>>>>>>>>>>>>>>>>>>   * Filters; a Context sub-component holding 
>>>>>>>>>>>>>>>>>>>> PatternFilters for
>>>>>>>>>>>>>>>>>>>> different types of declarations.
>>>>>>>>>>>>>>>>>>>>   * DependencyFinder; finds dependencies of the root 
>>>>>>>>>>>>>>>>>>>> set of
>>>>>>>>>>>>>>>>>>>> declarations found in the root headers.
>>>>>>>>>>>>>>>>>>>>   * ElementFilter; filters out non-library symbol 
>>>>>>>>>>>>>>>>>>>> elements based on
>>>>>>>>>>>>>>>>>>>> whether they appear in a root header or are required 
>>>>>>>>>>>>>>>>>>>> by something
>>>>>>>>>>>>>>>>>>>> therein.
>>>>>>>>>>>>>>>>>>>>   * RootSet; a helper class for managing root 
>>>>>>>>>>>>>>>>>>>> headers + required element Trees.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I also encountered some crashes when debug printing 
>>>>>>>>>>>>>>>>>>>> Trees, which I've
>>>>>>>>>>>>>>>>>>>> fixed/worked around accordingly.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> FWIW, I tried this out with the Windows registry API 
>>>>>>>>>>>>>>>>>>>> and it's working
>>>>>>>>>>>>>>>>>>>> very nicely! Going from well over 100 headers to 
>>>>>>>>>>>>>>>>>>>> under 10. Some room
>>>>>>>>>>>>>>>>>>>> for improvement still exists; some empty headers and 
>>>>>>>>>>>>>>>>>>>> static forwarders
>>>>>>>>>>>>>>>>>>>> could still be omitted, but imho that should be 
>>>>>>>>>>>>>>>>>>>> handled by a separate
>>>>>>>>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Jorn


More information about the panama-dev mailing list