[foreign] RFR 8220544: Jextract; library dependent header filtering

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Mar 14 16:00:38 UTC 2019


I think you can reproduce simply by having an header like this:

#include <string.h>

void foo(int i);


if you extract that you would expect one symbol, but in reality you get 
also all the stuff from string.h.

Maurizio

On 14/03/2019 15:49, Jorn Vernee wrote:
> The Windows jextract artifact for the OpenGL sample is a different 
> beast, because it relies on the standard Windows OpenGL libraries. So, 
> seeing some system headers is to be expected :) In the case of python 
> I'm also seeing some system headers, because python depends on a hand 
> full of corecrt types, so that's also correct. But, neither one seems 
> to be referencing imaxabs, so I'm not sure what's going on...
>
> I'll try to see if I can find a repro example on Windows to see about 
> a fix. I'm not sure I see how the  SymbolFilter could get to the VM 
> libraries? Though I remember seeing a similar case in the past...
>
> Jorn
>
> Maurizio Cimadamore schreef op 2019-03-14 15:55:
>> I think I got at the bottom of this. I tried debugging an extraction
>> of opengl, and it eventually ends up adding inttypes.h to the rootset
>> because of this symbol:
>>
>> imaxabs
>>
>> The library filter doesn't work here, because this is a function
>> defined in the set of default libraries that comes with the VM, so
>> jextract 'thinks' this symbol is genuinely part of one of the
>> libraries specified in the command line - except is not.
>>
>> I tried an hacky fix which checks for absence of symbol in
>> Libraries.getDefaultLibrary - and that seems to give the expected
>> results - but I leave this to your consideration.
>>
>> Maurizio
>>
>> On 14/03/2019 14:27, Maurizio Cimadamore wrote:
>>> Hi Jorn,
>>> I gave this a try with our OpenGL and Python examples. The 
>>> extraction process works fine, and the generated jars are a bit 
>>> smaller - but I still see a lot of system-related headers. Python 
>>> has still lots of them, OpenGL less so, but even in the GL case, I 
>>> think it's interesting, because I can't see references to types 
>>> mentioned in these headers (nor in the resulting classfiles). Do you 
>>> know, on top of your head, as to why this could be happening?
>>>
>>> Maurizio
>>>
>>>
>>> On 14/03/2019 12:45, Jorn Vernee wrote:
>>>> Updated webrev: 
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.04/
>>>>
>>>> Changed Treemaker implementation to recurse to the 'root' type when 
>>>> creating TypdefTree::def, and using that in DependencyFinder.
>>>>
>>>> Jorn
>>>>
>>>> Jorn Vernee schreef op 2019-03-14 12:49:
>>>>> Oh ok, sorry. I'll try that out.
>>>>>
>>>>> Jorn
>>>>>
>>>>> Maurizio Cimadamore schreef op 2019-03-14 12:43:
>>>>>> On 14/03/2019 11:40, Jorn Vernee wrote:
>>>>>>> I see what you mean.
>>>>>>>
>>>>>>> Well, `struct BAR` has a definition cursor, but `struct BAR*` 
>>>>>>> does not have a definition/declaration, it's a type that's 
>>>>>>> automatically derived from `struct BAR`. So for `struct BAR*` 
>>>>>>> there is no good value for TypedefTree::def.
>>>>>>
>>>>>> I know that there's no good value for struct BAR* - I was 
>>>>>> wondering if
>>>>>> the parser should detect the pointer, and recurse down, then store
>>>>>> 'struct BAR' into 'def'.
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>>>
>>>>>>> TypedefTree::def seems to be a helper method mostly targeted at 
>>>>>>> inline type definitions (looking at TypedefHandler [1]).
>>>>>>>
>>>>>>> I think for my use case using TypedefTree.type().canonicalType() 
>>>>>>> is good enough, but I guess we could query that once in the 
>>>>>>> parser and then store it in a field in TypedefTree?
>>>>>>>
>>>>>>> Jorn
>>>>>>>
>>>>>>> [1] : 
>>>>>>> http://hg.openjdk.java.net/panama/dev/file/43ed53f40957/src/jdk.jextract/share/classes/com/sun/tools/jextract/TypedefHandler.java#l107
>>>>>>>
>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 11:57:
>>>>>>>> On 14/03/2019 00:23, Jorn Vernee wrote:
>>>>>>>>> Sorry, bad example. The actual failure case is:
>>>>>>>>>
>>>>>>>>>      struct BAR { int x; }
>>>>>>>>>
>>>>>>>>>      typedef struct BAR* FOO;
>>>>>>>>>
>>>>>>>>> The TreeMaker code looks at the canonical type of the typedef 
>>>>>>>>> `struct BAR*` which is neither a type definition or 
>>>>>>>>> declaration, so TypedefTree::def is empty. But, in the case of 
>>>>>>>>> dependecy analysis, we still want to inspect that type, get 
>>>>>>>>> the pointee type and add it to the root set.
>>>>>>>>
>>>>>>>> Right - and I think what I'm claiming is that this is probably 
>>>>>>>> some
>>>>>>>> kind of a parser bug? That is, if Typedef::def (we should probably
>>>>>>>> find another name, 'def' is not good enough IMHO) is set 
>>>>>>>> correctly for
>>>>>>>>
>>>>>>>> typedef struct BAR FOO;
>>>>>>>>
>>>>>>>> it should also be set for
>>>>>>>>
>>>>>>>> typedef struct BAR* FOO;
>>>>>>>>
>>>>>>>> That is, I claim that from the typedef tree you should always 
>>>>>>>> see a
>>>>>>>> referenced type declaration w/o using the clang API.
>>>>>>>>
>>>>>>>> Maurizio
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jorn
>>>>>>>>>
>>>>>>>>> Jorn Vernee schreef op 2019-03-14 01:14:
>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 01:04:
>>>>>>>>>>> So, if I understand correctly, the issue is that if you have:
>>>>>>>>>>>
>>>>>>>>>>> typedef struct { int x; } FOO;
>>>>>>>>>>>
>>>>>>>>>>> you want the struct to be added to the root set. Which make 
>>>>>>>>>>> sense.
>>>>>>>>>>>
>>>>>>>>>>> Now, with respect to getUnderlyingType - I'm a bit at odds; 
>>>>>>>>>>> it seems
>>>>>>>>>>> to me that the jextract tree should have the right info, under
>>>>>>>>>>> TypedefTree::def, at least that is set by the parser for
>>>>>>>>>>> declarations/definitions.
>>>>>>>>>>
>>>>>>>>>> That exact case is covered by TypedefTree::def, but it 
>>>>>>>>>> doesn't cover a
>>>>>>>>>> case like this:
>>>>>>>>>>
>>>>>>>>>>     struct BAR { int x; }
>>>>>>>>>>
>>>>>>>>>>     typedef struct BAR FOO;
>>>>>>>>>>
>>>>>>>>>> i.e. TypedefTree::def is only set when the typedef contains 
>>>>>>>>>> an inline
>>>>>>>>>> definition or a prototype. But, if the type is just referenced,
>>>>>>>>>> TypedefTree::def is an empty Optional.
>>>>>>>>>>
>>>>>>>>>>> Wouldn't visiting TypedefTree::def (if present) be enough 
>>>>>>>>>>> (and more in
>>>>>>>>>>> sync with the rest of jextract) ? After all, that's the kind 
>>>>>>>>>>> of thing
>>>>>>>>>>> you wanna be sure not to miss by the dependency analysis -
>>>>>>>>>>> getUnderlyingType seems a tad too noisy?
>>>>>>>>>>
>>>>>>>>>> This is what I had before, but it fell short in some cases.
>>>>>>>>>>
>>>>>>>>>>> In other words, let's say that there's a typedef in an 
>>>>>>>>>>> header of a
>>>>>>>>>>> library you want to jextract which is related, via a chain 
>>>>>>>>>>> of other 42
>>>>>>>>>>> typedefs to something in some obscure system library. Are 
>>>>>>>>>>> you sure you
>>>>>>>>>>> want the resulting artifact to have all the 42 intermediate 
>>>>>>>>>>> typedefs?
>>>>>>>>>>> After all, the library you wanna extract has just defined 'its
>>>>>>>>>>> version' of the name for that particular construct... so 
>>>>>>>>>>> it's very
>>>>>>>>>>> likely that users of that library will prefer that name to 
>>>>>>>>>>> the other
>>>>>>>>>>> 42 ones?
>>>>>>>>>>
>>>>>>>>>> 42 intermediate typedefs seems a bit extreme...
>>>>>>>>>>
>>>>>>>>>> But, I think you have a point. Putting the typedef and the 
>>>>>>>>>> canonical
>>>>>>>>>> type into the root set is probably enough.
>>>>>>>>>>
>>>>>>>>>> Jorn
>>>>>>>>>>
>>>>>>>>>>> Maurizio
>>>>>>>>>>>
>>>>>>>>>>> On 13/03/2019 23:31, Jorn Vernee wrote:
>>>>>>>>>>>> Update webrev: 
>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.02/ 
>>>>>>>>>>>> This fixes an issue where typedef underlying types were not 
>>>>>>>>>>>> being added to the root set.
>>>>>>>>>>>>
>>>>>>>>>>>> It turns out I was using the wrong API to get the 
>>>>>>>>>>>> underlying type, and the needed API didn't have a binding 
>>>>>>>>>>>> yet :)
>>>>>>>>>>>>
>>>>>>>>>>>> The changes with respect to the previous webrev;
>>>>>>>>>>>>   * added Cursor.getTypedefDeclUnderlyingType() binding
>>>>>>>>>>>>   * added TypdefTree.underlyingType() method
>>>>>>>>>>>>   * changed DependencyFinder to use this method instead of 
>>>>>>>>>>>> typeDefinition()
>>>>>>>>>>>>   * added test case for the failure case (also simplified 
>>>>>>>>>>>> test class to only run jextract once and reuse the result)
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Jorn
>>>>>>>>>>>>
>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:43:
>>>>>>>>>>>>> On 13/03/2019 21:36, Jorn Vernee wrote:
>>>>>>>>>>>>>>> Good questions. When no filter patterns are specified 
>>>>>>>>>>>>>>> that is the case
>>>>>>>>>>>>>>> everything is outputted, but when filter patterns are 
>>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>> headers are only added to the root set if they contain a 
>>>>>>>>>>>>>>> symbol that
>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> FWIW, basically headers with symbols that pass both types 
>>>>>>>>>>>>>> of filter are added to the root set. There's 1.) 
>>>>>>>>>>>>>> Filtereing based on whether a symbol is found in a 
>>>>>>>>>>>>>> library, and 2.) based on the used filter patterns 
>>>>>>>>>>>>>> (--include-symbols & --exclude-symbols). I thought it 
>>>>>>>>>>>>>> would be better to exclude a header from the root set if 
>>>>>>>>>>>>>> all the symbols in it were filtered out by the filter 
>>>>>>>>>>>>>> patterns.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yep - I think that sounds sensible
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 22:30:
>>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:08:
>>>>>>>>>>>>>>>> Hi Jorn,
>>>>>>>>>>>>>>>> this looks very good indeed. I'd like to play with it a 
>>>>>>>>>>>>>>>> bit before we
>>>>>>>>>>>>>>>> go ahead with this, if that's ok.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please do!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, one question: if the user does not specify "-l" 
>>>>>>>>>>>>>>>> what happens? I
>>>>>>>>>>>>>>>> guess the root set will then contain all the headers 
>>>>>>>>>>>>>>>> and we just get
>>>>>>>>>>>>>>>> back the full output (which would be ok) ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Good questions. When no filter patterns are specified 
>>>>>>>>>>>>>>> that is the case
>>>>>>>>>>>>>>> everything is outputted, but when filter patterns are 
>>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>> headers are only added to the root set if they contain a 
>>>>>>>>>>>>>>> symbol that
>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe we should just fall back to 'include-everything' 
>>>>>>>>>>>>>>> when no "-l"
>>>>>>>>>>>>>>> options are specified, regardless of filter?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 13/03/2019 18:06, Jorn Vernee wrote:
>>>>>>>>>>>>>>>>> Updated webrev: 
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.01/ 
>>>>>>>>>>>>>>>>> Fixed the bug, also added tests & ran all the examples 
>>>>>>>>>>>>>>>>> again to try and find other bugs (nothing found).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:50:
>>>>>>>>>>>>>>>>>> Discovered a bug in this after posting :(
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> DependencyFinder also needs to look for pointee and 
>>>>>>>>>>>>>>>>>> array element type
>>>>>>>>>>>>>>>>>> cursors (which it currently doesn't). I've tested a 
>>>>>>>>>>>>>>>>>> working fix, but
>>>>>>>>>>>>>>>>>> will also add some tests for the particular cases. 
>>>>>>>>>>>>>>>>>> Updated webrev
>>>>>>>>>>>>>>>>>> coming soon™
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:21:
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> After some discussion:
>>>>>>>>>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-March/004828.html 
>>>>>>>>>>>>>>>>>>> I've implemented the discussed approach to header & 
>>>>>>>>>>>>>>>>>>> declaration filtering.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8220544
>>>>>>>>>>>>>>>>>>> Webrev: 
>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.00/ 
>>>>>>>>>>>>>>>>>>> There are a few new implementation classes:
>>>>>>>>>>>>>>>>>>>   * PattertnFilter; a helper class that encapsulates 
>>>>>>>>>>>>>>>>>>> the Pattern based
>>>>>>>>>>>>>>>>>>> filtering code previously found in SymbolFilter. 
>>>>>>>>>>>>>>>>>>> This code had to be
>>>>>>>>>>>>>>>>>>> reused.
>>>>>>>>>>>>>>>>>>>   * Filters; a Context sub-component holding 
>>>>>>>>>>>>>>>>>>> PatternFilters for
>>>>>>>>>>>>>>>>>>> different types of declarations.
>>>>>>>>>>>>>>>>>>>   * DependencyFinder; finds dependencies of the root 
>>>>>>>>>>>>>>>>>>> set of
>>>>>>>>>>>>>>>>>>> declarations found in the root headers.
>>>>>>>>>>>>>>>>>>>   * ElementFilter; filters out non-library symbol 
>>>>>>>>>>>>>>>>>>> elements based on
>>>>>>>>>>>>>>>>>>> whether they appear in a root header or are required 
>>>>>>>>>>>>>>>>>>> by something
>>>>>>>>>>>>>>>>>>> therein.
>>>>>>>>>>>>>>>>>>>   * RootSet; a helper class for managing root 
>>>>>>>>>>>>>>>>>>> headers + required element Trees.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I also encountered some crashes when debug printing 
>>>>>>>>>>>>>>>>>>> Trees, which I've
>>>>>>>>>>>>>>>>>>> fixed/worked around accordingly.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> FWIW, I tried this out with the Windows registry API 
>>>>>>>>>>>>>>>>>>> and it's working
>>>>>>>>>>>>>>>>>>> very nicely! Going from well over 100 headers to 
>>>>>>>>>>>>>>>>>>> under 10. Some room
>>>>>>>>>>>>>>>>>>> for improvement still exists; some empty headers and 
>>>>>>>>>>>>>>>>>>> static forwarders
>>>>>>>>>>>>>>>>>>> could still be omitted, but imho that should be 
>>>>>>>>>>>>>>>>>>> handled by a separate
>>>>>>>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Jorn


More information about the panama-dev mailing list