[foreign] RFR 8220544: Jextract; library dependent header filtering

Jorn Vernee jbvernee at xs4all.nl
Thu Mar 14 16:16:54 UTC 2019


Sorry, that's with the v3 patch 
http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.03/

I reverted back after the concerns raised about changing the 
implementation of TypedefTree::def You're right in that this incorrectly 
includes some trees.

Sorry,
Jorn

Jorn Vernee schreef op 2019-03-14 17:08:
> This produces test.class and test_h.class as expected, but nothing 
> else...
> 
> Used command is: `jextract -L . -l test test.h`
> 
> You would get stuff from string.h when the -l is omitted.
> 
> Jorn
> 
> Maurizio Cimadamore schreef op 2019-03-14 17:00:
>> I think you can reproduce simply by having an header like this:
>> 
>> #include <string.h>
>> 
>> void foo(int i);
>> 
>> 
>> if you extract that you would expect one symbol, but in reality you
>> get also all the stuff from string.h.
>> 
>> Maurizio
>> 
>> On 14/03/2019 15:49, Jorn Vernee wrote:
>>> The Windows jextract artifact for the OpenGL sample is a different 
>>> beast, because it relies on the standard Windows OpenGL libraries. 
>>> So, seeing some system headers is to be expected :) In the case of 
>>> python I'm also seeing some system headers, because python depends on 
>>> a hand full of corecrt types, so that's also correct. But, neither 
>>> one seems to be referencing imaxabs, so I'm not sure what's going 
>>> on...
>>> 
>>> I'll try to see if I can find a repro example on Windows to see about 
>>> a fix. I'm not sure I see how the  SymbolFilter could get to the VM 
>>> libraries? Though I remember seeing a similar case in the past...
>>> 
>>> Jorn
>>> 
>>> Maurizio Cimadamore schreef op 2019-03-14 15:55:
>>>> I think I got at the bottom of this. I tried debugging an extraction
>>>> of opengl, and it eventually ends up adding inttypes.h to the 
>>>> rootset
>>>> because of this symbol:
>>>> 
>>>> imaxabs
>>>> 
>>>> The library filter doesn't work here, because this is a function
>>>> defined in the set of default libraries that comes with the VM, so
>>>> jextract 'thinks' this symbol is genuinely part of one of the
>>>> libraries specified in the command line - except is not.
>>>> 
>>>> I tried an hacky fix which checks for absence of symbol in
>>>> Libraries.getDefaultLibrary - and that seems to give the expected
>>>> results - but I leave this to your consideration.
>>>> 
>>>> Maurizio
>>>> 
>>>> On 14/03/2019 14:27, Maurizio Cimadamore wrote:
>>>>> Hi Jorn,
>>>>> I gave this a try with our OpenGL and Python examples. The 
>>>>> extraction process works fine, and the generated jars are a bit 
>>>>> smaller - but I still see a lot of system-related headers. Python 
>>>>> has still lots of them, OpenGL less so, but even in the GL case, I 
>>>>> think it's interesting, because I can't see references to types 
>>>>> mentioned in these headers (nor in the resulting classfiles). Do 
>>>>> you know, on top of your head, as to why this could be happening?
>>>>> 
>>>>> Maurizio
>>>>> 
>>>>> 
>>>>> On 14/03/2019 12:45, Jorn Vernee wrote:
>>>>>> Updated webrev: 
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.04/
>>>>>> 
>>>>>> Changed Treemaker implementation to recurse to the 'root' type 
>>>>>> when creating TypdefTree::def, and using that in DependencyFinder.
>>>>>> 
>>>>>> Jorn
>>>>>> 
>>>>>> Jorn Vernee schreef op 2019-03-14 12:49:
>>>>>>> Oh ok, sorry. I'll try that out.
>>>>>>> 
>>>>>>> Jorn
>>>>>>> 
>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 12:43:
>>>>>>>> On 14/03/2019 11:40, Jorn Vernee wrote:
>>>>>>>>> I see what you mean.
>>>>>>>>> 
>>>>>>>>> Well, `struct BAR` has a definition cursor, but `struct BAR*` 
>>>>>>>>> does not have a definition/declaration, it's a type that's 
>>>>>>>>> automatically derived from `struct BAR`. So for `struct BAR*` 
>>>>>>>>> there is no good value for TypedefTree::def.
>>>>>>>> 
>>>>>>>> I know that there's no good value for struct BAR* - I was 
>>>>>>>> wondering if
>>>>>>>> the parser should detect the pointer, and recurse down, then 
>>>>>>>> store
>>>>>>>> 'struct BAR' into 'def'.
>>>>>>>> 
>>>>>>>> Maurizio
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> TypedefTree::def seems to be a helper method mostly targeted at 
>>>>>>>>> inline type definitions (looking at TypedefHandler [1]).
>>>>>>>>> 
>>>>>>>>> I think for my use case using 
>>>>>>>>> TypedefTree.type().canonicalType() is good enough, but I guess 
>>>>>>>>> we could query that once in the parser and then store it in a 
>>>>>>>>> field in TypedefTree?
>>>>>>>>> 
>>>>>>>>> Jorn
>>>>>>>>> 
>>>>>>>>> [1] : 
>>>>>>>>> http://hg.openjdk.java.net/panama/dev/file/43ed53f40957/src/jdk.jextract/share/classes/com/sun/tools/jextract/TypedefHandler.java#l107
>>>>>>>>> 
>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 11:57:
>>>>>>>>>> On 14/03/2019 00:23, Jorn Vernee wrote:
>>>>>>>>>>> Sorry, bad example. The actual failure case is:
>>>>>>>>>>> 
>>>>>>>>>>>      struct BAR { int x; }
>>>>>>>>>>> 
>>>>>>>>>>>      typedef struct BAR* FOO;
>>>>>>>>>>> 
>>>>>>>>>>> The TreeMaker code looks at the canonical type of the typedef 
>>>>>>>>>>> `struct BAR*` which is neither a type definition or 
>>>>>>>>>>> declaration, so TypedefTree::def is empty. But, in the case 
>>>>>>>>>>> of dependecy analysis, we still want to inspect that type, 
>>>>>>>>>>> get the pointee type and add it to the root set.
>>>>>>>>>> 
>>>>>>>>>> Right - and I think what I'm claiming is that this is probably 
>>>>>>>>>> some
>>>>>>>>>> kind of a parser bug? That is, if Typedef::def (we should 
>>>>>>>>>> probably
>>>>>>>>>> find another name, 'def' is not good enough IMHO) is set 
>>>>>>>>>> correctly for
>>>>>>>>>> 
>>>>>>>>>> typedef struct BAR FOO;
>>>>>>>>>> 
>>>>>>>>>> it should also be set for
>>>>>>>>>> 
>>>>>>>>>> typedef struct BAR* FOO;
>>>>>>>>>> 
>>>>>>>>>> That is, I claim that from the typedef tree you should always 
>>>>>>>>>> see a
>>>>>>>>>> referenced type declaration w/o using the clang API.
>>>>>>>>>> 
>>>>>>>>>> Maurizio
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Jorn
>>>>>>>>>>> 
>>>>>>>>>>> Jorn Vernee schreef op 2019-03-14 01:14:
>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 01:04:
>>>>>>>>>>>>> So, if I understand correctly, the issue is that if you 
>>>>>>>>>>>>> have:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> typedef struct { int x; } FOO;
>>>>>>>>>>>>> 
>>>>>>>>>>>>> you want the struct to be added to the root set. Which make 
>>>>>>>>>>>>> sense.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Now, with respect to getUnderlyingType - I'm a bit at odds; 
>>>>>>>>>>>>> it seems
>>>>>>>>>>>>> to me that the jextract tree should have the right info, 
>>>>>>>>>>>>> under
>>>>>>>>>>>>> TypedefTree::def, at least that is set by the parser for
>>>>>>>>>>>>> declarations/definitions.
>>>>>>>>>>>> 
>>>>>>>>>>>> That exact case is covered by TypedefTree::def, but it 
>>>>>>>>>>>> doesn't cover a
>>>>>>>>>>>> case like this:
>>>>>>>>>>>> 
>>>>>>>>>>>>     struct BAR { int x; }
>>>>>>>>>>>> 
>>>>>>>>>>>>     typedef struct BAR FOO;
>>>>>>>>>>>> 
>>>>>>>>>>>> i.e. TypedefTree::def is only set when the typedef contains 
>>>>>>>>>>>> an inline
>>>>>>>>>>>> definition or a prototype. But, if the type is just 
>>>>>>>>>>>> referenced,
>>>>>>>>>>>> TypedefTree::def is an empty Optional.
>>>>>>>>>>>> 
>>>>>>>>>>>>> Wouldn't visiting TypedefTree::def (if present) be enough 
>>>>>>>>>>>>> (and more in
>>>>>>>>>>>>> sync with the rest of jextract) ? After all, that's the 
>>>>>>>>>>>>> kind of thing
>>>>>>>>>>>>> you wanna be sure not to miss by the dependency analysis -
>>>>>>>>>>>>> getUnderlyingType seems a tad too noisy?
>>>>>>>>>>>> 
>>>>>>>>>>>> This is what I had before, but it fell short in some cases.
>>>>>>>>>>>> 
>>>>>>>>>>>>> In other words, let's say that there's a typedef in an 
>>>>>>>>>>>>> header of a
>>>>>>>>>>>>> library you want to jextract which is related, via a chain 
>>>>>>>>>>>>> of other 42
>>>>>>>>>>>>> typedefs to something in some obscure system library. Are 
>>>>>>>>>>>>> you sure you
>>>>>>>>>>>>> want the resulting artifact to have all the 42 intermediate 
>>>>>>>>>>>>> typedefs?
>>>>>>>>>>>>> After all, the library you wanna extract has just defined 
>>>>>>>>>>>>> 'its
>>>>>>>>>>>>> version' of the name for that particular construct... so 
>>>>>>>>>>>>> it's very
>>>>>>>>>>>>> likely that users of that library will prefer that name to 
>>>>>>>>>>>>> the other
>>>>>>>>>>>>> 42 ones?
>>>>>>>>>>>> 
>>>>>>>>>>>> 42 intermediate typedefs seems a bit extreme...
>>>>>>>>>>>> 
>>>>>>>>>>>> But, I think you have a point. Putting the typedef and the 
>>>>>>>>>>>> canonical
>>>>>>>>>>>> type into the root set is probably enough.
>>>>>>>>>>>> 
>>>>>>>>>>>> Jorn
>>>>>>>>>>>> 
>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 13/03/2019 23:31, Jorn Vernee wrote:
>>>>>>>>>>>>>> Update webrev: 
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.02/ 
>>>>>>>>>>>>>> This fixes an issue where typedef underlying types were 
>>>>>>>>>>>>>> not being added to the root set.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It turns out I was using the wrong API to get the 
>>>>>>>>>>>>>> underlying type, and the needed API didn't have a binding 
>>>>>>>>>>>>>> yet :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The changes with respect to the previous webrev;
>>>>>>>>>>>>>>   * added Cursor.getTypedefDeclUnderlyingType() binding
>>>>>>>>>>>>>>   * added TypdefTree.underlyingType() method
>>>>>>>>>>>>>>   * changed DependencyFinder to use this method instead of 
>>>>>>>>>>>>>> typeDefinition()
>>>>>>>>>>>>>>   * added test case for the failure case (also simplified 
>>>>>>>>>>>>>> test class to only run jextract once and reuse the result)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:43:
>>>>>>>>>>>>>>> On 13/03/2019 21:36, Jorn Vernee wrote:
>>>>>>>>>>>>>>>>> Good questions. When no filter patterns are specified 
>>>>>>>>>>>>>>>>> that is the case
>>>>>>>>>>>>>>>>> everything is outputted, but when filter patterns are 
>>>>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>>>> headers are only added to the root set if they contain 
>>>>>>>>>>>>>>>>> a symbol that
>>>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> FWIW, basically headers with symbols that pass both 
>>>>>>>>>>>>>>>> types of filter are added to the root set. There's 1.) 
>>>>>>>>>>>>>>>> Filtereing based on whether a symbol is found in a 
>>>>>>>>>>>>>>>> library, and 2.) based on the used filter patterns 
>>>>>>>>>>>>>>>> (--include-symbols & --exclude-symbols). I thought it 
>>>>>>>>>>>>>>>> would be better to exclude a header from the root set if 
>>>>>>>>>>>>>>>> all the symbols in it were filtered out by the filter 
>>>>>>>>>>>>>>>> patterns.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Yep - I think that sounds sensible
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 22:30:
>>>>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:08:
>>>>>>>>>>>>>>>>>> Hi Jorn,
>>>>>>>>>>>>>>>>>> this looks very good indeed. I'd like to play with it 
>>>>>>>>>>>>>>>>>> a bit before we
>>>>>>>>>>>>>>>>>> go ahead with this, if that's ok.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Please do!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Also, one question: if the user does not specify "-l" 
>>>>>>>>>>>>>>>>>> what happens? I
>>>>>>>>>>>>>>>>>> guess the root set will then contain all the headers 
>>>>>>>>>>>>>>>>>> and we just get
>>>>>>>>>>>>>>>>>> back the full output (which would be ok) ?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Good questions. When no filter patterns are specified 
>>>>>>>>>>>>>>>>> that is the case
>>>>>>>>>>>>>>>>> everything is outputted, but when filter patterns are 
>>>>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>>>> headers are only added to the root set if they contain 
>>>>>>>>>>>>>>>>> a symbol that
>>>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Maybe we should just fall back to 'include-everything' 
>>>>>>>>>>>>>>>>> when no "-l"
>>>>>>>>>>>>>>>>> options are specified, regardless of filter?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 13/03/2019 18:06, Jorn Vernee wrote:
>>>>>>>>>>>>>>>>>>> Updated webrev: 
>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.01/ 
>>>>>>>>>>>>>>>>>>> Fixed the bug, also added tests & ran all the 
>>>>>>>>>>>>>>>>>>> examples again to try and find other bugs (nothing 
>>>>>>>>>>>>>>>>>>> found).
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:50:
>>>>>>>>>>>>>>>>>>>> Discovered a bug in this after posting :(
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> DependencyFinder also needs to look for pointee and 
>>>>>>>>>>>>>>>>>>>> array element type
>>>>>>>>>>>>>>>>>>>> cursors (which it currently doesn't). I've tested a 
>>>>>>>>>>>>>>>>>>>> working fix, but
>>>>>>>>>>>>>>>>>>>> will also add some tests for the particular cases. 
>>>>>>>>>>>>>>>>>>>> Updated webrev
>>>>>>>>>>>>>>>>>>>> coming soon™
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:21:
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> After some discussion:
>>>>>>>>>>>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-March/004828.html 
>>>>>>>>>>>>>>>>>>>>> I've implemented the discussed approach to header & 
>>>>>>>>>>>>>>>>>>>>> declaration filtering.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Bug: 
>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8220544
>>>>>>>>>>>>>>>>>>>>> Webrev: 
>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.00/ 
>>>>>>>>>>>>>>>>>>>>> There are a few new implementation classes:
>>>>>>>>>>>>>>>>>>>>>   * PattertnFilter; a helper class that 
>>>>>>>>>>>>>>>>>>>>> encapsulates the Pattern based
>>>>>>>>>>>>>>>>>>>>> filtering code previously found in SymbolFilter. 
>>>>>>>>>>>>>>>>>>>>> This code had to be
>>>>>>>>>>>>>>>>>>>>> reused.
>>>>>>>>>>>>>>>>>>>>>   * Filters; a Context sub-component holding 
>>>>>>>>>>>>>>>>>>>>> PatternFilters for
>>>>>>>>>>>>>>>>>>>>> different types of declarations.
>>>>>>>>>>>>>>>>>>>>>   * DependencyFinder; finds dependencies of the 
>>>>>>>>>>>>>>>>>>>>> root set of
>>>>>>>>>>>>>>>>>>>>> declarations found in the root headers.
>>>>>>>>>>>>>>>>>>>>>   * ElementFilter; filters out non-library symbol 
>>>>>>>>>>>>>>>>>>>>> elements based on
>>>>>>>>>>>>>>>>>>>>> whether they appear in a root header or are 
>>>>>>>>>>>>>>>>>>>>> required by something
>>>>>>>>>>>>>>>>>>>>> therein.
>>>>>>>>>>>>>>>>>>>>>   * RootSet; a helper class for managing root 
>>>>>>>>>>>>>>>>>>>>> headers + required element Trees.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I also encountered some crashes when debug printing 
>>>>>>>>>>>>>>>>>>>>> Trees, which I've
>>>>>>>>>>>>>>>>>>>>> fixed/worked around accordingly.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> FWIW, I tried this out with the Windows registry 
>>>>>>>>>>>>>>>>>>>>> API and it's working
>>>>>>>>>>>>>>>>>>>>> very nicely! Going from well over 100 headers to 
>>>>>>>>>>>>>>>>>>>>> under 10. Some room
>>>>>>>>>>>>>>>>>>>>> for improvement still exists; some empty headers 
>>>>>>>>>>>>>>>>>>>>> and static forwarders
>>>>>>>>>>>>>>>>>>>>> could still be omitted, but imho that should be 
>>>>>>>>>>>>>>>>>>>>> handled by a separate
>>>>>>>>>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Jorn


More information about the panama-dev mailing list