[foreign] RFR 8220544: Jextract; library dependent header filtering

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Mar 14 16:38:16 UTC 2019


Ah ok

will give that another try

Maurizio

On 14/03/2019 16:16, Jorn Vernee wrote:
> Sorry, that's with the v3 patch 
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.03/
>
> I reverted back after the concerns raised about changing the 
> implementation of TypedefTree::def You're right in that this 
> incorrectly includes some trees.
>
> Sorry,
> Jorn
>
> Jorn Vernee schreef op 2019-03-14 17:08:
>> This produces test.class and test_h.class as expected, but nothing 
>> else...
>>
>> Used command is: `jextract -L . -l test test.h`
>>
>> You would get stuff from string.h when the -l is omitted.
>>
>> Jorn
>>
>> Maurizio Cimadamore schreef op 2019-03-14 17:00:
>>> I think you can reproduce simply by having an header like this:
>>>
>>> #include <string.h>
>>>
>>> void foo(int i);
>>>
>>>
>>> if you extract that you would expect one symbol, but in reality you
>>> get also all the stuff from string.h.
>>>
>>> Maurizio
>>>
>>> On 14/03/2019 15:49, Jorn Vernee wrote:
>>>> The Windows jextract artifact for the OpenGL sample is a different 
>>>> beast, because it relies on the standard Windows OpenGL libraries. 
>>>> So, seeing some system headers is to be expected :) In the case of 
>>>> python I'm also seeing some system headers, because python depends 
>>>> on a hand full of corecrt types, so that's also correct. But, 
>>>> neither one seems to be referencing imaxabs, so I'm not sure what's 
>>>> going on...
>>>>
>>>> I'll try to see if I can find a repro example on Windows to see 
>>>> about a fix. I'm not sure I see how the  SymbolFilter could get to 
>>>> the VM libraries? Though I remember seeing a similar case in the 
>>>> past...
>>>>
>>>> Jorn
>>>>
>>>> Maurizio Cimadamore schreef op 2019-03-14 15:55:
>>>>> I think I got at the bottom of this. I tried debugging an extraction
>>>>> of opengl, and it eventually ends up adding inttypes.h to the rootset
>>>>> because of this symbol:
>>>>>
>>>>> imaxabs
>>>>>
>>>>> The library filter doesn't work here, because this is a function
>>>>> defined in the set of default libraries that comes with the VM, so
>>>>> jextract 'thinks' this symbol is genuinely part of one of the
>>>>> libraries specified in the command line - except is not.
>>>>>
>>>>> I tried an hacky fix which checks for absence of symbol in
>>>>> Libraries.getDefaultLibrary - and that seems to give the expected
>>>>> results - but I leave this to your consideration.
>>>>>
>>>>> Maurizio
>>>>>
>>>>> On 14/03/2019 14:27, Maurizio Cimadamore wrote:
>>>>>> Hi Jorn,
>>>>>> I gave this a try with our OpenGL and Python examples. The 
>>>>>> extraction process works fine, and the generated jars are a bit 
>>>>>> smaller - but I still see a lot of system-related headers. Python 
>>>>>> has still lots of them, OpenGL less so, but even in the GL case, 
>>>>>> I think it's interesting, because I can't see references to types 
>>>>>> mentioned in these headers (nor in the resulting classfiles). Do 
>>>>>> you know, on top of your head, as to why this could be happening?
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>>
>>>>>> On 14/03/2019 12:45, Jorn Vernee wrote:
>>>>>>> Updated webrev: 
>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.04/ 
>>>>>>>
>>>>>>>
>>>>>>> Changed Treemaker implementation to recurse to the 'root' type 
>>>>>>> when creating TypdefTree::def, and using that in DependencyFinder.
>>>>>>>
>>>>>>> Jorn
>>>>>>>
>>>>>>> Jorn Vernee schreef op 2019-03-14 12:49:
>>>>>>>> Oh ok, sorry. I'll try that out.
>>>>>>>>
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 12:43:
>>>>>>>>> On 14/03/2019 11:40, Jorn Vernee wrote:
>>>>>>>>>> I see what you mean.
>>>>>>>>>>
>>>>>>>>>> Well, `struct BAR` has a definition cursor, but `struct BAR*` 
>>>>>>>>>> does not have a definition/declaration, it's a type that's 
>>>>>>>>>> automatically derived from `struct BAR`. So for `struct BAR*` 
>>>>>>>>>> there is no good value for TypedefTree::def.
>>>>>>>>>
>>>>>>>>> I know that there's no good value for struct BAR* - I was 
>>>>>>>>> wondering if
>>>>>>>>> the parser should detect the pointer, and recurse down, then 
>>>>>>>>> store
>>>>>>>>> 'struct BAR' into 'def'.
>>>>>>>>>
>>>>>>>>> Maurizio
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> TypedefTree::def seems to be a helper method mostly targeted 
>>>>>>>>>> at inline type definitions (looking at TypedefHandler [1]).
>>>>>>>>>>
>>>>>>>>>> I think for my use case using 
>>>>>>>>>> TypedefTree.type().canonicalType() is good enough, but I 
>>>>>>>>>> guess we could query that once in the parser and then store 
>>>>>>>>>> it in a field in TypedefTree?
>>>>>>>>>>
>>>>>>>>>> Jorn
>>>>>>>>>>
>>>>>>>>>> [1] : 
>>>>>>>>>> http://hg.openjdk.java.net/panama/dev/file/43ed53f40957/src/jdk.jextract/share/classes/com/sun/tools/jextract/TypedefHandler.java#l107
>>>>>>>>>>
>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 11:57:
>>>>>>>>>>> On 14/03/2019 00:23, Jorn Vernee wrote:
>>>>>>>>>>>> Sorry, bad example. The actual failure case is:
>>>>>>>>>>>>
>>>>>>>>>>>>      struct BAR { int x; }
>>>>>>>>>>>>
>>>>>>>>>>>>      typedef struct BAR* FOO;
>>>>>>>>>>>>
>>>>>>>>>>>> The TreeMaker code looks at the canonical type of the 
>>>>>>>>>>>> typedef `struct BAR*` which is neither a type definition or 
>>>>>>>>>>>> declaration, so TypedefTree::def is empty. But, in the case 
>>>>>>>>>>>> of dependecy analysis, we still want to inspect that type, 
>>>>>>>>>>>> get the pointee type and add it to the root set.
>>>>>>>>>>>
>>>>>>>>>>> Right - and I think what I'm claiming is that this is 
>>>>>>>>>>> probably some
>>>>>>>>>>> kind of a parser bug? That is, if Typedef::def (we should 
>>>>>>>>>>> probably
>>>>>>>>>>> find another name, 'def' is not good enough IMHO) is set 
>>>>>>>>>>> correctly for
>>>>>>>>>>>
>>>>>>>>>>> typedef struct BAR FOO;
>>>>>>>>>>>
>>>>>>>>>>> it should also be set for
>>>>>>>>>>>
>>>>>>>>>>> typedef struct BAR* FOO;
>>>>>>>>>>>
>>>>>>>>>>> That is, I claim that from the typedef tree you should 
>>>>>>>>>>> always see a
>>>>>>>>>>> referenced type declaration w/o using the clang API.
>>>>>>>>>>>
>>>>>>>>>>> Maurizio
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Jorn
>>>>>>>>>>>>
>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-14 01:14:
>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 01:04:
>>>>>>>>>>>>>> So, if I understand correctly, the issue is that if you 
>>>>>>>>>>>>>> have:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> typedef struct { int x; } FOO;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> you want the struct to be added to the root set. Which 
>>>>>>>>>>>>>> make sense.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now, with respect to getUnderlyingType - I'm a bit at 
>>>>>>>>>>>>>> odds; it seems
>>>>>>>>>>>>>> to me that the jextract tree should have the right info, 
>>>>>>>>>>>>>> under
>>>>>>>>>>>>>> TypedefTree::def, at least that is set by the parser for
>>>>>>>>>>>>>> declarations/definitions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That exact case is covered by TypedefTree::def, but it 
>>>>>>>>>>>>> doesn't cover a
>>>>>>>>>>>>> case like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     struct BAR { int x; }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     typedef struct BAR FOO;
>>>>>>>>>>>>>
>>>>>>>>>>>>> i.e. TypedefTree::def is only set when the typedef 
>>>>>>>>>>>>> contains an inline
>>>>>>>>>>>>> definition or a prototype. But, if the type is just 
>>>>>>>>>>>>> referenced,
>>>>>>>>>>>>> TypedefTree::def is an empty Optional.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Wouldn't visiting TypedefTree::def (if present) be enough 
>>>>>>>>>>>>>> (and more in
>>>>>>>>>>>>>> sync with the rest of jextract) ? After all, that's the 
>>>>>>>>>>>>>> kind of thing
>>>>>>>>>>>>>> you wanna be sure not to miss by the dependency analysis -
>>>>>>>>>>>>>> getUnderlyingType seems a tad too noisy?
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is what I had before, but it fell short in some cases.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> In other words, let's say that there's a typedef in an 
>>>>>>>>>>>>>> header of a
>>>>>>>>>>>>>> library you want to jextract which is related, via a 
>>>>>>>>>>>>>> chain of other 42
>>>>>>>>>>>>>> typedefs to something in some obscure system library. Are 
>>>>>>>>>>>>>> you sure you
>>>>>>>>>>>>>> want the resulting artifact to have all the 42 
>>>>>>>>>>>>>> intermediate typedefs?
>>>>>>>>>>>>>> After all, the library you wanna extract has just defined 
>>>>>>>>>>>>>> 'its
>>>>>>>>>>>>>> version' of the name for that particular construct... so 
>>>>>>>>>>>>>> it's very
>>>>>>>>>>>>>> likely that users of that library will prefer that name 
>>>>>>>>>>>>>> to the other
>>>>>>>>>>>>>> 42 ones?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 42 intermediate typedefs seems a bit extreme...
>>>>>>>>>>>>>
>>>>>>>>>>>>> But, I think you have a point. Putting the typedef and the 
>>>>>>>>>>>>> canonical
>>>>>>>>>>>>> type into the root set is probably enough.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 13/03/2019 23:31, Jorn Vernee wrote:
>>>>>>>>>>>>>>> Update webrev: 
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.02/ 
>>>>>>>>>>>>>>> This fixes an issue where typedef underlying types were 
>>>>>>>>>>>>>>> not being added to the root set.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It turns out I was using the wrong API to get the 
>>>>>>>>>>>>>>> underlying type, and the needed API didn't have a 
>>>>>>>>>>>>>>> binding yet :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The changes with respect to the previous webrev;
>>>>>>>>>>>>>>>   * added Cursor.getTypedefDeclUnderlyingType() binding
>>>>>>>>>>>>>>>   * added TypdefTree.underlyingType() method
>>>>>>>>>>>>>>>   * changed DependencyFinder to use this method instead 
>>>>>>>>>>>>>>> of typeDefinition()
>>>>>>>>>>>>>>>   * added test case for the failure case (also 
>>>>>>>>>>>>>>> simplified test class to only run jextract once and 
>>>>>>>>>>>>>>> reuse the result)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:43:
>>>>>>>>>>>>>>>> On 13/03/2019 21:36, Jorn Vernee wrote:
>>>>>>>>>>>>>>>>>> Good questions. When no filter patterns are specified 
>>>>>>>>>>>>>>>>>> that is the case
>>>>>>>>>>>>>>>>>> everything is outputted, but when filter patterns are 
>>>>>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>>>>> headers are only added to the root set if they 
>>>>>>>>>>>>>>>>>> contain a symbol that
>>>>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> FWIW, basically headers with symbols that pass both 
>>>>>>>>>>>>>>>>> types of filter are added to the root set. There's 1.) 
>>>>>>>>>>>>>>>>> Filtereing based on whether a symbol is found in a 
>>>>>>>>>>>>>>>>> library, and 2.) based on the used filter patterns 
>>>>>>>>>>>>>>>>> (--include-symbols & --exclude-symbols). I thought it 
>>>>>>>>>>>>>>>>> would be better to exclude a header from the root set 
>>>>>>>>>>>>>>>>> if all the symbols in it were filtered out by the 
>>>>>>>>>>>>>>>>> filter patterns.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yep - I think that sounds sensible
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 22:30:
>>>>>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:08:
>>>>>>>>>>>>>>>>>>> Hi Jorn,
>>>>>>>>>>>>>>>>>>> this looks very good indeed. I'd like to play with 
>>>>>>>>>>>>>>>>>>> it a bit before we
>>>>>>>>>>>>>>>>>>> go ahead with this, if that's ok.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please do!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Also, one question: if the user does not specify 
>>>>>>>>>>>>>>>>>>> "-l" what happens? I
>>>>>>>>>>>>>>>>>>> guess the root set will then contain all the headers 
>>>>>>>>>>>>>>>>>>> and we just get
>>>>>>>>>>>>>>>>>>> back the full output (which would be ok) ?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Good questions. When no filter patterns are specified 
>>>>>>>>>>>>>>>>>> that is the case
>>>>>>>>>>>>>>>>>> everything is outputted, but when filter patterns are 
>>>>>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>>>>> headers are only added to the root set if they 
>>>>>>>>>>>>>>>>>> contain a symbol that
>>>>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Maybe we should just fall back to 
>>>>>>>>>>>>>>>>>> 'include-everything' when no "-l"
>>>>>>>>>>>>>>>>>> options are specified, regardless of filter?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 13/03/2019 18:06, Jorn Vernee wrote:
>>>>>>>>>>>>>>>>>>>> Updated webrev: 
>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.01/ 
>>>>>>>>>>>>>>>>>>>> Fixed the bug, also added tests & ran all the 
>>>>>>>>>>>>>>>>>>>> examples again to try and find other bugs (nothing 
>>>>>>>>>>>>>>>>>>>> found).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:50:
>>>>>>>>>>>>>>>>>>>>> Discovered a bug in this after posting :(
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> DependencyFinder also needs to look for pointee 
>>>>>>>>>>>>>>>>>>>>> and array element type
>>>>>>>>>>>>>>>>>>>>> cursors (which it currently doesn't). I've tested 
>>>>>>>>>>>>>>>>>>>>> a working fix, but
>>>>>>>>>>>>>>>>>>>>> will also add some tests for the particular cases. 
>>>>>>>>>>>>>>>>>>>>> Updated webrev
>>>>>>>>>>>>>>>>>>>>> coming soon™
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:21:
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> After some discussion:
>>>>>>>>>>>>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-March/004828.html 
>>>>>>>>>>>>>>>>>>>>>> I've implemented the discussed approach to header 
>>>>>>>>>>>>>>>>>>>>>> & declaration filtering.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Bug: 
>>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8220544
>>>>>>>>>>>>>>>>>>>>>> Webrev: 
>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.00/ 
>>>>>>>>>>>>>>>>>>>>>> There are a few new implementation classes:
>>>>>>>>>>>>>>>>>>>>>>   * PattertnFilter; a helper class that 
>>>>>>>>>>>>>>>>>>>>>> encapsulates the Pattern based
>>>>>>>>>>>>>>>>>>>>>> filtering code previously found in SymbolFilter. 
>>>>>>>>>>>>>>>>>>>>>> This code had to be
>>>>>>>>>>>>>>>>>>>>>> reused.
>>>>>>>>>>>>>>>>>>>>>>   * Filters; a Context sub-component holding 
>>>>>>>>>>>>>>>>>>>>>> PatternFilters for
>>>>>>>>>>>>>>>>>>>>>> different types of declarations.
>>>>>>>>>>>>>>>>>>>>>>   * DependencyFinder; finds dependencies of the 
>>>>>>>>>>>>>>>>>>>>>> root set of
>>>>>>>>>>>>>>>>>>>>>> declarations found in the root headers.
>>>>>>>>>>>>>>>>>>>>>>   * ElementFilter; filters out non-library symbol 
>>>>>>>>>>>>>>>>>>>>>> elements based on
>>>>>>>>>>>>>>>>>>>>>> whether they appear in a root header or are 
>>>>>>>>>>>>>>>>>>>>>> required by something
>>>>>>>>>>>>>>>>>>>>>> therein.
>>>>>>>>>>>>>>>>>>>>>>   * RootSet; a helper class for managing root 
>>>>>>>>>>>>>>>>>>>>>> headers + required element Trees.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I also encountered some crashes when debug 
>>>>>>>>>>>>>>>>>>>>>> printing Trees, which I've
>>>>>>>>>>>>>>>>>>>>>> fixed/worked around accordingly.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> FWIW, I tried this out with the Windows registry 
>>>>>>>>>>>>>>>>>>>>>> API and it's working
>>>>>>>>>>>>>>>>>>>>>> very nicely! Going from well over 100 headers to 
>>>>>>>>>>>>>>>>>>>>>> under 10. Some room
>>>>>>>>>>>>>>>>>>>>>> for improvement still exists; some empty headers 
>>>>>>>>>>>>>>>>>>>>>> and static forwarders
>>>>>>>>>>>>>>>>>>>>>> could still be omitted, but imho that should be 
>>>>>>>>>>>>>>>>>>>>>> handled by a separate
>>>>>>>>>>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Jorn


More information about the panama-dev mailing list