[foreign] RFR 8220544: Jextract; library dependent header filtering

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Mar 14 17:25:24 UTC 2019


Nope - with this:

$ cat foo.h
#include <string.h>

void foo(int f);

$ cat foo.c
#include "foo.h"

void foo(int f) { }

then,

$  gcc -shared -o libfoo.so foo.c

$ jextract -L . -l foo -d classes foo.h


And then

classes
├── clang_support
│   ├── stddef.class
│   ├── stddef_h.class
│   └── stddef$size_t.class
├── foo.class
├── foo_h.class
├── META-INF
│   └── jextract.properties
└── usr
     └── include
         ├── string.class
         ├── string_h.class
         ├── xlocale.class
         ├── xlocale_h.class
         ├── xlocale$__locale_data.class
         ├── xlocale$__locale_struct.class
         └── xlocale$__locale_t.class

4 directories, 13 files


I wonder if some of the library lookup functions we have on linux 
(dlopen and co.) are doing a 'recursive' search into dependent libs as 
well, which might be disabled on Win.


Maurizio


On 14/03/2019 16:38, Maurizio Cimadamore wrote:
> Ah ok
>
> will give that another try
>
> Maurizio
>
> On 14/03/2019 16:16, Jorn Vernee wrote:
>> Sorry, that's with the v3 patch 
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.03/
>>
>> I reverted back after the concerns raised about changing the 
>> implementation of TypedefTree::def You're right in that this 
>> incorrectly includes some trees.
>>
>> Sorry,
>> Jorn
>>
>> Jorn Vernee schreef op 2019-03-14 17:08:
>>> This produces test.class and test_h.class as expected, but nothing 
>>> else...
>>>
>>> Used command is: `jextract -L . -l test test.h`
>>>
>>> You would get stuff from string.h when the -l is omitted.
>>>
>>> Jorn
>>>
>>> Maurizio Cimadamore schreef op 2019-03-14 17:00:
>>>> I think you can reproduce simply by having an header like this:
>>>>
>>>> #include <string.h>
>>>>
>>>> void foo(int i);
>>>>
>>>>
>>>> if you extract that you would expect one symbol, but in reality you
>>>> get also all the stuff from string.h.
>>>>
>>>> Maurizio
>>>>
>>>> On 14/03/2019 15:49, Jorn Vernee wrote:
>>>>> The Windows jextract artifact for the OpenGL sample is a different 
>>>>> beast, because it relies on the standard Windows OpenGL libraries. 
>>>>> So, seeing some system headers is to be expected :) In the case of 
>>>>> python I'm also seeing some system headers, because python depends 
>>>>> on a hand full of corecrt types, so that's also correct. But, 
>>>>> neither one seems to be referencing imaxabs, so I'm not sure 
>>>>> what's going on...
>>>>>
>>>>> I'll try to see if I can find a repro example on Windows to see 
>>>>> about a fix. I'm not sure I see how the SymbolFilter could get to 
>>>>> the VM libraries? Though I remember seeing a similar case in the 
>>>>> past...
>>>>>
>>>>> Jorn
>>>>>
>>>>> Maurizio Cimadamore schreef op 2019-03-14 15:55:
>>>>>> I think I got at the bottom of this. I tried debugging an extraction
>>>>>> of opengl, and it eventually ends up adding inttypes.h to the 
>>>>>> rootset
>>>>>> because of this symbol:
>>>>>>
>>>>>> imaxabs
>>>>>>
>>>>>> The library filter doesn't work here, because this is a function
>>>>>> defined in the set of default libraries that comes with the VM, so
>>>>>> jextract 'thinks' this symbol is genuinely part of one of the
>>>>>> libraries specified in the command line - except is not.
>>>>>>
>>>>>> I tried an hacky fix which checks for absence of symbol in
>>>>>> Libraries.getDefaultLibrary - and that seems to give the expected
>>>>>> results - but I leave this to your consideration.
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>> On 14/03/2019 14:27, Maurizio Cimadamore wrote:
>>>>>>> Hi Jorn,
>>>>>>> I gave this a try with our OpenGL and Python examples. The 
>>>>>>> extraction process works fine, and the generated jars are a bit 
>>>>>>> smaller - but I still see a lot of system-related headers. 
>>>>>>> Python has still lots of them, OpenGL less so, but even in the 
>>>>>>> GL case, I think it's interesting, because I can't see 
>>>>>>> references to types mentioned in these headers (nor in the 
>>>>>>> resulting classfiles). Do you know, on top of your head, as to 
>>>>>>> why this could be happening?
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>>
>>>>>>> On 14/03/2019 12:45, Jorn Vernee wrote:
>>>>>>>> Updated webrev: 
>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.04/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> Changed Treemaker implementation to recurse to the 'root' type 
>>>>>>>> when creating TypdefTree::def, and using that in DependencyFinder.
>>>>>>>>
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Jorn Vernee schreef op 2019-03-14 12:49:
>>>>>>>>> Oh ok, sorry. I'll try that out.
>>>>>>>>>
>>>>>>>>> Jorn
>>>>>>>>>
>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 12:43:
>>>>>>>>>> On 14/03/2019 11:40, Jorn Vernee wrote:
>>>>>>>>>>> I see what you mean.
>>>>>>>>>>>
>>>>>>>>>>> Well, `struct BAR` has a definition cursor, but `struct 
>>>>>>>>>>> BAR*` does not have a definition/declaration, it's a type 
>>>>>>>>>>> that's automatically derived from `struct BAR`. So for 
>>>>>>>>>>> `struct BAR*` there is no good value for TypedefTree::def.
>>>>>>>>>>
>>>>>>>>>> I know that there's no good value for struct BAR* - I was 
>>>>>>>>>> wondering if
>>>>>>>>>> the parser should detect the pointer, and recurse down, then 
>>>>>>>>>> store
>>>>>>>>>> 'struct BAR' into 'def'.
>>>>>>>>>>
>>>>>>>>>> Maurizio
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> TypedefTree::def seems to be a helper method mostly targeted 
>>>>>>>>>>> at inline type definitions (looking at TypedefHandler [1]).
>>>>>>>>>>>
>>>>>>>>>>> I think for my use case using 
>>>>>>>>>>> TypedefTree.type().canonicalType() is good enough, but I 
>>>>>>>>>>> guess we could query that once in the parser and then store 
>>>>>>>>>>> it in a field in TypedefTree?
>>>>>>>>>>>
>>>>>>>>>>> Jorn
>>>>>>>>>>>
>>>>>>>>>>> [1] : 
>>>>>>>>>>> http://hg.openjdk.java.net/panama/dev/file/43ed53f40957/src/jdk.jextract/share/classes/com/sun/tools/jextract/TypedefHandler.java#l107
>>>>>>>>>>>
>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 11:57:
>>>>>>>>>>>> On 14/03/2019 00:23, Jorn Vernee wrote:
>>>>>>>>>>>>> Sorry, bad example. The actual failure case is:
>>>>>>>>>>>>>
>>>>>>>>>>>>>      struct BAR { int x; }
>>>>>>>>>>>>>
>>>>>>>>>>>>>      typedef struct BAR* FOO;
>>>>>>>>>>>>>
>>>>>>>>>>>>> The TreeMaker code looks at the canonical type of the 
>>>>>>>>>>>>> typedef `struct BAR*` which is neither a type definition 
>>>>>>>>>>>>> or declaration, so TypedefTree::def is empty. But, in the 
>>>>>>>>>>>>> case of dependecy analysis, we still want to inspect that 
>>>>>>>>>>>>> type, get the pointee type and add it to the root set.
>>>>>>>>>>>>
>>>>>>>>>>>> Right - and I think what I'm claiming is that this is 
>>>>>>>>>>>> probably some
>>>>>>>>>>>> kind of a parser bug? That is, if Typedef::def (we should 
>>>>>>>>>>>> probably
>>>>>>>>>>>> find another name, 'def' is not good enough IMHO) is set 
>>>>>>>>>>>> correctly for
>>>>>>>>>>>>
>>>>>>>>>>>> typedef struct BAR FOO;
>>>>>>>>>>>>
>>>>>>>>>>>> it should also be set for
>>>>>>>>>>>>
>>>>>>>>>>>> typedef struct BAR* FOO;
>>>>>>>>>>>>
>>>>>>>>>>>> That is, I claim that from the typedef tree you should 
>>>>>>>>>>>> always see a
>>>>>>>>>>>> referenced type declaration w/o using the clang API.
>>>>>>>>>>>>
>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-14 01:14:
>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 01:04:
>>>>>>>>>>>>>>> So, if I understand correctly, the issue is that if you 
>>>>>>>>>>>>>>> have:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> typedef struct { int x; } FOO;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> you want the struct to be added to the root set. Which 
>>>>>>>>>>>>>>> make sense.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now, with respect to getUnderlyingType - I'm a bit at 
>>>>>>>>>>>>>>> odds; it seems
>>>>>>>>>>>>>>> to me that the jextract tree should have the right info, 
>>>>>>>>>>>>>>> under
>>>>>>>>>>>>>>> TypedefTree::def, at least that is set by the parser for
>>>>>>>>>>>>>>> declarations/definitions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That exact case is covered by TypedefTree::def, but it 
>>>>>>>>>>>>>> doesn't cover a
>>>>>>>>>>>>>> case like this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     struct BAR { int x; }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     typedef struct BAR FOO;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> i.e. TypedefTree::def is only set when the typedef 
>>>>>>>>>>>>>> contains an inline
>>>>>>>>>>>>>> definition or a prototype. But, if the type is just 
>>>>>>>>>>>>>> referenced,
>>>>>>>>>>>>>> TypedefTree::def is an empty Optional.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Wouldn't visiting TypedefTree::def (if present) be 
>>>>>>>>>>>>>>> enough (and more in
>>>>>>>>>>>>>>> sync with the rest of jextract) ? After all, that's the 
>>>>>>>>>>>>>>> kind of thing
>>>>>>>>>>>>>>> you wanna be sure not to miss by the dependency analysis -
>>>>>>>>>>>>>>> getUnderlyingType seems a tad too noisy?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is what I had before, but it fell short in some cases.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In other words, let's say that there's a typedef in an 
>>>>>>>>>>>>>>> header of a
>>>>>>>>>>>>>>> library you want to jextract which is related, via a 
>>>>>>>>>>>>>>> chain of other 42
>>>>>>>>>>>>>>> typedefs to something in some obscure system library. 
>>>>>>>>>>>>>>> Are you sure you
>>>>>>>>>>>>>>> want the resulting artifact to have all the 42 
>>>>>>>>>>>>>>> intermediate typedefs?
>>>>>>>>>>>>>>> After all, the library you wanna extract has just 
>>>>>>>>>>>>>>> defined 'its
>>>>>>>>>>>>>>> version' of the name for that particular construct... so 
>>>>>>>>>>>>>>> it's very
>>>>>>>>>>>>>>> likely that users of that library will prefer that name 
>>>>>>>>>>>>>>> to the other
>>>>>>>>>>>>>>> 42 ones?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 42 intermediate typedefs seems a bit extreme...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But, I think you have a point. Putting the typedef and 
>>>>>>>>>>>>>> the canonical
>>>>>>>>>>>>>> type into the root set is probably enough.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 13/03/2019 23:31, Jorn Vernee wrote:
>>>>>>>>>>>>>>>> Update webrev: 
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.02/ 
>>>>>>>>>>>>>>>> This fixes an issue where typedef underlying types were 
>>>>>>>>>>>>>>>> not being added to the root set.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It turns out I was using the wrong API to get the 
>>>>>>>>>>>>>>>> underlying type, and the needed API didn't have a 
>>>>>>>>>>>>>>>> binding yet :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The changes with respect to the previous webrev;
>>>>>>>>>>>>>>>>   * added Cursor.getTypedefDeclUnderlyingType() binding
>>>>>>>>>>>>>>>>   * added TypdefTree.underlyingType() method
>>>>>>>>>>>>>>>>   * changed DependencyFinder to use this method instead 
>>>>>>>>>>>>>>>> of typeDefinition()
>>>>>>>>>>>>>>>>   * added test case for the failure case (also 
>>>>>>>>>>>>>>>> simplified test class to only run jextract once and 
>>>>>>>>>>>>>>>> reuse the result)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:43:
>>>>>>>>>>>>>>>>> On 13/03/2019 21:36, Jorn Vernee wrote:
>>>>>>>>>>>>>>>>>>> Good questions. When no filter patterns are 
>>>>>>>>>>>>>>>>>>> specified that is the case
>>>>>>>>>>>>>>>>>>> everything is outputted, but when filter patterns 
>>>>>>>>>>>>>>>>>>> are specified
>>>>>>>>>>>>>>>>>>> headers are only added to the root set if they 
>>>>>>>>>>>>>>>>>>> contain a symbol that
>>>>>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> FWIW, basically headers with symbols that pass both 
>>>>>>>>>>>>>>>>>> types of filter are added to the root set. There's 
>>>>>>>>>>>>>>>>>> 1.) Filtereing based on whether a symbol is found in 
>>>>>>>>>>>>>>>>>> a library, and 2.) based on the used filter patterns 
>>>>>>>>>>>>>>>>>> (--include-symbols & --exclude-symbols). I thought it 
>>>>>>>>>>>>>>>>>> would be better to exclude a header from the root set 
>>>>>>>>>>>>>>>>>> if all the symbols in it were filtered out by the 
>>>>>>>>>>>>>>>>>> filter patterns.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yep - I think that sounds sensible
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 22:30:
>>>>>>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:08:
>>>>>>>>>>>>>>>>>>>> Hi Jorn,
>>>>>>>>>>>>>>>>>>>> this looks very good indeed. I'd like to play with 
>>>>>>>>>>>>>>>>>>>> it a bit before we
>>>>>>>>>>>>>>>>>>>> go ahead with this, if that's ok.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please do!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Also, one question: if the user does not specify 
>>>>>>>>>>>>>>>>>>>> "-l" what happens? I
>>>>>>>>>>>>>>>>>>>> guess the root set will then contain all the 
>>>>>>>>>>>>>>>>>>>> headers and we just get
>>>>>>>>>>>>>>>>>>>> back the full output (which would be ok) ?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Good questions. When no filter patterns are 
>>>>>>>>>>>>>>>>>>> specified that is the case
>>>>>>>>>>>>>>>>>>> everything is outputted, but when filter patterns 
>>>>>>>>>>>>>>>>>>> are specified
>>>>>>>>>>>>>>>>>>> headers are only added to the root set if they 
>>>>>>>>>>>>>>>>>>> contain a symbol that
>>>>>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Maybe we should just fall back to 
>>>>>>>>>>>>>>>>>>> 'include-everything' when no "-l"
>>>>>>>>>>>>>>>>>>> options are specified, regardless of filter?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 13/03/2019 18:06, Jorn Vernee wrote:
>>>>>>>>>>>>>>>>>>>>> Updated webrev: 
>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.01/ 
>>>>>>>>>>>>>>>>>>>>> Fixed the bug, also added tests & ran all the 
>>>>>>>>>>>>>>>>>>>>> examples again to try and find other bugs (nothing 
>>>>>>>>>>>>>>>>>>>>> found).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:50:
>>>>>>>>>>>>>>>>>>>>>> Discovered a bug in this after posting :(
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> DependencyFinder also needs to look for pointee 
>>>>>>>>>>>>>>>>>>>>>> and array element type
>>>>>>>>>>>>>>>>>>>>>> cursors (which it currently doesn't). I've tested 
>>>>>>>>>>>>>>>>>>>>>> a working fix, but
>>>>>>>>>>>>>>>>>>>>>> will also add some tests for the particular 
>>>>>>>>>>>>>>>>>>>>>> cases. Updated webrev
>>>>>>>>>>>>>>>>>>>>>> coming soon™
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:21:
>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> After some discussion:
>>>>>>>>>>>>>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-March/004828.html 
>>>>>>>>>>>>>>>>>>>>>>> I've implemented the discussed approach to 
>>>>>>>>>>>>>>>>>>>>>>> header & declaration filtering.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Bug: 
>>>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8220544
>>>>>>>>>>>>>>>>>>>>>>> Webrev: 
>>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.00/ 
>>>>>>>>>>>>>>>>>>>>>>> There are a few new implementation classes:
>>>>>>>>>>>>>>>>>>>>>>>   * PattertnFilter; a helper class that 
>>>>>>>>>>>>>>>>>>>>>>> encapsulates the Pattern based
>>>>>>>>>>>>>>>>>>>>>>> filtering code previously found in SymbolFilter. 
>>>>>>>>>>>>>>>>>>>>>>> This code had to be
>>>>>>>>>>>>>>>>>>>>>>> reused.
>>>>>>>>>>>>>>>>>>>>>>>   * Filters; a Context sub-component holding 
>>>>>>>>>>>>>>>>>>>>>>> PatternFilters for
>>>>>>>>>>>>>>>>>>>>>>> different types of declarations.
>>>>>>>>>>>>>>>>>>>>>>>   * DependencyFinder; finds dependencies of the 
>>>>>>>>>>>>>>>>>>>>>>> root set of
>>>>>>>>>>>>>>>>>>>>>>> declarations found in the root headers.
>>>>>>>>>>>>>>>>>>>>>>>   * ElementFilter; filters out non-library 
>>>>>>>>>>>>>>>>>>>>>>> symbol elements based on
>>>>>>>>>>>>>>>>>>>>>>> whether they appear in a root header or are 
>>>>>>>>>>>>>>>>>>>>>>> required by something
>>>>>>>>>>>>>>>>>>>>>>> therein.
>>>>>>>>>>>>>>>>>>>>>>>   * RootSet; a helper class for managing root 
>>>>>>>>>>>>>>>>>>>>>>> headers + required element Trees.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I also encountered some crashes when debug 
>>>>>>>>>>>>>>>>>>>>>>> printing Trees, which I've
>>>>>>>>>>>>>>>>>>>>>>> fixed/worked around accordingly.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> FWIW, I tried this out with the Windows registry 
>>>>>>>>>>>>>>>>>>>>>>> API and it's working
>>>>>>>>>>>>>>>>>>>>>>> very nicely! Going from well over 100 headers to 
>>>>>>>>>>>>>>>>>>>>>>> under 10. Some room
>>>>>>>>>>>>>>>>>>>>>>> for improvement still exists; some empty headers 
>>>>>>>>>>>>>>>>>>>>>>> and static forwarders
>>>>>>>>>>>>>>>>>>>>>>> could still be omitted, but imho that should be 
>>>>>>>>>>>>>>>>>>>>>>> handled by a separate
>>>>>>>>>>>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Jorn


More information about the panama-dev mailing list