[foreign] RFR 8220544: Jextract; library dependent header filtering
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Mar 14 17:25:24 UTC 2019
Nope - with this:
$ cat foo.h
#include <string.h>
void foo(int f);
$ cat foo.c
#include "foo.h"
void foo(int f) { }
then,
$ gcc -shared -o libfoo.so foo.c
$ jextract -L . -l foo -d classes foo.h
And then
classes
├── clang_support
│ ├── stddef.class
│ ├── stddef_h.class
│ └── stddef$size_t.class
├── foo.class
├── foo_h.class
├── META-INF
│ └── jextract.properties
└── usr
└── include
├── string.class
├── string_h.class
├── xlocale.class
├── xlocale_h.class
├── xlocale$__locale_data.class
├── xlocale$__locale_struct.class
└── xlocale$__locale_t.class
4 directories, 13 files
I wonder if some of the library lookup functions we have on linux
(dlopen and co.) are doing a 'recursive' search into dependent libs as
well, which might be disabled on Win.
Maurizio
On 14/03/2019 16:38, Maurizio Cimadamore wrote:
> Ah ok
>
> will give that another try
>
> Maurizio
>
> On 14/03/2019 16:16, Jorn Vernee wrote:
>> Sorry, that's with the v3 patch
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.03/
>>
>> I reverted back after the concerns raised about changing the
>> implementation of TypedefTree::def You're right in that this
>> incorrectly includes some trees.
>>
>> Sorry,
>> Jorn
>>
>> Jorn Vernee schreef op 2019-03-14 17:08:
>>> This produces test.class and test_h.class as expected, but nothing
>>> else...
>>>
>>> Used command is: `jextract -L . -l test test.h`
>>>
>>> You would get stuff from string.h when the -l is omitted.
>>>
>>> Jorn
>>>
>>> Maurizio Cimadamore schreef op 2019-03-14 17:00:
>>>> I think you can reproduce simply by having an header like this:
>>>>
>>>> #include <string.h>
>>>>
>>>> void foo(int i);
>>>>
>>>>
>>>> if you extract that you would expect one symbol, but in reality you
>>>> get also all the stuff from string.h.
>>>>
>>>> Maurizio
>>>>
>>>> On 14/03/2019 15:49, Jorn Vernee wrote:
>>>>> The Windows jextract artifact for the OpenGL sample is a different
>>>>> beast, because it relies on the standard Windows OpenGL libraries.
>>>>> So, seeing some system headers is to be expected :) In the case of
>>>>> python I'm also seeing some system headers, because python depends
>>>>> on a hand full of corecrt types, so that's also correct. But,
>>>>> neither one seems to be referencing imaxabs, so I'm not sure
>>>>> what's going on...
>>>>>
>>>>> I'll try to see if I can find a repro example on Windows to see
>>>>> about a fix. I'm not sure I see how the SymbolFilter could get to
>>>>> the VM libraries? Though I remember seeing a similar case in the
>>>>> past...
>>>>>
>>>>> Jorn
>>>>>
>>>>> Maurizio Cimadamore schreef op 2019-03-14 15:55:
>>>>>> I think I got at the bottom of this. I tried debugging an extraction
>>>>>> of opengl, and it eventually ends up adding inttypes.h to the
>>>>>> rootset
>>>>>> because of this symbol:
>>>>>>
>>>>>> imaxabs
>>>>>>
>>>>>> The library filter doesn't work here, because this is a function
>>>>>> defined in the set of default libraries that comes with the VM, so
>>>>>> jextract 'thinks' this symbol is genuinely part of one of the
>>>>>> libraries specified in the command line - except is not.
>>>>>>
>>>>>> I tried an hacky fix which checks for absence of symbol in
>>>>>> Libraries.getDefaultLibrary - and that seems to give the expected
>>>>>> results - but I leave this to your consideration.
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>> On 14/03/2019 14:27, Maurizio Cimadamore wrote:
>>>>>>> Hi Jorn,
>>>>>>> I gave this a try with our OpenGL and Python examples. The
>>>>>>> extraction process works fine, and the generated jars are a bit
>>>>>>> smaller - but I still see a lot of system-related headers.
>>>>>>> Python has still lots of them, OpenGL less so, but even in the
>>>>>>> GL case, I think it's interesting, because I can't see
>>>>>>> references to types mentioned in these headers (nor in the
>>>>>>> resulting classfiles). Do you know, on top of your head, as to
>>>>>>> why this could be happening?
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>>
>>>>>>> On 14/03/2019 12:45, Jorn Vernee wrote:
>>>>>>>> Updated webrev:
>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.04/
>>>>>>>>
>>>>>>>>
>>>>>>>> Changed Treemaker implementation to recurse to the 'root' type
>>>>>>>> when creating TypdefTree::def, and using that in DependencyFinder.
>>>>>>>>
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Jorn Vernee schreef op 2019-03-14 12:49:
>>>>>>>>> Oh ok, sorry. I'll try that out.
>>>>>>>>>
>>>>>>>>> Jorn
>>>>>>>>>
>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 12:43:
>>>>>>>>>> On 14/03/2019 11:40, Jorn Vernee wrote:
>>>>>>>>>>> I see what you mean.
>>>>>>>>>>>
>>>>>>>>>>> Well, `struct BAR` has a definition cursor, but `struct
>>>>>>>>>>> BAR*` does not have a definition/declaration, it's a type
>>>>>>>>>>> that's automatically derived from `struct BAR`. So for
>>>>>>>>>>> `struct BAR*` there is no good value for TypedefTree::def.
>>>>>>>>>>
>>>>>>>>>> I know that there's no good value for struct BAR* - I was
>>>>>>>>>> wondering if
>>>>>>>>>> the parser should detect the pointer, and recurse down, then
>>>>>>>>>> store
>>>>>>>>>> 'struct BAR' into 'def'.
>>>>>>>>>>
>>>>>>>>>> Maurizio
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> TypedefTree::def seems to be a helper method mostly targeted
>>>>>>>>>>> at inline type definitions (looking at TypedefHandler [1]).
>>>>>>>>>>>
>>>>>>>>>>> I think for my use case using
>>>>>>>>>>> TypedefTree.type().canonicalType() is good enough, but I
>>>>>>>>>>> guess we could query that once in the parser and then store
>>>>>>>>>>> it in a field in TypedefTree?
>>>>>>>>>>>
>>>>>>>>>>> Jorn
>>>>>>>>>>>
>>>>>>>>>>> [1] :
>>>>>>>>>>> http://hg.openjdk.java.net/panama/dev/file/43ed53f40957/src/jdk.jextract/share/classes/com/sun/tools/jextract/TypedefHandler.java#l107
>>>>>>>>>>>
>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 11:57:
>>>>>>>>>>>> On 14/03/2019 00:23, Jorn Vernee wrote:
>>>>>>>>>>>>> Sorry, bad example. The actual failure case is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> struct BAR { int x; }
>>>>>>>>>>>>>
>>>>>>>>>>>>> typedef struct BAR* FOO;
>>>>>>>>>>>>>
>>>>>>>>>>>>> The TreeMaker code looks at the canonical type of the
>>>>>>>>>>>>> typedef `struct BAR*` which is neither a type definition
>>>>>>>>>>>>> or declaration, so TypedefTree::def is empty. But, in the
>>>>>>>>>>>>> case of dependecy analysis, we still want to inspect that
>>>>>>>>>>>>> type, get the pointee type and add it to the root set.
>>>>>>>>>>>>
>>>>>>>>>>>> Right - and I think what I'm claiming is that this is
>>>>>>>>>>>> probably some
>>>>>>>>>>>> kind of a parser bug? That is, if Typedef::def (we should
>>>>>>>>>>>> probably
>>>>>>>>>>>> find another name, 'def' is not good enough IMHO) is set
>>>>>>>>>>>> correctly for
>>>>>>>>>>>>
>>>>>>>>>>>> typedef struct BAR FOO;
>>>>>>>>>>>>
>>>>>>>>>>>> it should also be set for
>>>>>>>>>>>>
>>>>>>>>>>>> typedef struct BAR* FOO;
>>>>>>>>>>>>
>>>>>>>>>>>> That is, I claim that from the typedef tree you should
>>>>>>>>>>>> always see a
>>>>>>>>>>>> referenced type declaration w/o using the clang API.
>>>>>>>>>>>>
>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-14 01:14:
>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-14 01:04:
>>>>>>>>>>>>>>> So, if I understand correctly, the issue is that if you
>>>>>>>>>>>>>>> have:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> typedef struct { int x; } FOO;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> you want the struct to be added to the root set. Which
>>>>>>>>>>>>>>> make sense.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now, with respect to getUnderlyingType - I'm a bit at
>>>>>>>>>>>>>>> odds; it seems
>>>>>>>>>>>>>>> to me that the jextract tree should have the right info,
>>>>>>>>>>>>>>> under
>>>>>>>>>>>>>>> TypedefTree::def, at least that is set by the parser for
>>>>>>>>>>>>>>> declarations/definitions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That exact case is covered by TypedefTree::def, but it
>>>>>>>>>>>>>> doesn't cover a
>>>>>>>>>>>>>> case like this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> struct BAR { int x; }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> typedef struct BAR FOO;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> i.e. TypedefTree::def is only set when the typedef
>>>>>>>>>>>>>> contains an inline
>>>>>>>>>>>>>> definition or a prototype. But, if the type is just
>>>>>>>>>>>>>> referenced,
>>>>>>>>>>>>>> TypedefTree::def is an empty Optional.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Wouldn't visiting TypedefTree::def (if present) be
>>>>>>>>>>>>>>> enough (and more in
>>>>>>>>>>>>>>> sync with the rest of jextract) ? After all, that's the
>>>>>>>>>>>>>>> kind of thing
>>>>>>>>>>>>>>> you wanna be sure not to miss by the dependency analysis -
>>>>>>>>>>>>>>> getUnderlyingType seems a tad too noisy?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is what I had before, but it fell short in some cases.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In other words, let's say that there's a typedef in an
>>>>>>>>>>>>>>> header of a
>>>>>>>>>>>>>>> library you want to jextract which is related, via a
>>>>>>>>>>>>>>> chain of other 42
>>>>>>>>>>>>>>> typedefs to something in some obscure system library.
>>>>>>>>>>>>>>> Are you sure you
>>>>>>>>>>>>>>> want the resulting artifact to have all the 42
>>>>>>>>>>>>>>> intermediate typedefs?
>>>>>>>>>>>>>>> After all, the library you wanna extract has just
>>>>>>>>>>>>>>> defined 'its
>>>>>>>>>>>>>>> version' of the name for that particular construct... so
>>>>>>>>>>>>>>> it's very
>>>>>>>>>>>>>>> likely that users of that library will prefer that name
>>>>>>>>>>>>>>> to the other
>>>>>>>>>>>>>>> 42 ones?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 42 intermediate typedefs seems a bit extreme...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But, I think you have a point. Putting the typedef and
>>>>>>>>>>>>>> the canonical
>>>>>>>>>>>>>> type into the root set is probably enough.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 13/03/2019 23:31, Jorn Vernee wrote:
>>>>>>>>>>>>>>>> Update webrev:
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.02/
>>>>>>>>>>>>>>>> This fixes an issue where typedef underlying types were
>>>>>>>>>>>>>>>> not being added to the root set.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It turns out I was using the wrong API to get the
>>>>>>>>>>>>>>>> underlying type, and the needed API didn't have a
>>>>>>>>>>>>>>>> binding yet :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The changes with respect to the previous webrev;
>>>>>>>>>>>>>>>> * added Cursor.getTypedefDeclUnderlyingType() binding
>>>>>>>>>>>>>>>> * added TypdefTree.underlyingType() method
>>>>>>>>>>>>>>>> * changed DependencyFinder to use this method instead
>>>>>>>>>>>>>>>> of typeDefinition()
>>>>>>>>>>>>>>>> * added test case for the failure case (also
>>>>>>>>>>>>>>>> simplified test class to only run jextract once and
>>>>>>>>>>>>>>>> reuse the result)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:43:
>>>>>>>>>>>>>>>>> On 13/03/2019 21:36, Jorn Vernee wrote:
>>>>>>>>>>>>>>>>>>> Good questions. When no filter patterns are
>>>>>>>>>>>>>>>>>>> specified that is the case
>>>>>>>>>>>>>>>>>>> everything is outputted, but when filter patterns
>>>>>>>>>>>>>>>>>>> are specified
>>>>>>>>>>>>>>>>>>> headers are only added to the root set if they
>>>>>>>>>>>>>>>>>>> contain a symbol that
>>>>>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> FWIW, basically headers with symbols that pass both
>>>>>>>>>>>>>>>>>> types of filter are added to the root set. There's
>>>>>>>>>>>>>>>>>> 1.) Filtereing based on whether a symbol is found in
>>>>>>>>>>>>>>>>>> a library, and 2.) based on the used filter patterns
>>>>>>>>>>>>>>>>>> (--include-symbols & --exclude-symbols). I thought it
>>>>>>>>>>>>>>>>>> would be better to exclude a header from the root set
>>>>>>>>>>>>>>>>>> if all the symbols in it were filtered out by the
>>>>>>>>>>>>>>>>>> filter patterns.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yep - I think that sounds sensible
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 22:30:
>>>>>>>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-03-13 22:08:
>>>>>>>>>>>>>>>>>>>> Hi Jorn,
>>>>>>>>>>>>>>>>>>>> this looks very good indeed. I'd like to play with
>>>>>>>>>>>>>>>>>>>> it a bit before we
>>>>>>>>>>>>>>>>>>>> go ahead with this, if that's ok.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please do!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Also, one question: if the user does not specify
>>>>>>>>>>>>>>>>>>>> "-l" what happens? I
>>>>>>>>>>>>>>>>>>>> guess the root set will then contain all the
>>>>>>>>>>>>>>>>>>>> headers and we just get
>>>>>>>>>>>>>>>>>>>> back the full output (which would be ok) ?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Good questions. When no filter patterns are
>>>>>>>>>>>>>>>>>>> specified that is the case
>>>>>>>>>>>>>>>>>>> everything is outputted, but when filter patterns
>>>>>>>>>>>>>>>>>>> are specified
>>>>>>>>>>>>>>>>>>> headers are only added to the root set if they
>>>>>>>>>>>>>>>>>>> contain a symbol that
>>>>>>>>>>>>>>>>>>> passes the filter.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Maybe we should just fall back to
>>>>>>>>>>>>>>>>>>> 'include-everything' when no "-l"
>>>>>>>>>>>>>>>>>>> options are specified, regardless of filter?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Maurizio
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 13/03/2019 18:06, Jorn Vernee wrote:
>>>>>>>>>>>>>>>>>>>>> Updated webrev:
>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.01/
>>>>>>>>>>>>>>>>>>>>> Fixed the bug, also added tests & ran all the
>>>>>>>>>>>>>>>>>>>>> examples again to try and find other bugs (nothing
>>>>>>>>>>>>>>>>>>>>> found).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:50:
>>>>>>>>>>>>>>>>>>>>>> Discovered a bug in this after posting :(
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> DependencyFinder also needs to look for pointee
>>>>>>>>>>>>>>>>>>>>>> and array element type
>>>>>>>>>>>>>>>>>>>>>> cursors (which it currently doesn't). I've tested
>>>>>>>>>>>>>>>>>>>>>> a working fix, but
>>>>>>>>>>>>>>>>>>>>>> will also add some tests for the particular
>>>>>>>>>>>>>>>>>>>>>> cases. Updated webrev
>>>>>>>>>>>>>>>>>>>>>> coming soon™
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Jorn Vernee schreef op 2019-03-13 17:21:
>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> After some discussion:
>>>>>>>>>>>>>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-March/004828.html
>>>>>>>>>>>>>>>>>>>>>>> I've implemented the discussed approach to
>>>>>>>>>>>>>>>>>>>>>>> header & declaration filtering.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Bug:
>>>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8220544
>>>>>>>>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8220544/webrev.00/
>>>>>>>>>>>>>>>>>>>>>>> There are a few new implementation classes:
>>>>>>>>>>>>>>>>>>>>>>> * PattertnFilter; a helper class that
>>>>>>>>>>>>>>>>>>>>>>> encapsulates the Pattern based
>>>>>>>>>>>>>>>>>>>>>>> filtering code previously found in SymbolFilter.
>>>>>>>>>>>>>>>>>>>>>>> This code had to be
>>>>>>>>>>>>>>>>>>>>>>> reused.
>>>>>>>>>>>>>>>>>>>>>>> * Filters; a Context sub-component holding
>>>>>>>>>>>>>>>>>>>>>>> PatternFilters for
>>>>>>>>>>>>>>>>>>>>>>> different types of declarations.
>>>>>>>>>>>>>>>>>>>>>>> * DependencyFinder; finds dependencies of the
>>>>>>>>>>>>>>>>>>>>>>> root set of
>>>>>>>>>>>>>>>>>>>>>>> declarations found in the root headers.
>>>>>>>>>>>>>>>>>>>>>>> * ElementFilter; filters out non-library
>>>>>>>>>>>>>>>>>>>>>>> symbol elements based on
>>>>>>>>>>>>>>>>>>>>>>> whether they appear in a root header or are
>>>>>>>>>>>>>>>>>>>>>>> required by something
>>>>>>>>>>>>>>>>>>>>>>> therein.
>>>>>>>>>>>>>>>>>>>>>>> * RootSet; a helper class for managing root
>>>>>>>>>>>>>>>>>>>>>>> headers + required element Trees.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I also encountered some crashes when debug
>>>>>>>>>>>>>>>>>>>>>>> printing Trees, which I've
>>>>>>>>>>>>>>>>>>>>>>> fixed/worked around accordingly.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> FWIW, I tried this out with the Windows registry
>>>>>>>>>>>>>>>>>>>>>>> API and it's working
>>>>>>>>>>>>>>>>>>>>>>> very nicely! Going from well over 100 headers to
>>>>>>>>>>>>>>>>>>>>>>> under 10. Some room
>>>>>>>>>>>>>>>>>>>>>>> for improvement still exists; some empty headers
>>>>>>>>>>>>>>>>>>>>>>> and static forwarders
>>>>>>>>>>>>>>>>>>>>>>> could still be omitted, but imho that should be
>>>>>>>>>>>>>>>>>>>>>>> handled by a separate
>>>>>>>>>>>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Jorn
More information about the panama-dev
mailing list