[foreign] RFC: Additional jextract filtering

Mon Mar 11 22:30:13 UTC 2019

Maurizio Cimadamore wrote on 2019-03-11 22:33:
> On 11/03/2019 15:44, Jorn Vernee wrote:
>> Well, I tried changing the defaults to using the REQUIRED preset for 
>> structs, enums and typedefs, and this is making a bunch (11) of the 
>> tests fail, including TestJextractFFI, which is not a good sign imho. 
>> This seems to be happening because the root set computation cannot 
>> deal with the pimpl/opaque pointer idiom.
> I'd like to understand more about this failure mode - example please?

I have to look into this more thoroughly...

Basically, Index.h defines a type like this: 
`typedef struct CXTranslationUnitImpl *CXTranslationUnit;` 
but there is no definition of CXTranslationUnitImpl anywhere. This is 
the opaque pointer idiom, used to encapsulate an implementation. 
Currently this type is not included in the dependency set, though to be 
fair that is probably a fault of the implementation. Other tests fail 
because the test headers often declare only a struct or union, with no 
function that uses it, so the declaration gets filtered out.
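
To make the opaque pointer case concrete, here is a minimal sketch of a 
dependency walk over a typedef. The type model (Struct, Pointer, Typedef) 
and the keepIncomplete flag are hypothetical and are not jextract's 
actual classes. The sketch only shows that a walk which skips structs 
without a definition never records CXTranslationUnitImpl.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class OpaquePointerSketch {
    // Hypothetical, simplified type model; not jextract's real representation.
    public interface Type {}
    public record Struct(String name, boolean hasDefinition) implements Type {}
    public record Pointer(Type pointee) implements Type {}
    public record Typedef(String name, Type target) implements Type {}

    // Models: typedef struct CXTranslationUnitImpl *CXTranslationUnit;
    // The struct is declared but never defined.
    public static Type cxTranslationUnit() {
        Type impl = new Struct("CXTranslationUnitImpl", false);
        return new Typedef("CXTranslationUnit", new Pointer(impl));
    }

    // Collects the names a type depends on by walking through typedefs and
    // pointers. With keepIncomplete == false, structs that lack a definition
    // are dropped, which is the behaviour that loses the opaque struct.
    public static Set<String> dependencies(Type type, boolean keepIncomplete) {
        Set<String> names = new LinkedHashSet<>();
        Type current = type;
        while (current != null) {
            if (current instanceof Typedef td) {
                names.add(td.name());
                current = td.target();
            } else if (current instanceof Pointer p) {
                current = p.pointee();
            } else if (current instanceof Struct s) {
                if (s.hasDefinition() || keepIncomplete) {
                    names.add(s.name());
                }
                current = null;
            } else {
                current = null;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(dependencies(cxTranslationUnit(), false));
        System.out.println(dependencies(cxTranslationUnit(), true));
    }
}
```

Treating a declared-but-undefined struct as a valid dependency is one 
possible fix; whether it is the right one for jextract is an open question.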

Anyway, the point I failed to make is that the root set is still just 
a 'guess' at best, so I don't believe it should be the default.

>> But, let's go back to the underlying goal; we want to create jextract 
>> output with the least 'junk' possible. I'd say this is not the job of 
>> jextract, but the job of the library maintainer. The header file is 
>> the interface for using the library, so it should contain things that 
>> are all more or less required to use the library. I don't think we 
>> will have much success trying to 'outsmart' the writer of the header 
>> file. After all, jextract does its filtering automatically, and the 
>> header file is carefully hand-crafted.
> 
> In principle, I agree - and that's why, longer term, we'll have APIs
> that will let you plug in custom filters too. At the same time I think
> that the out-of-the-box jextract experience should be good enough in
> most 'simple' cases - and it feels we're not exactly there right now.

The current problems seem to come mainly from things that jextract 
cannot handle, like intrinsics or flexible arrays, combined with the 
current filter options not being able to filter them out.

> Maybe we can't make the set as small as your root approach wants it to
> be, but from where we are now (pull in all transitive closure of
> headers) and the minimum possible self-contained subset, I have to
> believe that there *has* to be some intermediate point that we can get
> to.

We could also use the declarations from the explicitly passed headers as 
a root set, and then only include things from other headers if they're 
needed by the root headers.
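
A rough sketch of that idea, assuming a hypothetical Decl model (name, 
originating header, names it refers to) rather than anything jextract 
has today: seed a worklist with every declaration from the explicitly 
passed headers, then follow references transitively. The header and 
declaration names in example() are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RootSetSketch {
    // Hypothetical declaration model: its name, the header that declares it,
    // and the names of the declarations it refers to.
    public record Decl(String name, String header, List<String> uses) {}

    // Returns the names of all declarations in the root headers, plus
    // everything they reference (directly or indirectly) in other headers.
    public static Set<String> reachable(Map<String, Decl> decls, Set<String> rootHeaders) {
        Set<String> keep = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        for (Decl d : decls.values()) {
            if (rootHeaders.contains(d.header())) {
                work.add(d.name());
            }
        }
        while (!work.isEmpty()) {
            String name = work.poll();
            if (!keep.add(name)) {
                continue; // already visited
            }
            Decl d = decls.get(name);
            if (d != null) {
                work.addAll(d.uses());
            }
            // A name without a Decl (declared but never defined) is still kept.
        }
        return keep;
    }

    // Made-up example: lib.h is passed explicitly, stdio.h is only included.
    public static Map<String, Decl> example() {
        Map<String, Decl> decls = new LinkedHashMap<>();
        decls.put("lib_open", new Decl("lib_open", "lib.h", List.of("FILE", "lib_mode")));
        decls.put("lib_mode", new Decl("lib_mode", "lib.h", List.of()));
        decls.put("FILE", new Decl("FILE", "stdio.h", List.of()));
        decls.put("printf", new Decl("printf", "stdio.h", List.of()));
        return decls;
    }

    public static void main(String[] args) {
        System.out.println(reachable(example(), Set.of("lib.h")));
    }
}
```

In this example FILE is pulled in because lib_open refers to it, while 
printf is left out because nothing in lib.h needs it.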

Of course, when a library has many headers, having to pass each one of 
them is tedious. But I'd say that falls outside of 'simple' case 
territory.

> So, this is really two things:
> 
> 1) finding a good set of heuristics that work in most cases (and I
> wouldn't frown on using path-based filtering - I had a similar
> reaction, but this has proven to work better than I hoped!)
> 
> 2) providing a good bunch of low level filtering (include/exclude)
> tools, actionable from the command line, when (1) goes wrong.

I've really only tried to solve 2. with this proposal, while keeping 
the path open for 1. as well. I think 1. is a much more complex problem 
to solve, so it seems good to first add a set of fine-grained filtering 
options as an escape hatch, and then experiment with different 
filtering strategies, which should be easier to implement thanks to the 
work done for 2.

I think for now we can shelve the filter presets (or other automatic 
filtering strategies) and focus on adding fine-grained filtering 
options?

Jorn

> Maurizio
> 
>> 
>> On the other hand, not everything makes sense to use from a Panama 
>> perspective, so we still need some escape hatch to filter out 
>> things we can't use, or that break the binder. But we'd like to go 
>> about that in a disciplined way, and make sure we don't filter out 
>> things that are required by other things, so we use a dependency set.
>> 
>> Thoughts?
>> 
>> Jorn
>> 
>> Maurizio Cimadamore wrote on 2019-03-11 16:12:
>>> On 11/03/2019 13:45, Jorn Vernee wrote:
>>>> I can split the patch up a bit: the Filter refactor + root set 
>>>> computation in one part, leaving the option changes out of it. But 
>>>> those 2 alone do not affect the filtering, since the root set is 
>>>> only used when filtering non-symbol/macro elements.
>>> 
>>> I guess then what I'm suggesting is to automatically filter out
>>> elements not in the root set, and see how that works out.
>>> 
>>> Maurizio