RFR: 7903613: Bad nested names are sometimes attached to structs [v4]
Maurizio Cimadamore
mcimadamore at openjdk.org
Wed Dec 20 10:21:09 UTC 2023
On Wed, 20 Dec 2023 10:14:30 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> The `NameMangler` visitor is used to compute the Java name of a jextract declaration. This is implemented as a declaration visitor. Unfortunately, the logic that computes the Java name can be sensitive to the order in which declarations are visited (because this visitor features a "parent" declaration, whose contents affect whether a "nested" struct name is generated or not).
>>
>> In reality, the logic of the name mangler needs to be able to disambiguate between structs that are either anonymous, or already declared somewhere else, and structs that are declared as part of a typedef, variable, function parameter/return declaration. In the former case, we either need no Java name (anonymous struct) or a toplevel Java name. In the latter we need a nested struct name (as the struct class will be nested inside some other class).
>>
>> This PR introduces a new visitor which tags all struct/union/enum declarations which fall in the latter bucket. This is done with an algorithm which:
>>
>> 1. visits all declarations in a toplevel header
>> 2. remembers which scoped declarations have been seen *directly* (i.e. as part of the visit)
>> 3. keeps track of which scoped declarations can be seen *indirectly* (i.e. because they are behind some declared type)
>> 4. subtracts the declarations in (2) from the declarations in (3), and visits the declarations in the remaining set
>> 5. keeps performing (2), (3), (4) until there's no declaration in (3)
>>
>> All scoped declarations that appear exclusively as part of some declared type are augmented with the `NestedDecl` attribute, which is then read when calling `Utils::nestedDeclarationFor`. This ensures that all the jextract visitors only recurse on a scoped declaration attached to a type which is known not to have been seen anywhere else. As a result, the behavior of the name mangler is independent of the order in which declarations are seen.
>>
>> It should be possible, in principle, to leverage this infrastructure to define a declaration visitor that automatically looks inside "nested declarations" (so that subsequent visitors don't really need to concern themselves with following declared types).
>>
>> I've tested this change with windows.h, which works as expected.
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Drop NestedDeclFinder
>   Use attribute to attach true nested declarations
I have updated the fix with a different approach. Instead of relying on a visitor to "guess" what the nested declarations are, I decided to instead attach the declarations corresponding to nested cursors in a new declaration attribute. This means that all the info that clang sees is now reflected in the declaration API.
Note: libclang is not perfect here, as it often gives jextract duplicate cursors. That is, sometimes the same cursor can appear both at level N and at level N + 1 (this looks like a bug in libclang). The fact that jextract has built-in deduplication for declarations works out in our favor here.
I have tested against `windows.h` again, which extracts fine. I had to make some tweaks so that we only pick up nested "definition" cursors (using `Cursor::isDefinition`), otherwise libclang was sometimes returning weird nested cursors (I saw one for a conditional expression).
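To illustrate the two filters described above (keep only "definition" cursors, and tolerate duplicates), here is a minimal sketch. It is not the jextract code: the `Cursor` record, its `id` field and the `nestedDefinitions` helper are hypothetical stand-ins, and the `id` string is only an assumed way of telling when two cursors denote the same declaration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NestedDeclSketch {
    // Hypothetical stand-in for a libclang cursor; NOT the jextract Cursor API.
    // `id` is an assumed identity key for "the same declaration".
    record Cursor(String id, boolean isDefinition) {}

    // Keep only definition cursors, and only the first occurrence of each id,
    // preserving the order in which the cursors were reported.
    static List<Cursor> nestedDefinitions(List<Cursor> candidates) {
        Map<String, Cursor> unique = new LinkedHashMap<>();
        for (Cursor c : candidates) {
            if (c.isDefinition()) {
                unique.putIfAbsent(c.id(), c);
            }
        }
        return new ArrayList<>(unique.values());
    }

    public static void main(String[] args) {
        Cursor nested = new Cursor("struct Foo::(anonymous)", true);
        Cursor duplicate = new Cursor("struct Foo::(anonymous)", true); // same cursor reported twice
        Cursor stray = new Cursor("conditional-expression", false);    // not a definition
        System.out.println(nestedDefinitions(List.of(nested, stray, duplicate)));
    }
}
```

In this sketch the surviving cursors would be what gets attached to the enclosing declaration as its nested declarations. Keying the map on an identity string is a simplification made for the example.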
-------------
PR Comment: https://git.openjdk.org/jextract/pull/167#issuecomment-1864214526
More information about the jextract-dev
mailing list