[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member

Tue Feb 26 15:08:40 UTC 2019

On 26/02/2019 13:39, Jorn Vernee wrote:
> That sounds like an interesting option. So for ToolProvider/command 
> line we'd have:
>
>     String[] args -> Context.Builder -> Context -> run jextract
>
> And for the API we'd have:
>
>     Context.Builder -> Context -> run jextract
>
> Basically cutting out the String[] step.
Yep - but the API approach can do much more! For instance we could use 
the builder to inject custom Tree passes, so that clients could 
'civilize' selected API points according to their needs.
>
> In the last patch I only used the Context to do the actual jextract 
> run. After that, there are several strategies for outputting the 
> result. A user could also choose to use the result directly. So as a 
> straw man:
>
> ```
>     interface JextractAPI {
>         static JextractAPI newInstance();
>
>         Context.Builder contextBuilder();
>         JextractResult runJextract(Context context);
>
>         void writeClassFiles(Path dir, JextractResult);
>         void writeJar(Path jar, JextractResult);
>         void writeJmod(Path jmod, JextractResult)
>     }
> ```
>
> Something like that?
>
> Either way, making Context immutable and adding a Context.Builder 
> seems like a good start, so I'll make a separate RFR for that patch.

The way I see it, JextractAPI should NOT expose 'contextBuilder' but 
should feature the methods from ContextBuilder itself.

So, something like:

interface JextractToolBuilder {
         //context/option methods:
         JextractToolBuilder withClangArgs(String... args);
         JextractToolBuilder withLibraryNames(String... libNames);
         JextractToolBuilder withLibraryPaths(String... libPaths);
         ...

         //run the tool
         JextractResult run(Path... sources);

         interface JextractResult {
             boolean isSuccess();
             List<Diagnostic<HeaderFile>> getDiagnostics();
             void writeClasses(Path classesDir);
             void writeJar(Path jarFile);
             void writeMod(Path modFile);
         }

}

For diagnostics, we could leverage the existing API in javax.tools:

https://docs.oracle.com/javase/10/docs/api/javax/tools/Diagnostic.html

Maurizio

> Jorn
>
> Maurizio Cimadamore schreef op 2019-02-26 12:00:
>> On 26/02/2019 04:06, Sundararajan Athijegannathan wrote:
>>> Actually, I like Context.Builder and immutable Context - regardless 
>>> of whether we pass it around widely or not. Currently it is 
>>> partially immutable with few fields being final & rest non-final + 
>>> mutable collection. Perhaps that part alone as a separate patch?
>>
>> I'm open to that - I just wanted to point out what, to me, seemed like
>> the bigger issue.
>>
>> I we go down the builder path, I wonder if, what we call
>> ContextBuilder isn't part of a bigger JextractToolBuilder API -
>> similar to what has been done in jshell:
>>
>> https://docs.oracle.com/javase/10/docs/api/jdk/jshell/tool/JavaShellToolBuilder.html 
>>
>>
>> Maurizio
>>
>>>
>>> -Sundar
>>>
>>> On 26/02/19, 6:00 AM, Jorn Vernee wrote:
>>>> Hi Maurizio,
>>>>
>>>> I think you're right. I replaced the exception with a warning since 
>>>> the 2 are similar, but throwing an exception/warning at that point 
>>>> is probably wrong in the first place, like you say.
>>>>
>>>> Based on your suggestion, I've implemented the warning separately 
>>>> as a .peek() operation on the stream of Trees.
>>>>
>>>> Updated webrev: 
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.04/
>>>>
>>>> Jorn
>>>>
>>>> Maurizio Cimadamore schreef op 2019-02-26 00:34:
>>>>> Hi Jorn,
>>>>> while in principle making context immutable is the right to do on
>>>>> paper, I believe that we probably ended up curing a symptom and not
>>>>> the disease :-)
>>>>>
>>>>> Which means: I find it very strange to see that a layout computer -
>>>>> that is, a function which takes a clang Cursor and emits a Layout is
>>>>> now suddenly starting to give side-effects. Given I've worked on 
>>>>> javac
>>>>> for many years, I can assure you that this will almost surely lead to
>>>>> problems down the road. For instance, if this apparently harmless
>>>>> stateless function is called twice on the same cursor, we would get
>>>>> two warnings; another issue is that the code will fail to scale if 
>>>>> new
>>>>> warnings will need to be added - e.g. if record layout computer needs
>>>>> to check for other problematic stuff. And I'm not even mentioning
>>>>> adding logic to enable/disable specific classes of warnings, which is
>>>>> another thing that might happen, at some point.
>>>>>
>>>>> What works in these contexts, is to separate the logic that does the
>>>>> stateless computation (e.g. layout computation, in this case), from
>>>>> the logic that issues the warning. For instance, we could have a
>>>>> separate visitor in jextract which looks at the clang cursor, looking
>>>>> for things that are suspicious or not well supported (such as
>>>>> incomplete arrays) and report a warning _only once_.
>>>>>
>>>>> This visitor will probably write itself, and, longer term would be a
>>>>> much more maintainable option than inlining the check in the layout
>>>>> computer function itself.
>>>>>
>>>>> After which, the decision as to whether make the context mutable or
>>>>> not is largely orthogonal - we could go there, although it's not 
>>>>> clear
>>>>> if the extra visitor I'm proposing would significantly switch the
>>>>> balance one way or another (probably not, as we have plenty other
>>>>> visitors which do logging).
>>>>>
>>>>> Maurizio
>>>>>
>>>>>
>>>>> On 25/02/2019 23:16, Jorn Vernee wrote:
>>>>>> I've split this into 2 to try and make reviewing easier.
>>>>>>
>>>>>> Here's the part that makes Context immutable (this required quite 
>>>>>> a bit of refactoring in Main):
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/context/webrev.00/ 
>>>>>>
>>>>>>
>>>>>> Here's the original part that returns a layout reference when 
>>>>>> encountering a type with a flexible array (applies on top of the 
>>>>>> first one):
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.03/ 
>>>>>>
>>>>>>
>>>>>> Hope that helps,
>>>>>> Jorn
>>>>>>
>>>>>> Jorn Vernee schreef op 2019-02-25 23:06:
>>>>>>> I talked a bit with Sundar;
>>>>>>>
>>>>>>> We can not create a global static method to return the Context 
>>>>>>> since
>>>>>>> there could be more than 1 Context, due to multiple instances of
>>>>>>> Jextract being created through the ToolProvider interface. But, as
>>>>>>> long as Context is Immutable, passing it down to other code 
>>>>>>> should be
>>>>>>> OK.
>>>>>>>
>>>>>>> I have made Context immutable by adding a Context.Builder, and
>>>>>>> creating the Context in a separate method in Main (createContext).
>>>>>>> Some places in that code were prematurely exiting and returning an
>>>>>>> error code. This is replaced by throwing a custom exception 
>>>>>>> instead.
>>>>>>>
>>>>>>> Update webrev:
>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/ 
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jorn
>>>>>>>
>>>>>>> Jorn Vernee schreef op 2019-02-16 15:30:
>>>>>>>> In other words; we can't get the layout for a type with a flexible
>>>>>>>> array from libclang at all. A good way to resolve that is 
>>>>>>>> probably to
>>>>>>>> submit a patch for it to libclang. I've done a little bit of
>>>>>>>> searching, and there doesn't seem to be a bug report in their bug
>>>>>>>> database for it either:
>>>>>>>> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array 
>>>>>>>> so we
>>>>>>>> could submit a bug about it as well. But, currently you need to
>>>>>>>> manually request an account in order to post bugs, so this 
>>>>>>>> might take
>>>>>>>> a while. I've also send an email just now to the cfe-dev 
>>>>>>>> mailing list,
>>>>>>>> maybe someone there can offer some help.
>>>>>>>>
>>>>>>>> In the meantime we can work around structs with flexible arrays.
>>>>>>>> Currently an exception is thrown, but this makes the whole 
>>>>>>>> extracted
>>>>>>>> interface unusable, so emitting an undefined layout as a 
>>>>>>>> default value
>>>>>>>> and printing a warning seems like a better option to me until the
>>>>>>>> issue is fixed in libclang.
>>>>>>>>
>>>>>>>> There's also the idea of introducing a jextract option to manually
>>>>>>>> override a descriptor, e.g. `--patch-descriptor
>>>>>>>> struct:MyStruct=[i32(x)i32(y)]`. We could use something like 
>>>>>>>> that to
>>>>>>>> manually provide the layout descriptor for structs with 
>>>>>>>> flexible array
>>>>>>>> members in the meantime.
>>>>>>>>
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Jorn Vernee schreef op 2019-02-16 09:07:
>>>>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>>>>> instead of undefined layout, I prefer to have limited 
>>>>>>>>>>> support, we can either strip that field or generate a 
>>>>>>>>>>> 0-length array for now. >> For jextract, C only allow such 
>>>>>>>>>>> field at end of struct, and sizeof() operator simply ignore 
>>>>>>>>>>> that trailing array field. This should give us a good match 
>>>>>>>>>>> as first step.
>>>>>>>>>> Agree
>>>>>>>>>
>>>>>>>>> Sure, that would be preferable. But, as discussed before [1], 
>>>>>>>>> libclang
>>>>>>>>> does not handle incomplete arrays properly, that's why it was 
>>>>>>>>> changed
>>>>>>>>> to emit an exception in the first place.
>>>>>>>>>
>>>>>>>>> It is not possible to filter out structs with any jextract 
>>>>>>>>> option, so
>>>>>>>>> if you have an incomplete array in a header file that is 
>>>>>>>>> otherwise
>>>>>>>>> usable, you're out of luck.
>>>>>>>>>
>>>>>>>>> Jorn
>>>>>>>>>
>>>>>>>>> [1] :
>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html 
>>>>>>>>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>>>>>>>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>>>>> instead of undefined layout, I prefer to have limited 
>>>>>>>>>>> support, we can either strip that field or generate a 
>>>>>>>>>>> 0-length array for now. For jextract, C only allow such 
>>>>>>>>>>> field at end of struct, and sizeof() operator simply ignore 
>>>>>>>>>>> that trailing array field. This should give us a good match 
>>>>>>>>>>> as first step.
>>>>>>>>>> Agree
>>>>>>>>>>>
>>>>>>>>>>> Move on to more general support, where incomplete array can 
>>>>>>>>>>> be in-between layouts. Before that, we probably need to 
>>>>>>>>>>> validate some assumption,
>>>>>>>>>>>
>>>>>>>>>>> Any incomplete array must have length specified in the same 
>>>>>>>>>>> struct before the incomplete array. I believe this will 
>>>>>>>>>>> pretty much cover most cases if not all.
>>>>>>>>>>
>>>>>>>>>> To reinforce this point, I believe most compilers even give 
>>>>>>>>>> error if
>>>>>>>>>> the incomplete array is the only member of the struct, or if 
>>>>>>>>>> it's
>>>>>>>>>> followed by other stuff:
>>>>>>>>>>
>>>>>>>>>> $ cat testInc.h
>>>>>>>>>>
>>>>>>>>>> struct A {
>>>>>>>>>>    int arr[];
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> struct B {
>>>>>>>>>>    int l;
>>>>>>>>>>    int arr[];
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> struct C {
>>>>>>>>>>    int l;
>>>>>>>>>>    int arr[];
>>>>>>>>>>    int l2;
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> $ gcc -c testInc.h
>>>>>>>>>>
>>>>>>>>>> testInc.h:2:8: error: flexible array member in a struct with 
>>>>>>>>>> no named members
>>>>>>>>>>     int arr[];
>>>>>>>>>>         ^~~
>>>>>>>>>> testInc.h:12:8: error: flexible array member not at end of 
>>>>>>>>>> struct
>>>>>>>>>>     int arr[];
>>>>>>>>>>         ^~~
>>>>>>>>>>
>>>>>>>>>> $ clang -c testInc.h
>>>>>>>>>> testInc.h:2:8: error: flexible array member 'arr' not allowed 
>>>>>>>>>> in otherwise empty
>>>>>>>>>>       struct
>>>>>>>>>>    int arr[];
>>>>>>>>>>        ^
>>>>>>>>>> testInc.h:12:8: error: flexible array member 'arr' with type 
>>>>>>>>>> 'int []' is not at
>>>>>>>>>>       the end of struct
>>>>>>>>>>    int arr[];
>>>>>>>>>>        ^
>>>>>>>>>> testInc.h:13:8: note: next field declaration is here
>>>>>>>>>>    int l2;
>>>>>>>>>>        ^
>>>>>>>>>> 2 errors generated.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> With that, I think following should work well enough,
>>>>>>>>>>
>>>>>>>>>> What you describe is what I've dubbed 'dependent layout' 
>>>>>>>>>> approach -
>>>>>>>>>> e.g. have one or more values in a struct provide more info 
>>>>>>>>>> for certain
>>>>>>>>>> layout elements in same struct. This is fairly frequent 
>>>>>>>>>> business with
>>>>>>>>>> message protocols - almost all representation for 
>>>>>>>>>> variable-sized data
>>>>>>>>>> is expressed as length + data array (sometimes compressed, as in
>>>>>>>>>> protobuf's VarInt).
>>>>>>>>>>
>>>>>>>>>> I agree that's where we need to land, longer term. Short term 
>>>>>>>>>> it feels
>>>>>>>>>> like the best move would be to just strip the array. Creating a
>>>>>>>>>> 0-length array might be a  move with subtle consequences: the 
>>>>>>>>>> array
>>>>>>>>>> occurs within a region with some boundaries (e.g. a struct 
>>>>>>>>>> region).
>>>>>>>>>> The boundaries of the enclosing region are usually computed 
>>>>>>>>>> using
>>>>>>>>>> sizeof(enclosing type). Meaning that the enclosing region 
>>>>>>>>>> won't be
>>>>>>>>>> 'big enough' to host anything but a zero-length array. If we 
>>>>>>>>>> cast the
>>>>>>>>>> array to something else, what we get back is an array whose 
>>>>>>>>>> boundaries
>>>>>>>>>> would exceed those of the enclosing struct - so if you try 
>>>>>>>>>> e.g. to
>>>>>>>>>> write to the array, the operation would fail.
>>>>>>>>>>
>>>>>>>>>> To do this stuff properly in Panama you would need to allocate a
>>>>>>>>>> bigger chunk of memory, of the desired size (pretty much as 
>>>>>>>>>> you would
>>>>>>>>>> in C), and then cast the memory to the struct type - now you 
>>>>>>>>>> have a
>>>>>>>>>> memory region that is big enough to do the struct + the array.
>>>>>>>>>>
>>>>>>>>>> The alternative, which does look simpler, is to just allocate 
>>>>>>>>>> a struct
>>>>>>>>>> (with array stripped) followed by an array of desired size in 
>>>>>>>>>> the same
>>>>>>>>>> scope - e.g. compare this:
>>>>>>>>>>
>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>
>>>>>>>>>>     Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>>>>>>>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>>>>>>>>     Pointer<StructWithIncompleteArray> pstruct =
>>>>>>>>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class));
>>>>>>>>>>     Array<Integer> data = 
>>>>>>>>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> With this:
>>>>>>>>>>
>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>
>>>>>>>>>>     StructWithoutIncompleteArray struct =
>>>>>>>>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>>>>>>>>     Array<Integer> data = s.allocateArray(NativeTypes.INT32, 
>>>>>>>>>> 10);
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The code looks similar - and a client can, after the 
>>>>>>>>>> allocation, use
>>>>>>>>>> the array and the struct at will. But there's a subtle 
>>>>>>>>>> difference
>>>>>>>>>> between the two: in the first snippet, the array is allocated
>>>>>>>>>> immediately after the struct bits - that's how allocation 
>>>>>>>>>> happened. In
>>>>>>>>>> the second snippet there's no guarantee that the array will 
>>>>>>>>>> be after
>>>>>>>>>> the struct; in fact, the native scope might have run out of 
>>>>>>>>>> space with
>>>>>>>>>> the struct and needed to allocate a new slab of native memory 
>>>>>>>>>> via
>>>>>>>>>> unsafe before the array allocation happens.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Which seems to suggest that the right way of approaching the 
>>>>>>>>>> problem,
>>>>>>>>>> even if more verbose, is the first one.
>>>>>>>>>>
>>>>>>>>>> Maurizio