[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member

Mon Feb 25 22:06:03 UTC 2019

I talked a bit with Sundar;

We can not create a global static method to return the Context since 
there could be more than 1 Context, due to multiple instances of 
Jextract being created through the ToolProvider interface. But, as long 
as Context is Immutable, passing it down to other code should be OK.

I have made Context immutable by adding a Context.Builder, and creating 
the Context in a separate method in Main (createContext). Some places in 
that code were prematurely exiting and returning an error code. This is 
replaced by throwing a custom exception instead.

Update webrev: 
http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/

Thanks,
Jorn

Jorn Vernee schreef op 2019-02-16 15:30:
> In other words; we can't get the layout for a type with a flexible
> array from libclang at all. A good way to resolve that is probably to
> submit a patch for it to libclang. I've done a little bit of
> searching, and there doesn't seem to be a bug report in their bug
> database for it either:
> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array so we
> could submit a bug about it as well. But, currently you need to
> manually request an account in order to post bugs, so this might take
> a while. I've also send an email just now to the cfe-dev mailing list,
> maybe someone there can offer some help.
> 
> In the meantime we can work around structs with flexible arrays.
> Currently an exception is thrown, but this makes the whole extracted
> interface unusable, so emitting an undefined layout as a default value
> and printing a warning seems like a better option to me until the
> issue is fixed in libclang.
> 
> There's also the idea of introducing a jextract option to manually
> override a descriptor, e.g. `--patch-descriptor
> struct:MyStruct=[i32(x)i32(y)]`. We could use something like that to
> manually provide the layout descriptor for structs with flexible array
> members in the meantime.
> 
> Jorn
> 
> Jorn Vernee schreef op 2019-02-16 09:07:
>>>> Now to the real discussion about incomplete array support, instead 
>>>> of undefined layout, I prefer to have limited support, we can either 
>>>> strip that field or generate a 0-length array for now. >> For 
>>>> jextract, C only allow such field at end of struct, and sizeof() 
>>>> operator simply ignore that trailing array field. This should give 
>>>> us a good match as first step.
>>> Agree
>> 
>> Sure, that would be preferable. But, as discussed before [1], libclang
>> does not handle incomplete arrays properly, that's why it was changed
>> to emit an exception in the first place.
>> 
>> It is not possible to filter out structs with any jextract option, so
>> if you have an incomplete array in a header file that is otherwise
>> usable, you're out of luck.
>> 
>> Jorn
>> 
>> [1] :
>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html
>> 
>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>> Now to the real discussion about incomplete array support, instead 
>>>> of undefined layout, I prefer to have limited support, we can either 
>>>> strip that field or generate a 0-length array for now. For jextract, 
>>>> C only allow such field at end of struct, and sizeof() operator 
>>>> simply ignore that trailing array field. This should give us a good 
>>>> match as first step.
>>> Agree
>>>> 
>>>> Move on to more general support, where incomplete array can be 
>>>> in-between layouts. Before that, we probably need to validate some 
>>>> assumption,
>>>> 
>>>> Any incomplete array must have length specified in the same struct 
>>>> before the incomplete array. I believe this will pretty much cover 
>>>> most cases if not all.
>>> 
>>> To reinforce this point, I believe most compilers even give error if
>>> the incomplete array is the only member of the struct, or if it's
>>> followed by other stuff:
>>> 
>>> $ cat testInc.h
>>> 
>>> struct A {
>>>    int arr[];
>>> };
>>> 
>>> struct B {
>>>    int l;
>>>    int arr[];
>>> };
>>> 
>>> struct C {
>>>    int l;
>>>    int arr[];
>>>    int l2;
>>> };
>>> 
>>> $ gcc -c testInc.h
>>> 
>>> testInc.h:2:8: error: flexible array member in a struct with no named 
>>> members
>>>     int arr[];
>>>         ^~~
>>> testInc.h:12:8: error: flexible array member not at end of struct
>>>     int arr[];
>>>         ^~~
>>> 
>>> $ clang -c testInc.h
>>> testInc.h:2:8: error: flexible array member 'arr' not allowed in 
>>> otherwise empty
>>>       struct
>>>    int arr[];
>>>        ^
>>> testInc.h:12:8: error: flexible array member 'arr' with type 'int []' 
>>> is not at
>>>       the end of struct
>>>    int arr[];
>>>        ^
>>> testInc.h:13:8: note: next field declaration is here
>>>    int l2;
>>>        ^
>>> 2 errors generated.
>>> 
>>>> 
>>>> With that, I think following should work well enough,
>>> 
>>> What you describe is what I've dubbed 'dependent layout' approach -
>>> e.g. have one or more values in a struct provide more info for 
>>> certain
>>> layout elements in same struct. This is fairly frequent business with
>>> message protocols - almost all representation for variable-sized data
>>> is expressed as length + data array (sometimes compressed, as in
>>> protobuf's VarInt).
>>> 
>>> I agree that's where we need to land, longer term. Short term it 
>>> feels
>>> like the best move would be to just strip the array. Creating a
>>> 0-length array might be a  move with subtle consequences: the array
>>> occurs within a region with some boundaries (e.g. a struct region).
>>> The boundaries of the enclosing region are usually computed using
>>> sizeof(enclosing type). Meaning that the enclosing region won't be
>>> 'big enough' to host anything but a zero-length array. If we cast the
>>> array to something else, what we get back is an array whose 
>>> boundaries
>>> would exceed those of the enclosing struct - so if you try e.g. to
>>> write to the array, the operation would fail.
>>> 
>>> To do this stuff properly in Panama you would need to allocate a
>>> bigger chunk of memory, of the desired size (pretty much as you would
>>> in C), and then cast the memory to the struct type - now you have a
>>> memory region that is big enough to do the struct + the array.
>>> 
>>> The alternative, which does look simpler, is to just allocate a 
>>> struct
>>> (with array stripped) followed by an array of desired size in the 
>>> same
>>> scope - e.g. compare this:
>>> 
>>> try (Scope s : Scope.globalScope().fork()) {
>>> 
>>>     Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>     Pointer<StructWithIncompleteArray> pstruct =
>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class));
>>>     Array<Integer> data = 
>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>> 
>>> }
>>> 
>>> With this:
>>> 
>>> try (Scope s : Scope.globalScope().fork()) {
>>> 
>>>     StructWithoutIncompleteArray struct =
>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>     Array<Integer> data = s.allocateArray(NativeTypes.INT32, 10);
>>> 
>>> }
>>> 
>>> 
>>> The code looks similar - and a client can, after the allocation, use
>>> the array and the struct at will. But there's a subtle difference
>>> between the two: in the first snippet, the array is allocated
>>> immediately after the struct bits - that's how allocation happened. 
>>> In
>>> the second snippet there's no guarantee that the array will be after
>>> the struct; in fact, the native scope might have run out of space 
>>> with
>>> the struct and needed to allocate a new slab of native memory via
>>> unsafe before the array allocation happens.
>>> 
>>> 
>>> Which seems to suggest that the right way of approaching the problem,
>>> even if more verbose, is the first one.
>>> 
>>> Maurizio