[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member

Mon Feb 25 23:16:41 UTC 2019

I've split this into 2 to try and make reviewing easier.

Here's the part that makes Context immutable (this required quite a bit 
of refactoring in Main):
http://cr.openjdk.java.net/~jvernee/panama/webrevs/context/webrev.00/

Here's the original part that returns a layout reference when 
encountering a type with a flexible array (applies on top of the first 
one):
http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.03/

Hope that helps,
Jorn

Jorn Vernee schreef op 2019-02-25 23:06:
> I talked a bit with Sundar;
> 
> We can not create a global static method to return the Context since
> there could be more than 1 Context, due to multiple instances of
> Jextract being created through the ToolProvider interface. But, as
> long as Context is Immutable, passing it down to other code should be
> OK.
> 
> I have made Context immutable by adding a Context.Builder, and
> creating the Context in a separate method in Main (createContext).
> Some places in that code were prematurely exiting and returning an
> error code. This is replaced by throwing a custom exception instead.
> 
> Update webrev:
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/
> 
> Thanks,
> Jorn
> 
> Jorn Vernee schreef op 2019-02-16 15:30:
>> In other words; we can't get the layout for a type with a flexible
>> array from libclang at all. A good way to resolve that is probably to
>> submit a patch for it to libclang. I've done a little bit of
>> searching, and there doesn't seem to be a bug report in their bug
>> database for it either:
>> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array so we
>> could submit a bug about it as well. But, currently you need to
>> manually request an account in order to post bugs, so this might take
>> a while. I've also send an email just now to the cfe-dev mailing list,
>> maybe someone there can offer some help.
>> 
>> In the meantime we can work around structs with flexible arrays.
>> Currently an exception is thrown, but this makes the whole extracted
>> interface unusable, so emitting an undefined layout as a default value
>> and printing a warning seems like a better option to me until the
>> issue is fixed in libclang.
>> 
>> There's also the idea of introducing a jextract option to manually
>> override a descriptor, e.g. `--patch-descriptor
>> struct:MyStruct=[i32(x)i32(y)]`. We could use something like that to
>> manually provide the layout descriptor for structs with flexible array
>> members in the meantime.
>> 
>> Jorn
>> 
>> Jorn Vernee schreef op 2019-02-16 09:07:
>>>>> Now to the real discussion about incomplete array support, instead 
>>>>> of undefined layout, I prefer to have limited support, we can 
>>>>> either strip that field or generate a 0-length array for now. >> 
>>>>> For jextract, C only allow such field at end of struct, and 
>>>>> sizeof() operator simply ignore that trailing array field. This 
>>>>> should give us a good match as first step.
>>>> Agree
>>> 
>>> Sure, that would be preferable. But, as discussed before [1], 
>>> libclang
>>> does not handle incomplete arrays properly, that's why it was changed
>>> to emit an exception in the first place.
>>> 
>>> It is not possible to filter out structs with any jextract option, so
>>> if you have an incomplete array in a header file that is otherwise
>>> usable, you're out of luck.
>>> 
>>> Jorn
>>> 
>>> [1] :
>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html
>>> 
>>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>>> Now to the real discussion about incomplete array support, instead 
>>>>> of undefined layout, I prefer to have limited support, we can 
>>>>> either strip that field or generate a 0-length array for now. For 
>>>>> jextract, C only allow such field at end of struct, and sizeof() 
>>>>> operator simply ignore that trailing array field. This should give 
>>>>> us a good match as first step.
>>>> Agree
>>>>> 
>>>>> Move on to more general support, where incomplete array can be 
>>>>> in-between layouts. Before that, we probably need to validate some 
>>>>> assumption,
>>>>> 
>>>>> Any incomplete array must have length specified in the same struct 
>>>>> before the incomplete array. I believe this will pretty much cover 
>>>>> most cases if not all.
>>>> 
>>>> To reinforce this point, I believe most compilers even give error if
>>>> the incomplete array is the only member of the struct, or if it's
>>>> followed by other stuff:
>>>> 
>>>> $ cat testInc.h
>>>> 
>>>> struct A {
>>>>    int arr[];
>>>> };
>>>> 
>>>> struct B {
>>>>    int l;
>>>>    int arr[];
>>>> };
>>>> 
>>>> struct C {
>>>>    int l;
>>>>    int arr[];
>>>>    int l2;
>>>> };
>>>> 
>>>> $ gcc -c testInc.h
>>>> 
>>>> testInc.h:2:8: error: flexible array member in a struct with no 
>>>> named members
>>>>     int arr[];
>>>>         ^~~
>>>> testInc.h:12:8: error: flexible array member not at end of struct
>>>>     int arr[];
>>>>         ^~~
>>>> 
>>>> $ clang -c testInc.h
>>>> testInc.h:2:8: error: flexible array member 'arr' not allowed in 
>>>> otherwise empty
>>>>       struct
>>>>    int arr[];
>>>>        ^
>>>> testInc.h:12:8: error: flexible array member 'arr' with type 'int 
>>>> []' is not at
>>>>       the end of struct
>>>>    int arr[];
>>>>        ^
>>>> testInc.h:13:8: note: next field declaration is here
>>>>    int l2;
>>>>        ^
>>>> 2 errors generated.
>>>> 
>>>>> 
>>>>> With that, I think following should work well enough,
>>>> 
>>>> What you describe is what I've dubbed 'dependent layout' approach -
>>>> e.g. have one or more values in a struct provide more info for 
>>>> certain
>>>> layout elements in same struct. This is fairly frequent business 
>>>> with
>>>> message protocols - almost all representation for variable-sized 
>>>> data
>>>> is expressed as length + data array (sometimes compressed, as in
>>>> protobuf's VarInt).
>>>> 
>>>> I agree that's where we need to land, longer term. Short term it 
>>>> feels
>>>> like the best move would be to just strip the array. Creating a
>>>> 0-length array might be a  move with subtle consequences: the array
>>>> occurs within a region with some boundaries (e.g. a struct region).
>>>> The boundaries of the enclosing region are usually computed using
>>>> sizeof(enclosing type). Meaning that the enclosing region won't be
>>>> 'big enough' to host anything but a zero-length array. If we cast 
>>>> the
>>>> array to something else, what we get back is an array whose 
>>>> boundaries
>>>> would exceed those of the enclosing struct - so if you try e.g. to
>>>> write to the array, the operation would fail.
>>>> 
>>>> To do this stuff properly in Panama you would need to allocate a
>>>> bigger chunk of memory, of the desired size (pretty much as you 
>>>> would
>>>> in C), and then cast the memory to the struct type - now you have a
>>>> memory region that is big enough to do the struct + the array.
>>>> 
>>>> The alternative, which does look simpler, is to just allocate a 
>>>> struct
>>>> (with array stripped) followed by an array of desired size in the 
>>>> same
>>>> scope - e.g. compare this:
>>>> 
>>>> try (Scope s : Scope.globalScope().fork()) {
>>>> 
>>>>     Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>>     Pointer<StructWithIncompleteArray> pstruct =
>>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class));
>>>>     Array<Integer> data = 
>>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>>> 
>>>> }
>>>> 
>>>> With this:
>>>> 
>>>> try (Scope s : Scope.globalScope().fork()) {
>>>> 
>>>>     StructWithoutIncompleteArray struct =
>>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>>     Array<Integer> data = s.allocateArray(NativeTypes.INT32, 10);
>>>> 
>>>> }
>>>> 
>>>> 
>>>> The code looks similar - and a client can, after the allocation, use
>>>> the array and the struct at will. But there's a subtle difference
>>>> between the two: in the first snippet, the array is allocated
>>>> immediately after the struct bits - that's how allocation happened. 
>>>> In
>>>> the second snippet there's no guarantee that the array will be after
>>>> the struct; in fact, the native scope might have run out of space 
>>>> with
>>>> the struct and needed to allocate a new slab of native memory via
>>>> unsafe before the array allocation happens.
>>>> 
>>>> 
>>>> Which seems to suggest that the right way of approaching the 
>>>> problem,
>>>> even if more verbose, is the first one.
>>>> 
>>>> Maurizio