[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member

Tue Feb 26 00:30:19 UTC 2019

Hi Maurizio,

I think you're right. I replaced the exception with a warning since the 
2 are similar, but throwing an exception/warning at that point is 
probably wrong in the first place, like you say.

Based on your suggestion, I've implemented the warning separately as a 
.peek() operation on the stream of Trees.

Updated webrev: 
http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.04/

Jorn

Maurizio Cimadamore schreef op 2019-02-26 00:34:
> Hi Jorn,
> while in principle making context immutable is the right to do on
> paper, I believe that we probably ended up curing a symptom and not
> the disease :-)
> 
> Which means: I find it very strange to see that a layout computer -
> that is, a function which takes a clang Cursor and emits a Layout is
> now suddenly starting to give side-effects. Given I've worked on javac
> for many years, I can assure you that this will almost surely lead to
> problems down the road. For instance, if this apparently harmless
> stateless function is called twice on the same cursor, we would get
> two warnings; another issue is that the code will fail to scale if new
> warnings will need to be added - e.g. if record layout computer needs
> to check for other problematic stuff. And I'm not even mentioning
> adding logic to enable/disable specific classes of warnings, which is
> another thing that might happen, at some point.
> 
> What works in these contexts, is to separate the logic that does the
> stateless computation (e.g. layout computation, in this case), from
> the logic that issues the warning. For instance, we could have a
> separate visitor in jextract which looks at the clang cursor, looking
> for things that are suspicious or not well supported (such as
> incomplete arrays) and report a warning _only once_.
> 
> This visitor will probably write itself, and, longer term would be a
> much more maintainable option than inlining the check in the layout
> computer function itself.
> 
> After which, the decision as to whether make the context mutable or
> not is largely orthogonal - we could go there, although it's not clear
> if the extra visitor I'm proposing would significantly switch the
> balance one way or another (probably not, as we have plenty other
> visitors which do logging).
> 
> Maurizio
> 
> 
> On 25/02/2019 23:16, Jorn Vernee wrote:
>> I've split this into 2 to try and make reviewing easier.
>> 
>> Here's the part that makes Context immutable (this required quite a 
>> bit of refactoring in Main):
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/context/webrev.00/
>> 
>> Here's the original part that returns a layout reference when 
>> encountering a type with a flexible array (applies on top of the first 
>> one):
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.03/
>> 
>> Hope that helps,
>> Jorn
>> 
>> Jorn Vernee schreef op 2019-02-25 23:06:
>>> I talked a bit with Sundar;
>>> 
>>> We can not create a global static method to return the Context since
>>> there could be more than 1 Context, due to multiple instances of
>>> Jextract being created through the ToolProvider interface. But, as
>>> long as Context is Immutable, passing it down to other code should be
>>> OK.
>>> 
>>> I have made Context immutable by adding a Context.Builder, and
>>> creating the Context in a separate method in Main (createContext).
>>> Some places in that code were prematurely exiting and returning an
>>> error code. This is replaced by throwing a custom exception instead.
>>> 
>>> Update webrev:
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/
>>> 
>>> Thanks,
>>> Jorn
>>> 
>>> Jorn Vernee schreef op 2019-02-16 15:30:
>>>> In other words; we can't get the layout for a type with a flexible
>>>> array from libclang at all. A good way to resolve that is probably 
>>>> to
>>>> submit a patch for it to libclang. I've done a little bit of
>>>> searching, and there doesn't seem to be a bug report in their bug
>>>> database for it either:
>>>> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array so we
>>>> could submit a bug about it as well. But, currently you need to
>>>> manually request an account in order to post bugs, so this might 
>>>> take
>>>> a while. I've also send an email just now to the cfe-dev mailing 
>>>> list,
>>>> maybe someone there can offer some help.
>>>> 
>>>> In the meantime we can work around structs with flexible arrays.
>>>> Currently an exception is thrown, but this makes the whole extracted
>>>> interface unusable, so emitting an undefined layout as a default 
>>>> value
>>>> and printing a warning seems like a better option to me until the
>>>> issue is fixed in libclang.
>>>> 
>>>> There's also the idea of introducing a jextract option to manually
>>>> override a descriptor, e.g. `--patch-descriptor
>>>> struct:MyStruct=[i32(x)i32(y)]`. We could use something like that to
>>>> manually provide the layout descriptor for structs with flexible 
>>>> array
>>>> members in the meantime.
>>>> 
>>>> Jorn
>>>> 
>>>> Jorn Vernee schreef op 2019-02-16 09:07:
>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>> instead of undefined layout, I prefer to have limited support, we 
>>>>>>> can either strip that field or generate a 0-length array for now. 
>>>>>>> >> For jextract, C only allow such field at end of struct, and sizeof() operator simply ignore that trailing array field. This should give us a good match as first step.
>>>>>> Agree
>>>>> 
>>>>> Sure, that would be preferable. But, as discussed before [1], 
>>>>> libclang
>>>>> does not handle incomplete arrays properly, that's why it was 
>>>>> changed
>>>>> to emit an exception in the first place.
>>>>> 
>>>>> It is not possible to filter out structs with any jextract option, 
>>>>> so
>>>>> if you have an incomplete array in a header file that is otherwise
>>>>> usable, you're out of luck.
>>>>> 
>>>>> Jorn
>>>>> 
>>>>> [1] :
>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html 
>>>>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>>>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>> instead of undefined layout, I prefer to have limited support, we 
>>>>>>> can either strip that field or generate a 0-length array for now. 
>>>>>>> For jextract, C only allow such field at end of struct, and 
>>>>>>> sizeof() operator simply ignore that trailing array field. This 
>>>>>>> should give us a good match as first step.
>>>>>> Agree
>>>>>>> 
>>>>>>> Move on to more general support, where incomplete array can be 
>>>>>>> in-between layouts. Before that, we probably need to validate 
>>>>>>> some assumption,
>>>>>>> 
>>>>>>> Any incomplete array must have length specified in the same 
>>>>>>> struct before the incomplete array. I believe this will pretty 
>>>>>>> much cover most cases if not all.
>>>>>> 
>>>>>> To reinforce this point, I believe most compilers even give error 
>>>>>> if
>>>>>> the incomplete array is the only member of the struct, or if it's
>>>>>> followed by other stuff:
>>>>>> 
>>>>>> $ cat testInc.h
>>>>>> 
>>>>>> struct A {
>>>>>>    int arr[];
>>>>>> };
>>>>>> 
>>>>>> struct B {
>>>>>>    int l;
>>>>>>    int arr[];
>>>>>> };
>>>>>> 
>>>>>> struct C {
>>>>>>    int l;
>>>>>>    int arr[];
>>>>>>    int l2;
>>>>>> };
>>>>>> 
>>>>>> $ gcc -c testInc.h
>>>>>> 
>>>>>> testInc.h:2:8: error: flexible array member in a struct with no 
>>>>>> named members
>>>>>>     int arr[];
>>>>>>         ^~~
>>>>>> testInc.h:12:8: error: flexible array member not at end of struct
>>>>>>     int arr[];
>>>>>>         ^~~
>>>>>> 
>>>>>> $ clang -c testInc.h
>>>>>> testInc.h:2:8: error: flexible array member 'arr' not allowed in 
>>>>>> otherwise empty
>>>>>>       struct
>>>>>>    int arr[];
>>>>>>        ^
>>>>>> testInc.h:12:8: error: flexible array member 'arr' with type 'int 
>>>>>> []' is not at
>>>>>>       the end of struct
>>>>>>    int arr[];
>>>>>>        ^
>>>>>> testInc.h:13:8: note: next field declaration is here
>>>>>>    int l2;
>>>>>>        ^
>>>>>> 2 errors generated.
>>>>>> 
>>>>>>> 
>>>>>>> With that, I think following should work well enough,
>>>>>> 
>>>>>> What you describe is what I've dubbed 'dependent layout' approach 
>>>>>> -
>>>>>> e.g. have one or more values in a struct provide more info for 
>>>>>> certain
>>>>>> layout elements in same struct. This is fairly frequent business 
>>>>>> with
>>>>>> message protocols - almost all representation for variable-sized 
>>>>>> data
>>>>>> is expressed as length + data array (sometimes compressed, as in
>>>>>> protobuf's VarInt).
>>>>>> 
>>>>>> I agree that's where we need to land, longer term. Short term it 
>>>>>> feels
>>>>>> like the best move would be to just strip the array. Creating a
>>>>>> 0-length array might be a  move with subtle consequences: the 
>>>>>> array
>>>>>> occurs within a region with some boundaries (e.g. a struct 
>>>>>> region).
>>>>>> The boundaries of the enclosing region are usually computed using
>>>>>> sizeof(enclosing type). Meaning that the enclosing region won't be
>>>>>> 'big enough' to host anything but a zero-length array. If we cast 
>>>>>> the
>>>>>> array to something else, what we get back is an array whose 
>>>>>> boundaries
>>>>>> would exceed those of the enclosing struct - so if you try e.g. to
>>>>>> write to the array, the operation would fail.
>>>>>> 
>>>>>> To do this stuff properly in Panama you would need to allocate a
>>>>>> bigger chunk of memory, of the desired size (pretty much as you 
>>>>>> would
>>>>>> in C), and then cast the memory to the struct type - now you have 
>>>>>> a
>>>>>> memory region that is big enough to do the struct + the array.
>>>>>> 
>>>>>> The alternative, which does look simpler, is to just allocate a 
>>>>>> struct
>>>>>> (with array stripped) followed by an array of desired size in the 
>>>>>> same
>>>>>> scope - e.g. compare this:
>>>>>> 
>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>> 
>>>>>>     Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>>>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>>>>     Pointer<StructWithIncompleteArray> pstruct =
>>>>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class));
>>>>>>     Array<Integer> data = 
>>>>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>>>>> 
>>>>>> }
>>>>>> 
>>>>>> With this:
>>>>>> 
>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>> 
>>>>>>     StructWithoutIncompleteArray struct =
>>>>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>>>>     Array<Integer> data = s.allocateArray(NativeTypes.INT32, 10);
>>>>>> 
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> The code looks similar - and a client can, after the allocation, 
>>>>>> use
>>>>>> the array and the struct at will. But there's a subtle difference
>>>>>> between the two: in the first snippet, the array is allocated
>>>>>> immediately after the struct bits - that's how allocation 
>>>>>> happened. In
>>>>>> the second snippet there's no guarantee that the array will be 
>>>>>> after
>>>>>> the struct; in fact, the native scope might have run out of space 
>>>>>> with
>>>>>> the struct and needed to allocate a new slab of native memory via
>>>>>> unsafe before the array allocation happens.
>>>>>> 
>>>>>> 
>>>>>> Which seems to suggest that the right way of approaching the 
>>>>>> problem,
>>>>>> even if more verbose, is the first one.
>>>>>> 
>>>>>> Maurizio