[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member

Tue Feb 26 01:24:23 UTC 2019

Thanks this looks much simpler.

And also points at another possible simplification - why don't we 
generate such warning in the parser? After all, is the parser that knows 
when a StructTree will be created, so it is in an ideal situation to 
intercept that event and log a warning accordingly (and, the parser 
already has logging support). Btw, I know that the actual StructTree 
creation occurs in TreeMaker - we could either add context to TreeMaker, 
or just peek for a struct cursor in Parser.

But overall, adding the check on earlier visitors might be better than 
using peek().

Maurizio

On 26/02/2019 00:30, Jorn Vernee wrote:
> Hi Maurizio,
>
> I think you're right. I replaced the exception with a warning since 
> the 2 are similar, but throwing an exception/warning at that point is 
> probably wrong in the first place, like you say.
>
> Based on your suggestion, I've implemented the warning separately as a 
> .peek() operation on the stream of Trees.
>
> Updated webrev: 
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.04/
>
> Jorn
>
> Maurizio Cimadamore schreef op 2019-02-26 00:34:
>> Hi Jorn,
>> while in principle making context immutable is the right to do on
>> paper, I believe that we probably ended up curing a symptom and not
>> the disease :-)
>>
>> Which means: I find it very strange to see that a layout computer -
>> that is, a function which takes a clang Cursor and emits a Layout is
>> now suddenly starting to give side-effects. Given I've worked on javac
>> for many years, I can assure you that this will almost surely lead to
>> problems down the road. For instance, if this apparently harmless
>> stateless function is called twice on the same cursor, we would get
>> two warnings; another issue is that the code will fail to scale if new
>> warnings will need to be added - e.g. if record layout computer needs
>> to check for other problematic stuff. And I'm not even mentioning
>> adding logic to enable/disable specific classes of warnings, which is
>> another thing that might happen, at some point.
>>
>> What works in these contexts, is to separate the logic that does the
>> stateless computation (e.g. layout computation, in this case), from
>> the logic that issues the warning. For instance, we could have a
>> separate visitor in jextract which looks at the clang cursor, looking
>> for things that are suspicious or not well supported (such as
>> incomplete arrays) and report a warning _only once_.
>>
>> This visitor will probably write itself, and, longer term would be a
>> much more maintainable option than inlining the check in the layout
>> computer function itself.
>>
>> After which, the decision as to whether make the context mutable or
>> not is largely orthogonal - we could go there, although it's not clear
>> if the extra visitor I'm proposing would significantly switch the
>> balance one way or another (probably not, as we have plenty other
>> visitors which do logging).
>>
>> Maurizio
>>
>>
>> On 25/02/2019 23:16, Jorn Vernee wrote:
>>> I've split this into 2 to try and make reviewing easier.
>>>
>>> Here's the part that makes Context immutable (this required quite a 
>>> bit of refactoring in Main):
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/context/webrev.00/
>>>
>>> Here's the original part that returns a layout reference when 
>>> encountering a type with a flexible array (applies on top of the 
>>> first one):
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.03/
>>>
>>> Hope that helps,
>>> Jorn
>>>
>>> Jorn Vernee schreef op 2019-02-25 23:06:
>>>> I talked a bit with Sundar;
>>>>
>>>> We can not create a global static method to return the Context since
>>>> there could be more than 1 Context, due to multiple instances of
>>>> Jextract being created through the ToolProvider interface. But, as
>>>> long as Context is Immutable, passing it down to other code should be
>>>> OK.
>>>>
>>>> I have made Context immutable by adding a Context.Builder, and
>>>> creating the Context in a separate method in Main (createContext).
>>>> Some places in that code were prematurely exiting and returning an
>>>> error code. This is replaced by throwing a custom exception instead.
>>>>
>>>> Update webrev:
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/
>>>>
>>>> Thanks,
>>>> Jorn
>>>>
>>>> Jorn Vernee schreef op 2019-02-16 15:30:
>>>>> In other words; we can't get the layout for a type with a flexible
>>>>> array from libclang at all. A good way to resolve that is probably to
>>>>> submit a patch for it to libclang. I've done a little bit of
>>>>> searching, and there doesn't seem to be a bug report in their bug
>>>>> database for it either:
>>>>> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array so we
>>>>> could submit a bug about it as well. But, currently you need to
>>>>> manually request an account in order to post bugs, so this might take
>>>>> a while. I've also send an email just now to the cfe-dev mailing 
>>>>> list,
>>>>> maybe someone there can offer some help.
>>>>>
>>>>> In the meantime we can work around structs with flexible arrays.
>>>>> Currently an exception is thrown, but this makes the whole extracted
>>>>> interface unusable, so emitting an undefined layout as a default 
>>>>> value
>>>>> and printing a warning seems like a better option to me until the
>>>>> issue is fixed in libclang.
>>>>>
>>>>> There's also the idea of introducing a jextract option to manually
>>>>> override a descriptor, e.g. `--patch-descriptor
>>>>> struct:MyStruct=[i32(x)i32(y)]`. We could use something like that to
>>>>> manually provide the layout descriptor for structs with flexible 
>>>>> array
>>>>> members in the meantime.
>>>>>
>>>>> Jorn
>>>>>
>>>>> Jorn Vernee schreef op 2019-02-16 09:07:
>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>> instead of undefined layout, I prefer to have limited support, 
>>>>>>>> we can either strip that field or generate a 0-length array for 
>>>>>>>> now. >> For jextract, C only allow such field at end of struct, 
>>>>>>>> and sizeof() operator simply ignore that trailing array field. 
>>>>>>>> This should give us a good match as first step.
>>>>>>> Agree
>>>>>>
>>>>>> Sure, that would be preferable. But, as discussed before [1], 
>>>>>> libclang
>>>>>> does not handle incomplete arrays properly, that's why it was 
>>>>>> changed
>>>>>> to emit an exception in the first place.
>>>>>>
>>>>>> It is not possible to filter out structs with any jextract 
>>>>>> option, so
>>>>>> if you have an incomplete array in a header file that is otherwise
>>>>>> usable, you're out of luck.
>>>>>>
>>>>>> Jorn
>>>>>>
>>>>>> [1] :
>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html 
>>>>>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>>>>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>> instead of undefined layout, I prefer to have limited support, 
>>>>>>>> we can either strip that field or generate a 0-length array for 
>>>>>>>> now. For jextract, C only allow such field at end of struct, 
>>>>>>>> and sizeof() operator simply ignore that trailing array field. 
>>>>>>>> This should give us a good match as first step.
>>>>>>> Agree
>>>>>>>>
>>>>>>>> Move on to more general support, where incomplete array can be 
>>>>>>>> in-between layouts. Before that, we probably need to validate 
>>>>>>>> some assumption,
>>>>>>>>
>>>>>>>> Any incomplete array must have length specified in the same 
>>>>>>>> struct before the incomplete array. I believe this will pretty 
>>>>>>>> much cover most cases if not all.
>>>>>>>
>>>>>>> To reinforce this point, I believe most compilers even give 
>>>>>>> error if
>>>>>>> the incomplete array is the only member of the struct, or if it's
>>>>>>> followed by other stuff:
>>>>>>>
>>>>>>> $ cat testInc.h
>>>>>>>
>>>>>>> struct A {
>>>>>>>    int arr[];
>>>>>>> };
>>>>>>>
>>>>>>> struct B {
>>>>>>>    int l;
>>>>>>>    int arr[];
>>>>>>> };
>>>>>>>
>>>>>>> struct C {
>>>>>>>    int l;
>>>>>>>    int arr[];
>>>>>>>    int l2;
>>>>>>> };
>>>>>>>
>>>>>>> $ gcc -c testInc.h
>>>>>>>
>>>>>>> testInc.h:2:8: error: flexible array member in a struct with no 
>>>>>>> named members
>>>>>>>     int arr[];
>>>>>>>         ^~~
>>>>>>> testInc.h:12:8: error: flexible array member not at end of struct
>>>>>>>     int arr[];
>>>>>>>         ^~~
>>>>>>>
>>>>>>> $ clang -c testInc.h
>>>>>>> testInc.h:2:8: error: flexible array member 'arr' not allowed in 
>>>>>>> otherwise empty
>>>>>>>       struct
>>>>>>>    int arr[];
>>>>>>>        ^
>>>>>>> testInc.h:12:8: error: flexible array member 'arr' with type 
>>>>>>> 'int []' is not at
>>>>>>>       the end of struct
>>>>>>>    int arr[];
>>>>>>>        ^
>>>>>>> testInc.h:13:8: note: next field declaration is here
>>>>>>>    int l2;
>>>>>>>        ^
>>>>>>> 2 errors generated.
>>>>>>>
>>>>>>>>
>>>>>>>> With that, I think following should work well enough,
>>>>>>>
>>>>>>> What you describe is what I've dubbed 'dependent layout' approach -
>>>>>>> e.g. have one or more values in a struct provide more info for 
>>>>>>> certain
>>>>>>> layout elements in same struct. This is fairly frequent business 
>>>>>>> with
>>>>>>> message protocols - almost all representation for variable-sized 
>>>>>>> data
>>>>>>> is expressed as length + data array (sometimes compressed, as in
>>>>>>> protobuf's VarInt).
>>>>>>>
>>>>>>> I agree that's where we need to land, longer term. Short term it 
>>>>>>> feels
>>>>>>> like the best move would be to just strip the array. Creating a
>>>>>>> 0-length array might be a  move with subtle consequences: the array
>>>>>>> occurs within a region with some boundaries (e.g. a struct region).
>>>>>>> The boundaries of the enclosing region are usually computed using
>>>>>>> sizeof(enclosing type). Meaning that the enclosing region won't be
>>>>>>> 'big enough' to host anything but a zero-length array. If we 
>>>>>>> cast the
>>>>>>> array to something else, what we get back is an array whose 
>>>>>>> boundaries
>>>>>>> would exceed those of the enclosing struct - so if you try e.g. to
>>>>>>> write to the array, the operation would fail.
>>>>>>>
>>>>>>> To do this stuff properly in Panama you would need to allocate a
>>>>>>> bigger chunk of memory, of the desired size (pretty much as you 
>>>>>>> would
>>>>>>> in C), and then cast the memory to the struct type - now you have a
>>>>>>> memory region that is big enough to do the struct + the array.
>>>>>>>
>>>>>>> The alternative, which does look simpler, is to just allocate a 
>>>>>>> struct
>>>>>>> (with array stripped) followed by an array of desired size in 
>>>>>>> the same
>>>>>>> scope - e.g. compare this:
>>>>>>>
>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>
>>>>>>>     Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>>>>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>>>>>     Pointer<StructWithIncompleteArray> pstruct =
>>>>>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class));
>>>>>>>     Array<Integer> data = 
>>>>>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> With this:
>>>>>>>
>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>
>>>>>>>     StructWithoutIncompleteArray struct =
>>>>>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>>>>>     Array<Integer> data = s.allocateArray(NativeTypes.INT32, 10);
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> The code looks similar - and a client can, after the allocation, 
>>>>>>> use
>>>>>>> the array and the struct at will. But there's a subtle difference
>>>>>>> between the two: in the first snippet, the array is allocated
>>>>>>> immediately after the struct bits - that's how allocation 
>>>>>>> happened. In
>>>>>>> the second snippet there's no guarantee that the array will be 
>>>>>>> after
>>>>>>> the struct; in fact, the native scope might have run out of 
>>>>>>> space with
>>>>>>> the struct and needed to allocate a new slab of native memory via
>>>>>>> unsafe before the array allocation happens.
>>>>>>>
>>>>>>>
>>>>>>> Which seems to suggest that the right way of approaching the 
>>>>>>> problem,
>>>>>>> even if more verbose, is the first one.
>>>>>>>
>>>>>>> Maurizio