[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member

Tue Feb 26 14:53:03 UTC 2019

Looks good - in the future, should we add more checks we can rename the 
visitor to some more general name.

Maurizio

On 26/02/2019 14:33, Jorn Vernee wrote:
> Uhh... I just realized the MultiplexingVisitor I made is totally 
> pointless... because Trees implement the visitor pattern. (probably 
> time to take a break :-) )
>
> Updated webrev: 
> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.06/
>
> Jorn
>
> Jorn Vernee schreef op 2019-02-26 14:20:
>> After some private discussion, here is the updated webrev that that
>> implements the check as an actual visitor instead.
>>
>> Updated webrev:
>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.05/
>>
>> Jorn
>>
>> Maurizio Cimadamore schreef op 2019-02-26 02:24:
>>> Thanks this looks much simpler.
>>>
>>> And also points at another possible simplification - why don't we
>>> generate such warning in the parser? After all, is the parser that
>>> knows when a StructTree will be created, so it is in an ideal
>>> situation to intercept that event and log a warning accordingly (and,
>>> the parser already has logging support). Btw, I know that the actual
>>> StructTree creation occurs in TreeMaker - we could either add context
>>> to TreeMaker, or just peek for a struct cursor in Parser.
>>>
>>> But overall, adding the check on earlier visitors might be better than
>>> using peek().
>>>
>>> Maurizio
>>>
>>> On 26/02/2019 00:30, Jorn Vernee wrote:
>>>> Hi Maurizio,
>>>>
>>>> I think you're right. I replaced the exception with a warning since 
>>>> the 2 are similar, but throwing an exception/warning at that point 
>>>> is probably wrong in the first place, like you say.
>>>>
>>>> Based on your suggestion, I've implemented the warning separately 
>>>> as a .peek() operation on the stream of Trees.
>>>>
>>>> Updated webrev: 
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.04/
>>>>
>>>> Jorn
>>>>
>>>> Maurizio Cimadamore schreef op 2019-02-26 00:34:
>>>>> Hi Jorn,
>>>>> while in principle making context immutable is the right to do on
>>>>> paper, I believe that we probably ended up curing a symptom and not
>>>>> the disease :-)
>>>>>
>>>>> Which means: I find it very strange to see that a layout computer -
>>>>> that is, a function which takes a clang Cursor and emits a Layout is
>>>>> now suddenly starting to give side-effects. Given I've worked on 
>>>>> javac
>>>>> for many years, I can assure you that this will almost surely lead to
>>>>> problems down the road. For instance, if this apparently harmless
>>>>> stateless function is called twice on the same cursor, we would get
>>>>> two warnings; another issue is that the code will fail to scale if 
>>>>> new
>>>>> warnings will need to be added - e.g. if record layout computer needs
>>>>> to check for other problematic stuff. And I'm not even mentioning
>>>>> adding logic to enable/disable specific classes of warnings, which is
>>>>> another thing that might happen, at some point.
>>>>>
>>>>> What works in these contexts, is to separate the logic that does the
>>>>> stateless computation (e.g. layout computation, in this case), from
>>>>> the logic that issues the warning. For instance, we could have a
>>>>> separate visitor in jextract which looks at the clang cursor, looking
>>>>> for things that are suspicious or not well supported (such as
>>>>> incomplete arrays) and report a warning _only once_.
>>>>>
>>>>> This visitor will probably write itself, and, longer term would be a
>>>>> much more maintainable option than inlining the check in the layout
>>>>> computer function itself.
>>>>>
>>>>> After which, the decision as to whether make the context mutable or
>>>>> not is largely orthogonal - we could go there, although it's not 
>>>>> clear
>>>>> if the extra visitor I'm proposing would significantly switch the
>>>>> balance one way or another (probably not, as we have plenty other
>>>>> visitors which do logging).
>>>>>
>>>>> Maurizio
>>>>>
>>>>>
>>>>> On 25/02/2019 23:16, Jorn Vernee wrote:
>>>>>> I've split this into 2 to try and make reviewing easier.
>>>>>>
>>>>>> Here's the part that makes Context immutable (this required quite 
>>>>>> a bit of refactoring in Main):
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/context/webrev.00/ 
>>>>>>
>>>>>>
>>>>>> Here's the original part that returns a layout reference when 
>>>>>> encountering a type with a flexible array (applies on top of the 
>>>>>> first one):
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.03/ 
>>>>>>
>>>>>>
>>>>>> Hope that helps,
>>>>>> Jorn
>>>>>>
>>>>>> Jorn Vernee schreef op 2019-02-25 23:06:
>>>>>>> I talked a bit with Sundar;
>>>>>>>
>>>>>>> We can not create a global static method to return the Context 
>>>>>>> since
>>>>>>> there could be more than 1 Context, due to multiple instances of
>>>>>>> Jextract being created through the ToolProvider interface. But, as
>>>>>>> long as Context is Immutable, passing it down to other code 
>>>>>>> should be
>>>>>>> OK.
>>>>>>>
>>>>>>> I have made Context immutable by adding a Context.Builder, and
>>>>>>> creating the Context in a separate method in Main (createContext).
>>>>>>> Some places in that code were prematurely exiting and returning an
>>>>>>> error code. This is replaced by throwing a custom exception 
>>>>>>> instead.
>>>>>>>
>>>>>>> Update webrev:
>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/ 
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jorn
>>>>>>>
>>>>>>> Jorn Vernee schreef op 2019-02-16 15:30:
>>>>>>>> In other words; we can't get the layout for a type with a flexible
>>>>>>>> array from libclang at all. A good way to resolve that is 
>>>>>>>> probably to
>>>>>>>> submit a patch for it to libclang. I've done a little bit of
>>>>>>>> searching, and there doesn't seem to be a bug report in their bug
>>>>>>>> database for it either:
>>>>>>>> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array 
>>>>>>>> so we
>>>>>>>> could submit a bug about it as well. But, currently you need to
>>>>>>>> manually request an account in order to post bugs, so this 
>>>>>>>> might take
>>>>>>>> a while. I've also send an email just now to the cfe-dev 
>>>>>>>> mailing list,
>>>>>>>> maybe someone there can offer some help.
>>>>>>>>
>>>>>>>> In the meantime we can work around structs with flexible arrays.
>>>>>>>> Currently an exception is thrown, but this makes the whole 
>>>>>>>> extracted
>>>>>>>> interface unusable, so emitting an undefined layout as a 
>>>>>>>> default value
>>>>>>>> and printing a warning seems like a better option to me until the
>>>>>>>> issue is fixed in libclang.
>>>>>>>>
>>>>>>>> There's also the idea of introducing a jextract option to manually
>>>>>>>> override a descriptor, e.g. `--patch-descriptor
>>>>>>>> struct:MyStruct=[i32(x)i32(y)]`. We could use something like 
>>>>>>>> that to
>>>>>>>> manually provide the layout descriptor for structs with 
>>>>>>>> flexible array
>>>>>>>> members in the meantime.
>>>>>>>>
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Jorn Vernee schreef op 2019-02-16 09:07:
>>>>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>>>>> instead of undefined layout, I prefer to have limited 
>>>>>>>>>>> support, we can either strip that field or generate a 
>>>>>>>>>>> 0-length array for now. >> For jextract, C only allow such 
>>>>>>>>>>> field at end of struct, and sizeof() operator simply ignore 
>>>>>>>>>>> that trailing array field. This should give us a good match 
>>>>>>>>>>> as first step.
>>>>>>>>>> Agree
>>>>>>>>>
>>>>>>>>> Sure, that would be preferable. But, as discussed before [1], 
>>>>>>>>> libclang
>>>>>>>>> does not handle incomplete arrays properly, that's why it was 
>>>>>>>>> changed
>>>>>>>>> to emit an exception in the first place.
>>>>>>>>>
>>>>>>>>> It is not possible to filter out structs with any jextract 
>>>>>>>>> option, so
>>>>>>>>> if you have an incomplete array in a header file that is 
>>>>>>>>> otherwise
>>>>>>>>> usable, you're out of luck.
>>>>>>>>>
>>>>>>>>> Jorn
>>>>>>>>>
>>>>>>>>> [1] :
>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html 
>>>>>>>>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>>>>>>>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>>>>> instead of undefined layout, I prefer to have limited 
>>>>>>>>>>> support, we can either strip that field or generate a 
>>>>>>>>>>> 0-length array for now. For jextract, C only allow such 
>>>>>>>>>>> field at end of struct, and sizeof() operator simply ignore 
>>>>>>>>>>> that trailing array field. This should give us a good match 
>>>>>>>>>>> as first step.
>>>>>>>>>> Agree
>>>>>>>>>>>
>>>>>>>>>>> Move on to more general support, where incomplete array can 
>>>>>>>>>>> be in-between layouts. Before that, we probably need to 
>>>>>>>>>>> validate some assumption,
>>>>>>>>>>>
>>>>>>>>>>> Any incomplete array must have length specified in the same 
>>>>>>>>>>> struct before the incomplete array. I believe this will 
>>>>>>>>>>> pretty much cover most cases if not all.
>>>>>>>>>>
>>>>>>>>>> To reinforce this point, I believe most compilers even give 
>>>>>>>>>> error if
>>>>>>>>>> the incomplete array is the only member of the struct, or if 
>>>>>>>>>> it's
>>>>>>>>>> followed by other stuff:
>>>>>>>>>>
>>>>>>>>>> $ cat testInc.h
>>>>>>>>>>
>>>>>>>>>> struct A {
>>>>>>>>>>    int arr[];
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> struct B {
>>>>>>>>>>    int l;
>>>>>>>>>>    int arr[];
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> struct C {
>>>>>>>>>>    int l;
>>>>>>>>>>    int arr[];
>>>>>>>>>>    int l2;
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> $ gcc -c testInc.h
>>>>>>>>>>
>>>>>>>>>> testInc.h:2:8: error: flexible array member in a struct with 
>>>>>>>>>> no named members
>>>>>>>>>>     int arr[];
>>>>>>>>>>         ^~~
>>>>>>>>>> testInc.h:12:8: error: flexible array member not at end of 
>>>>>>>>>> struct
>>>>>>>>>>     int arr[];
>>>>>>>>>>         ^~~
>>>>>>>>>>
>>>>>>>>>> $ clang -c testInc.h
>>>>>>>>>> testInc.h:2:8: error: flexible array member 'arr' not allowed 
>>>>>>>>>> in otherwise empty
>>>>>>>>>>       struct
>>>>>>>>>>    int arr[];
>>>>>>>>>>        ^
>>>>>>>>>> testInc.h:12:8: error: flexible array member 'arr' with type 
>>>>>>>>>> 'int []' is not at
>>>>>>>>>>       the end of struct
>>>>>>>>>>    int arr[];
>>>>>>>>>>        ^
>>>>>>>>>> testInc.h:13:8: note: next field declaration is here
>>>>>>>>>>    int l2;
>>>>>>>>>>        ^
>>>>>>>>>> 2 errors generated.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> With that, I think following should work well enough,
>>>>>>>>>>
>>>>>>>>>> What you describe is what I've dubbed 'dependent layout' 
>>>>>>>>>> approach -
>>>>>>>>>> e.g. have one or more values in a struct provide more info 
>>>>>>>>>> for certain
>>>>>>>>>> layout elements in same struct. This is fairly frequent 
>>>>>>>>>> business with
>>>>>>>>>> message protocols - almost all representation for 
>>>>>>>>>> variable-sized data
>>>>>>>>>> is expressed as length + data array (sometimes compressed, as in
>>>>>>>>>> protobuf's VarInt).
>>>>>>>>>>
>>>>>>>>>> I agree that's where we need to land, longer term. Short term 
>>>>>>>>>> it feels
>>>>>>>>>> like the best move would be to just strip the array. Creating a
>>>>>>>>>> 0-length array might be a  move with subtle consequences: the 
>>>>>>>>>> array
>>>>>>>>>> occurs within a region with some boundaries (e.g. a struct 
>>>>>>>>>> region).
>>>>>>>>>> The boundaries of the enclosing region are usually computed 
>>>>>>>>>> using
>>>>>>>>>> sizeof(enclosing type). Meaning that the enclosing region 
>>>>>>>>>> won't be
>>>>>>>>>> 'big enough' to host anything but a zero-length array. If we 
>>>>>>>>>> cast the
>>>>>>>>>> array to something else, what we get back is an array whose 
>>>>>>>>>> boundaries
>>>>>>>>>> would exceed those of the enclosing struct - so if you try 
>>>>>>>>>> e.g. to
>>>>>>>>>> write to the array, the operation would fail.
>>>>>>>>>>
>>>>>>>>>> To do this stuff properly in Panama you would need to allocate a
>>>>>>>>>> bigger chunk of memory, of the desired size (pretty much as 
>>>>>>>>>> you would
>>>>>>>>>> in C), and then cast the memory to the struct type - now you 
>>>>>>>>>> have a
>>>>>>>>>> memory region that is big enough to do the struct + the array.
>>>>>>>>>>
>>>>>>>>>> The alternative, which does look simpler, is to just allocate 
>>>>>>>>>> a struct
>>>>>>>>>> (with array stripped) followed by an array of desired size in 
>>>>>>>>>> the same
>>>>>>>>>> scope - e.g. compare this:
>>>>>>>>>>
>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>
>>>>>>>>>>     Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>>>>>>>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>>>>>>>>     Pointer<StructWithIncompleteArray> pstruct =
>>>>>>>>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class));
>>>>>>>>>>     Array<Integer> data = 
>>>>>>>>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> With this:
>>>>>>>>>>
>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>
>>>>>>>>>>     StructWithoutIncompleteArray struct =
>>>>>>>>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>>>>>>>>     Array<Integer> data = s.allocateArray(NativeTypes.INT32, 
>>>>>>>>>> 10);
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The code looks similar - and a client can, after the 
>>>>>>>>>> allocation, use
>>>>>>>>>> the array and the struct at will. But there's a subtle 
>>>>>>>>>> difference
>>>>>>>>>> between the two: in the first snippet, the array is allocated
>>>>>>>>>> immediately after the struct bits - that's how allocation 
>>>>>>>>>> happened. In
>>>>>>>>>> the second snippet there's no guarantee that the array will 
>>>>>>>>>> be after
>>>>>>>>>> the struct; in fact, the native scope might have run out of 
>>>>>>>>>> space with
>>>>>>>>>> the struct and needed to allocate a new slab of native memory 
>>>>>>>>>> via
>>>>>>>>>> unsafe before the array allocation happens.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Which seems to suggest that the right way of approaching the 
>>>>>>>>>> problem,
>>>>>>>>>> even if more verbose, is the first one.
>>>>>>>>>>
>>>>>>>>>> Maurizio