[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member

Tue Feb 26 15:10:51 UTC 2019

On 26/02/2019 15:02, Jorn Vernee wrote:
> So, it seemed good to put them somewhere in the middle, e.g. Utils
>
sorry, I saw the code moving and did not notice it was still used. Good.

Maurizio

> Jorn
>
> Jorn Vernee schreef op 2019-02-26 16:00:
>> Well, right now they are used in 2 places; in RecordLayoutComputer,
>> because we still need to detect a flexible array and emit a layout
>> reference, and in the warning visitor.
>>
>> Jorn
>>
>> Maurizio Cimadamore schreef op 2019-02-26 15:56:
>>> One minor comment - don't the methods you moved from
>>> RecordLayoutComputer to Utils actually belong to the new visitor?
>>>
>>> Maurizio
>>>
>>> On 26/02/2019 14:53, Maurizio Cimadamore wrote:
>>>> Looks good - in the future, should we add more checks we can rename 
>>>> the visitor to some more general name.
>>>>
>>>> Maurizio
>>>>
>>>> On 26/02/2019 14:33, Jorn Vernee wrote:
>>>>> Uhh... I just realized the MultiplexingVisitor I made is totally 
>>>>> pointless... because Trees implement the visitor pattern. 
>>>>> (probably time to take a break :-) )
>>>>>
>>>>> Updated webrev: 
>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.06/
>>>>>
>>>>> Jorn
>>>>>
>>>>> Jorn Vernee schreef op 2019-02-26 14:20:
>>>>>> After some private discussion, here is the updated webrev that that
>>>>>> implements the check as an actual visitor instead.
>>>>>>
>>>>>> Updated webrev:
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.05/ 
>>>>>>
>>>>>>
>>>>>> Jorn
>>>>>>
>>>>>> Maurizio Cimadamore schreef op 2019-02-26 02:24:
>>>>>>> Thanks this looks much simpler.
>>>>>>>
>>>>>>> And also points at another possible simplification - why don't we
>>>>>>> generate such warning in the parser? After all, is the parser that
>>>>>>> knows when a StructTree will be created, so it is in an ideal
>>>>>>> situation to intercept that event and log a warning accordingly 
>>>>>>> (and,
>>>>>>> the parser already has logging support). Btw, I know that the 
>>>>>>> actual
>>>>>>> StructTree creation occurs in TreeMaker - we could either add 
>>>>>>> context
>>>>>>> to TreeMaker, or just peek for a struct cursor in Parser.
>>>>>>>
>>>>>>> But overall, adding the check on earlier visitors might be 
>>>>>>> better than
>>>>>>> using peek().
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>> On 26/02/2019 00:30, Jorn Vernee wrote:
>>>>>>>> Hi Maurizio,
>>>>>>>>
>>>>>>>> I think you're right. I replaced the exception with a warning 
>>>>>>>> since the 2 are similar, but throwing an exception/warning at 
>>>>>>>> that point is probably wrong in the first place, like you say.
>>>>>>>>
>>>>>>>> Based on your suggestion, I've implemented the warning 
>>>>>>>> separately as a .peek() operation on the stream of Trees.
>>>>>>>>
>>>>>>>> Updated webrev: 
>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.04/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Maurizio Cimadamore schreef op 2019-02-26 00:34:
>>>>>>>>> Hi Jorn,
>>>>>>>>> while in principle making context immutable is the right to do on
>>>>>>>>> paper, I believe that we probably ended up curing a symptom 
>>>>>>>>> and not
>>>>>>>>> the disease :-)
>>>>>>>>>
>>>>>>>>> Which means: I find it very strange to see that a layout 
>>>>>>>>> computer -
>>>>>>>>> that is, a function which takes a clang Cursor and emits a 
>>>>>>>>> Layout is
>>>>>>>>> now suddenly starting to give side-effects. Given I've worked 
>>>>>>>>> on javac
>>>>>>>>> for many years, I can assure you that this will almost surely 
>>>>>>>>> lead to
>>>>>>>>> problems down the road. For instance, if this apparently harmless
>>>>>>>>> stateless function is called twice on the same cursor, we 
>>>>>>>>> would get
>>>>>>>>> two warnings; another issue is that the code will fail to 
>>>>>>>>> scale if new
>>>>>>>>> warnings will need to be added - e.g. if record layout 
>>>>>>>>> computer needs
>>>>>>>>> to check for other problematic stuff. And I'm not even mentioning
>>>>>>>>> adding logic to enable/disable specific classes of warnings, 
>>>>>>>>> which is
>>>>>>>>> another thing that might happen, at some point.
>>>>>>>>>
>>>>>>>>> What works in these contexts, is to separate the logic that 
>>>>>>>>> does the
>>>>>>>>> stateless computation (e.g. layout computation, in this case), 
>>>>>>>>> from
>>>>>>>>> the logic that issues the warning. For instance, we could have a
>>>>>>>>> separate visitor in jextract which looks at the clang cursor, 
>>>>>>>>> looking
>>>>>>>>> for things that are suspicious or not well supported (such as
>>>>>>>>> incomplete arrays) and report a warning _only once_.
>>>>>>>>>
>>>>>>>>> This visitor will probably write itself, and, longer term 
>>>>>>>>> would be a
>>>>>>>>> much more maintainable option than inlining the check in the 
>>>>>>>>> layout
>>>>>>>>> computer function itself.
>>>>>>>>>
>>>>>>>>> After which, the decision as to whether make the context 
>>>>>>>>> mutable or
>>>>>>>>> not is largely orthogonal - we could go there, although it's 
>>>>>>>>> not clear
>>>>>>>>> if the extra visitor I'm proposing would significantly switch the
>>>>>>>>> balance one way or another (probably not, as we have plenty other
>>>>>>>>> visitors which do logging).
>>>>>>>>>
>>>>>>>>> Maurizio
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 25/02/2019 23:16, Jorn Vernee wrote:
>>>>>>>>>> I've split this into 2 to try and make reviewing easier.
>>>>>>>>>>
>>>>>>>>>> Here's the part that makes Context immutable (this required 
>>>>>>>>>> quite a bit of refactoring in Main):
>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/context/webrev.00/ 
>>>>>>>>>> Here's the original part that returns a layout reference when 
>>>>>>>>>> encountering a type with a flexible array (applies on top of 
>>>>>>>>>> the first one):
>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.03/ 
>>>>>>>>>> Hope that helps,
>>>>>>>>>> Jorn
>>>>>>>>>>
>>>>>>>>>> Jorn Vernee schreef op 2019-02-25 23:06:
>>>>>>>>>>> I talked a bit with Sundar;
>>>>>>>>>>>
>>>>>>>>>>> We can not create a global static method to return the 
>>>>>>>>>>> Context since
>>>>>>>>>>> there could be more than 1 Context, due to multiple 
>>>>>>>>>>> instances of
>>>>>>>>>>> Jextract being created through the ToolProvider interface. 
>>>>>>>>>>> But, as
>>>>>>>>>>> long as Context is Immutable, passing it down to other code 
>>>>>>>>>>> should be
>>>>>>>>>>> OK.
>>>>>>>>>>>
>>>>>>>>>>> I have made Context immutable by adding a Context.Builder, and
>>>>>>>>>>> creating the Context in a separate method in Main 
>>>>>>>>>>> (createContext).
>>>>>>>>>>> Some places in that code were prematurely exiting and 
>>>>>>>>>>> returning an
>>>>>>>>>>> error code. This is replaced by throwing a custom exception 
>>>>>>>>>>> instead.
>>>>>>>>>>>
>>>>>>>>>>> Update webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/ 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jorn
>>>>>>>>>>>
>>>>>>>>>>> Jorn Vernee schreef op 2019-02-16 15:30:
>>>>>>>>>>>> In other words; we can't get the layout for a type with a 
>>>>>>>>>>>> flexible
>>>>>>>>>>>> array from libclang at all. A good way to resolve that is 
>>>>>>>>>>>> probably to
>>>>>>>>>>>> submit a patch for it to libclang. I've done a little bit of
>>>>>>>>>>>> searching, and there doesn't seem to be a bug report in 
>>>>>>>>>>>> their bug
>>>>>>>>>>>> database for it either:
>>>>>>>>>>>> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array 
>>>>>>>>>>>> so we
>>>>>>>>>>>> could submit a bug about it as well. But, currently you 
>>>>>>>>>>>> need to
>>>>>>>>>>>> manually request an account in order to post bugs, so this 
>>>>>>>>>>>> might take
>>>>>>>>>>>> a while. I've also send an email just now to the cfe-dev 
>>>>>>>>>>>> mailing list,
>>>>>>>>>>>> maybe someone there can offer some help.
>>>>>>>>>>>>
>>>>>>>>>>>> In the meantime we can work around structs with flexible 
>>>>>>>>>>>> arrays.
>>>>>>>>>>>> Currently an exception is thrown, but this makes the whole 
>>>>>>>>>>>> extracted
>>>>>>>>>>>> interface unusable, so emitting an undefined layout as a 
>>>>>>>>>>>> default value
>>>>>>>>>>>> and printing a warning seems like a better option to me 
>>>>>>>>>>>> until the
>>>>>>>>>>>> issue is fixed in libclang.
>>>>>>>>>>>>
>>>>>>>>>>>> There's also the idea of introducing a jextract option to 
>>>>>>>>>>>> manually
>>>>>>>>>>>> override a descriptor, e.g. `--patch-descriptor
>>>>>>>>>>>> struct:MyStruct=[i32(x)i32(y)]`. We could use something 
>>>>>>>>>>>> like that to
>>>>>>>>>>>> manually provide the layout descriptor for structs with 
>>>>>>>>>>>> flexible array
>>>>>>>>>>>> members in the meantime.
>>>>>>>>>>>>
>>>>>>>>>>>> Jorn
>>>>>>>>>>>>
>>>>>>>>>>>> Jorn Vernee schreef op 2019-02-16 09:07:
>>>>>>>>>>>>>>> Now to the real discussion about incomplete array 
>>>>>>>>>>>>>>> support, instead of undefined layout, I prefer to have 
>>>>>>>>>>>>>>> limited support, we can either strip that field or 
>>>>>>>>>>>>>>> generate a 0-length array for now. >> For jextract, C 
>>>>>>>>>>>>>>> only allow such field at end of struct, and sizeof() 
>>>>>>>>>>>>>>> operator simply ignore that trailing array field. This 
>>>>>>>>>>>>>>> should give us a good match as first step.
>>>>>>>>>>>>>> Agree
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sure, that would be preferable. But, as discussed before 
>>>>>>>>>>>>> [1], libclang
>>>>>>>>>>>>> does not handle incomplete arrays properly, that's why it 
>>>>>>>>>>>>> was changed
>>>>>>>>>>>>> to emit an exception in the first place.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is not possible to filter out structs with any jextract 
>>>>>>>>>>>>> option, so
>>>>>>>>>>>>> if you have an incomplete array in a header file that is 
>>>>>>>>>>>>> otherwise
>>>>>>>>>>>>> usable, you're out of luck.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] :
>>>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html 
>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>>>>>>>>>>>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>>>>>>>>>>>>> Now to the real discussion about incomplete array 
>>>>>>>>>>>>>>> support, instead of undefined layout, I prefer to have 
>>>>>>>>>>>>>>> limited support, we can either strip that field or 
>>>>>>>>>>>>>>> generate a 0-length array for now. For jextract, C only 
>>>>>>>>>>>>>>> allow such field at end of struct, and sizeof() operator 
>>>>>>>>>>>>>>> simply ignore that trailing array field. This should 
>>>>>>>>>>>>>>> give us a good match as first step.
>>>>>>>>>>>>>> Agree
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Move on to more general support, where incomplete array 
>>>>>>>>>>>>>>> can be in-between layouts. Before that, we probably need 
>>>>>>>>>>>>>>> to validate some assumption,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any incomplete array must have length specified in the 
>>>>>>>>>>>>>>> same struct before the incomplete array. I believe this 
>>>>>>>>>>>>>>> will pretty much cover most cases if not all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To reinforce this point, I believe most compilers even 
>>>>>>>>>>>>>> give error if
>>>>>>>>>>>>>> the incomplete array is the only member of the struct, or 
>>>>>>>>>>>>>> if it's
>>>>>>>>>>>>>> followed by other stuff:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ cat testInc.h
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> struct A {
>>>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> struct B {
>>>>>>>>>>>>>>    int l;
>>>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> struct C {
>>>>>>>>>>>>>>    int l;
>>>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>>>    int l2;
>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ gcc -c testInc.h
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> testInc.h:2:8: error: flexible array member in a struct 
>>>>>>>>>>>>>> with no named members
>>>>>>>>>>>>>>     int arr[];
>>>>>>>>>>>>>>         ^~~
>>>>>>>>>>>>>> testInc.h:12:8: error: flexible array member not at end 
>>>>>>>>>>>>>> of struct
>>>>>>>>>>>>>>     int arr[];
>>>>>>>>>>>>>>         ^~~
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ clang -c testInc.h
>>>>>>>>>>>>>> testInc.h:2:8: error: flexible array member 'arr' not 
>>>>>>>>>>>>>> allowed in otherwise empty
>>>>>>>>>>>>>>       struct
>>>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>>>        ^
>>>>>>>>>>>>>> testInc.h:12:8: error: flexible array member 'arr' with 
>>>>>>>>>>>>>> type 'int []' is not at
>>>>>>>>>>>>>>       the end of struct
>>>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>>>        ^
>>>>>>>>>>>>>> testInc.h:13:8: note: next field declaration is here
>>>>>>>>>>>>>>    int l2;
>>>>>>>>>>>>>>        ^
>>>>>>>>>>>>>> 2 errors generated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With that, I think following should work well enough,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What you describe is what I've dubbed 'dependent layout' 
>>>>>>>>>>>>>> approach -
>>>>>>>>>>>>>> e.g. have one or more values in a struct provide more 
>>>>>>>>>>>>>> info for certain
>>>>>>>>>>>>>> layout elements in same struct. This is fairly frequent 
>>>>>>>>>>>>>> business with
>>>>>>>>>>>>>> message protocols - almost all representation for 
>>>>>>>>>>>>>> variable-sized data
>>>>>>>>>>>>>> is expressed as length + data array (sometimes 
>>>>>>>>>>>>>> compressed, as in
>>>>>>>>>>>>>> protobuf's VarInt).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree that's where we need to land, longer term. Short 
>>>>>>>>>>>>>> term it feels
>>>>>>>>>>>>>> like the best move would be to just strip the array. 
>>>>>>>>>>>>>> Creating a
>>>>>>>>>>>>>> 0-length array might be a  move with subtle consequences: 
>>>>>>>>>>>>>> the array
>>>>>>>>>>>>>> occurs within a region with some boundaries (e.g. a 
>>>>>>>>>>>>>> struct region).
>>>>>>>>>>>>>> The boundaries of the enclosing region are usually 
>>>>>>>>>>>>>> computed using
>>>>>>>>>>>>>> sizeof(enclosing type). Meaning that the enclosing region 
>>>>>>>>>>>>>> won't be
>>>>>>>>>>>>>> 'big enough' to host anything but a zero-length array. If 
>>>>>>>>>>>>>> we cast the
>>>>>>>>>>>>>> array to something else, what we get back is an array 
>>>>>>>>>>>>>> whose boundaries
>>>>>>>>>>>>>> would exceed those of the enclosing struct - so if you 
>>>>>>>>>>>>>> try e.g. to
>>>>>>>>>>>>>> write to the array, the operation would fail.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To do this stuff properly in Panama you would need to 
>>>>>>>>>>>>>> allocate a
>>>>>>>>>>>>>> bigger chunk of memory, of the desired size (pretty much 
>>>>>>>>>>>>>> as you would
>>>>>>>>>>>>>> in C), and then cast the memory to the struct type - now 
>>>>>>>>>>>>>> you have a
>>>>>>>>>>>>>> memory region that is big enough to do the struct + the 
>>>>>>>>>>>>>> array.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The alternative, which does look simpler, is to just 
>>>>>>>>>>>>>> allocate a struct
>>>>>>>>>>>>>> (with array stripped) followed by an array of desired 
>>>>>>>>>>>>>> size in the same
>>>>>>>>>>>>>> scope - e.g. compare this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>>>>>>>>>>>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>>>>>>>>>>>> Pointer<StructWithIncompleteArray> pstruct =
>>>>>>>>>>>>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class)); 
>>>>>>>>>>>>>> Array<Integer> data = 
>>>>>>>>>>>>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     StructWithoutIncompleteArray struct =
>>>>>>>>>>>>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>>>>>>>>>>>>     Array<Integer> data = 
>>>>>>>>>>>>>> s.allocateArray(NativeTypes.INT32, 10);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The code looks similar - and a client can, after the 
>>>>>>>>>>>>>> allocation, use
>>>>>>>>>>>>>> the array and the struct at will. But there's a subtle 
>>>>>>>>>>>>>> difference
>>>>>>>>>>>>>> between the two: in the first snippet, the array is 
>>>>>>>>>>>>>> allocated
>>>>>>>>>>>>>> immediately after the struct bits - that's how allocation 
>>>>>>>>>>>>>> happened. In
>>>>>>>>>>>>>> the second snippet there's no guarantee that the array 
>>>>>>>>>>>>>> will be after
>>>>>>>>>>>>>> the struct; in fact, the native scope might have run out 
>>>>>>>>>>>>>> of space with
>>>>>>>>>>>>>> the struct and needed to allocate a new slab of native 
>>>>>>>>>>>>>> memory via
>>>>>>>>>>>>>> unsafe before the array allocation happens.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which seems to suggest that the right way of approaching 
>>>>>>>>>>>>>> the problem,
>>>>>>>>>>>>>> even if more verbose, is the first one.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maurizio