[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member

Tue Feb 26 15:02:17 UTC 2019

So, it seemed good to put them somewhere in the middle, e.g. Utils

Jorn

Jorn Vernee schreef op 2019-02-26 16:00:
> Well, right now they are used in 2 places; in RecordLayoutComputer,
> because we still need to detect a flexible array and emit a layout
> reference, and in the warning visitor.
> 
> Jorn
> 
> Maurizio Cimadamore schreef op 2019-02-26 15:56:
>> One minor comment - don't the methods you moved from
>> RecordLayoutComputer to Utils actually belong to the new visitor?
>> 
>> Maurizio
>> 
>> On 26/02/2019 14:53, Maurizio Cimadamore wrote:
>>> Looks good - in the future, should we add more checks we can rename 
>>> the visitor to some more general name.
>>> 
>>> Maurizio
>>> 
>>> On 26/02/2019 14:33, Jorn Vernee wrote:
>>>> Uhh... I just realized the MultiplexingVisitor I made is totally 
>>>> pointless... because Trees implement the visitor pattern. (probably 
>>>> time to take a break :-) )
>>>> 
>>>> Updated webrev: 
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.06/
>>>> 
>>>> Jorn
>>>> 
>>>> Jorn Vernee schreef op 2019-02-26 14:20:
>>>>> After some private discussion, here is the updated webrev that that
>>>>> implements the check as an actual visitor instead.
>>>>> 
>>>>> Updated webrev:
>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.05/
>>>>> 
>>>>> Jorn
>>>>> 
>>>>> Maurizio Cimadamore schreef op 2019-02-26 02:24:
>>>>>> Thanks this looks much simpler.
>>>>>> 
>>>>>> And also points at another possible simplification - why don't we
>>>>>> generate such warning in the parser? After all, is the parser that
>>>>>> knows when a StructTree will be created, so it is in an ideal
>>>>>> situation to intercept that event and log a warning accordingly 
>>>>>> (and,
>>>>>> the parser already has logging support). Btw, I know that the 
>>>>>> actual
>>>>>> StructTree creation occurs in TreeMaker - we could either add 
>>>>>> context
>>>>>> to TreeMaker, or just peek for a struct cursor in Parser.
>>>>>> 
>>>>>> But overall, adding the check on earlier visitors might be better 
>>>>>> than
>>>>>> using peek().
>>>>>> 
>>>>>> Maurizio
>>>>>> 
>>>>>> On 26/02/2019 00:30, Jorn Vernee wrote:
>>>>>>> Hi Maurizio,
>>>>>>> 
>>>>>>> I think you're right. I replaced the exception with a warning 
>>>>>>> since the 2 are similar, but throwing an exception/warning at 
>>>>>>> that point is probably wrong in the first place, like you say.
>>>>>>> 
>>>>>>> Based on your suggestion, I've implemented the warning separately 
>>>>>>> as a .peek() operation on the stream of Trees.
>>>>>>> 
>>>>>>> Updated webrev: 
>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.04/
>>>>>>> 
>>>>>>> Jorn
>>>>>>> 
>>>>>>> Maurizio Cimadamore schreef op 2019-02-26 00:34:
>>>>>>>> Hi Jorn,
>>>>>>>> while in principle making context immutable is the right to do 
>>>>>>>> on
>>>>>>>> paper, I believe that we probably ended up curing a symptom and 
>>>>>>>> not
>>>>>>>> the disease :-)
>>>>>>>> 
>>>>>>>> Which means: I find it very strange to see that a layout 
>>>>>>>> computer -
>>>>>>>> that is, a function which takes a clang Cursor and emits a 
>>>>>>>> Layout is
>>>>>>>> now suddenly starting to give side-effects. Given I've worked on 
>>>>>>>> javac
>>>>>>>> for many years, I can assure you that this will almost surely 
>>>>>>>> lead to
>>>>>>>> problems down the road. For instance, if this apparently 
>>>>>>>> harmless
>>>>>>>> stateless function is called twice on the same cursor, we would 
>>>>>>>> get
>>>>>>>> two warnings; another issue is that the code will fail to scale 
>>>>>>>> if new
>>>>>>>> warnings will need to be added - e.g. if record layout computer 
>>>>>>>> needs
>>>>>>>> to check for other problematic stuff. And I'm not even 
>>>>>>>> mentioning
>>>>>>>> adding logic to enable/disable specific classes of warnings, 
>>>>>>>> which is
>>>>>>>> another thing that might happen, at some point.
>>>>>>>> 
>>>>>>>> What works in these contexts, is to separate the logic that does 
>>>>>>>> the
>>>>>>>> stateless computation (e.g. layout computation, in this case), 
>>>>>>>> from
>>>>>>>> the logic that issues the warning. For instance, we could have a
>>>>>>>> separate visitor in jextract which looks at the clang cursor, 
>>>>>>>> looking
>>>>>>>> for things that are suspicious or not well supported (such as
>>>>>>>> incomplete arrays) and report a warning _only once_.
>>>>>>>> 
>>>>>>>> This visitor will probably write itself, and, longer term would 
>>>>>>>> be a
>>>>>>>> much more maintainable option than inlining the check in the 
>>>>>>>> layout
>>>>>>>> computer function itself.
>>>>>>>> 
>>>>>>>> After which, the decision as to whether make the context mutable 
>>>>>>>> or
>>>>>>>> not is largely orthogonal - we could go there, although it's not 
>>>>>>>> clear
>>>>>>>> if the extra visitor I'm proposing would significantly switch 
>>>>>>>> the
>>>>>>>> balance one way or another (probably not, as we have plenty 
>>>>>>>> other
>>>>>>>> visitors which do logging).
>>>>>>>> 
>>>>>>>> Maurizio
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 25/02/2019 23:16, Jorn Vernee wrote:
>>>>>>>>> I've split this into 2 to try and make reviewing easier.
>>>>>>>>> 
>>>>>>>>> Here's the part that makes Context immutable (this required 
>>>>>>>>> quite a bit of refactoring in Main):
>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/context/webrev.00/ 
>>>>>>>>> Here's the original part that returns a layout reference when 
>>>>>>>>> encountering a type with a flexible array (applies on top of 
>>>>>>>>> the first one):
>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.03/ 
>>>>>>>>> Hope that helps,
>>>>>>>>> Jorn
>>>>>>>>> 
>>>>>>>>> Jorn Vernee schreef op 2019-02-25 23:06:
>>>>>>>>>> I talked a bit with Sundar;
>>>>>>>>>> 
>>>>>>>>>> We can not create a global static method to return the Context 
>>>>>>>>>> since
>>>>>>>>>> there could be more than 1 Context, due to multiple instances 
>>>>>>>>>> of
>>>>>>>>>> Jextract being created through the ToolProvider interface. 
>>>>>>>>>> But, as
>>>>>>>>>> long as Context is Immutable, passing it down to other code 
>>>>>>>>>> should be
>>>>>>>>>> OK.
>>>>>>>>>> 
>>>>>>>>>> I have made Context immutable by adding a Context.Builder, and
>>>>>>>>>> creating the Context in a separate method in Main 
>>>>>>>>>> (createContext).
>>>>>>>>>> Some places in that code were prematurely exiting and 
>>>>>>>>>> returning an
>>>>>>>>>> error code. This is replaced by throwing a custom exception 
>>>>>>>>>> instead.
>>>>>>>>>> 
>>>>>>>>>> Update webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/ 
>>>>>>>>>> Thanks,
>>>>>>>>>> Jorn
>>>>>>>>>> 
>>>>>>>>>> Jorn Vernee schreef op 2019-02-16 15:30:
>>>>>>>>>>> In other words; we can't get the layout for a type with a 
>>>>>>>>>>> flexible
>>>>>>>>>>> array from libclang at all. A good way to resolve that is 
>>>>>>>>>>> probably to
>>>>>>>>>>> submit a patch for it to libclang. I've done a little bit of
>>>>>>>>>>> searching, and there doesn't seem to be a bug report in their 
>>>>>>>>>>> bug
>>>>>>>>>>> database for it either:
>>>>>>>>>>> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array 
>>>>>>>>>>> so we
>>>>>>>>>>> could submit a bug about it as well. But, currently you need 
>>>>>>>>>>> to
>>>>>>>>>>> manually request an account in order to post bugs, so this 
>>>>>>>>>>> might take
>>>>>>>>>>> a while. I've also send an email just now to the cfe-dev 
>>>>>>>>>>> mailing list,
>>>>>>>>>>> maybe someone there can offer some help.
>>>>>>>>>>> 
>>>>>>>>>>> In the meantime we can work around structs with flexible 
>>>>>>>>>>> arrays.
>>>>>>>>>>> Currently an exception is thrown, but this makes the whole 
>>>>>>>>>>> extracted
>>>>>>>>>>> interface unusable, so emitting an undefined layout as a 
>>>>>>>>>>> default value
>>>>>>>>>>> and printing a warning seems like a better option to me until 
>>>>>>>>>>> the
>>>>>>>>>>> issue is fixed in libclang.
>>>>>>>>>>> 
>>>>>>>>>>> There's also the idea of introducing a jextract option to 
>>>>>>>>>>> manually
>>>>>>>>>>> override a descriptor, e.g. `--patch-descriptor
>>>>>>>>>>> struct:MyStruct=[i32(x)i32(y)]`. We could use something like 
>>>>>>>>>>> that to
>>>>>>>>>>> manually provide the layout descriptor for structs with 
>>>>>>>>>>> flexible array
>>>>>>>>>>> members in the meantime.
>>>>>>>>>>> 
>>>>>>>>>>> Jorn
>>>>>>>>>>> 
>>>>>>>>>>> Jorn Vernee schreef op 2019-02-16 09:07:
>>>>>>>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>>>>>>>> instead of undefined layout, I prefer to have limited 
>>>>>>>>>>>>>> support, we can either strip that field or generate a 
>>>>>>>>>>>>>> 0-length array for now. >> For jextract, C only allow such 
>>>>>>>>>>>>>> field at end of struct, and sizeof() operator simply 
>>>>>>>>>>>>>> ignore that trailing array field. This should give us a 
>>>>>>>>>>>>>> good match as first step.
>>>>>>>>>>>>> Agree
>>>>>>>>>>>> 
>>>>>>>>>>>> Sure, that would be preferable. But, as discussed before 
>>>>>>>>>>>> [1], libclang
>>>>>>>>>>>> does not handle incomplete arrays properly, that's why it 
>>>>>>>>>>>> was changed
>>>>>>>>>>>> to emit an exception in the first place.
>>>>>>>>>>>> 
>>>>>>>>>>>> It is not possible to filter out structs with any jextract 
>>>>>>>>>>>> option, so
>>>>>>>>>>>> if you have an incomplete array in a header file that is 
>>>>>>>>>>>> otherwise
>>>>>>>>>>>> usable, you're out of luck.
>>>>>>>>>>>> 
>>>>>>>>>>>> Jorn
>>>>>>>>>>>> 
>>>>>>>>>>>> [1] :
>>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html 
>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>>>>>>>>>>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>>>>>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>>>>>>>> instead of undefined layout, I prefer to have limited 
>>>>>>>>>>>>>> support, we can either strip that field or generate a 
>>>>>>>>>>>>>> 0-length array for now. For jextract, C only allow such 
>>>>>>>>>>>>>> field at end of struct, and sizeof() operator simply 
>>>>>>>>>>>>>> ignore that trailing array field. This should give us a 
>>>>>>>>>>>>>> good match as first step.
>>>>>>>>>>>>> Agree
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Move on to more general support, where incomplete array 
>>>>>>>>>>>>>> can be in-between layouts. Before that, we probably need 
>>>>>>>>>>>>>> to validate some assumption,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Any incomplete array must have length specified in the 
>>>>>>>>>>>>>> same struct before the incomplete array. I believe this 
>>>>>>>>>>>>>> will pretty much cover most cases if not all.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> To reinforce this point, I believe most compilers even give 
>>>>>>>>>>>>> error if
>>>>>>>>>>>>> the incomplete array is the only member of the struct, or 
>>>>>>>>>>>>> if it's
>>>>>>>>>>>>> followed by other stuff:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> $ cat testInc.h
>>>>>>>>>>>>> 
>>>>>>>>>>>>> struct A {
>>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>> };
>>>>>>>>>>>>> 
>>>>>>>>>>>>> struct B {
>>>>>>>>>>>>>    int l;
>>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>> };
>>>>>>>>>>>>> 
>>>>>>>>>>>>> struct C {
>>>>>>>>>>>>>    int l;
>>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>>    int l2;
>>>>>>>>>>>>> };
>>>>>>>>>>>>> 
>>>>>>>>>>>>> $ gcc -c testInc.h
>>>>>>>>>>>>> 
>>>>>>>>>>>>> testInc.h:2:8: error: flexible array member in a struct 
>>>>>>>>>>>>> with no named members
>>>>>>>>>>>>>     int arr[];
>>>>>>>>>>>>>         ^~~
>>>>>>>>>>>>> testInc.h:12:8: error: flexible array member not at end of 
>>>>>>>>>>>>> struct
>>>>>>>>>>>>>     int arr[];
>>>>>>>>>>>>>         ^~~
>>>>>>>>>>>>> 
>>>>>>>>>>>>> $ clang -c testInc.h
>>>>>>>>>>>>> testInc.h:2:8: error: flexible array member 'arr' not 
>>>>>>>>>>>>> allowed in otherwise empty
>>>>>>>>>>>>>       struct
>>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>>        ^
>>>>>>>>>>>>> testInc.h:12:8: error: flexible array member 'arr' with 
>>>>>>>>>>>>> type 'int []' is not at
>>>>>>>>>>>>>       the end of struct
>>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>>        ^
>>>>>>>>>>>>> testInc.h:13:8: note: next field declaration is here
>>>>>>>>>>>>>    int l2;
>>>>>>>>>>>>>        ^
>>>>>>>>>>>>> 2 errors generated.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> With that, I think following should work well enough,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What you describe is what I've dubbed 'dependent layout' 
>>>>>>>>>>>>> approach -
>>>>>>>>>>>>> e.g. have one or more values in a struct provide more info 
>>>>>>>>>>>>> for certain
>>>>>>>>>>>>> layout elements in same struct. This is fairly frequent 
>>>>>>>>>>>>> business with
>>>>>>>>>>>>> message protocols - almost all representation for 
>>>>>>>>>>>>> variable-sized data
>>>>>>>>>>>>> is expressed as length + data array (sometimes compressed, 
>>>>>>>>>>>>> as in
>>>>>>>>>>>>> protobuf's VarInt).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I agree that's where we need to land, longer term. Short 
>>>>>>>>>>>>> term it feels
>>>>>>>>>>>>> like the best move would be to just strip the array. 
>>>>>>>>>>>>> Creating a
>>>>>>>>>>>>> 0-length array might be a  move with subtle consequences: 
>>>>>>>>>>>>> the array
>>>>>>>>>>>>> occurs within a region with some boundaries (e.g. a struct 
>>>>>>>>>>>>> region).
>>>>>>>>>>>>> The boundaries of the enclosing region are usually computed 
>>>>>>>>>>>>> using
>>>>>>>>>>>>> sizeof(enclosing type). Meaning that the enclosing region 
>>>>>>>>>>>>> won't be
>>>>>>>>>>>>> 'big enough' to host anything but a zero-length array. If 
>>>>>>>>>>>>> we cast the
>>>>>>>>>>>>> array to something else, what we get back is an array whose 
>>>>>>>>>>>>> boundaries
>>>>>>>>>>>>> would exceed those of the enclosing struct - so if you try 
>>>>>>>>>>>>> e.g. to
>>>>>>>>>>>>> write to the array, the operation would fail.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> To do this stuff properly in Panama you would need to 
>>>>>>>>>>>>> allocate a
>>>>>>>>>>>>> bigger chunk of memory, of the desired size (pretty much as 
>>>>>>>>>>>>> you would
>>>>>>>>>>>>> in C), and then cast the memory to the struct type - now 
>>>>>>>>>>>>> you have a
>>>>>>>>>>>>> memory region that is big enough to do the struct + the 
>>>>>>>>>>>>> array.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The alternative, which does look simpler, is to just 
>>>>>>>>>>>>> allocate a struct
>>>>>>>>>>>>> (with array stripped) followed by an array of desired size 
>>>>>>>>>>>>> in the same
>>>>>>>>>>>>> scope - e.g. compare this:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>>>>>>>>>>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>>>>>>>>>>>     Pointer<StructWithIncompleteArray> pstruct =
>>>>>>>>>>>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class)); 
>>>>>>>>>>>>>     Array<Integer> data = 
>>>>>>>>>>>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>>>>>>>>>>>> 
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> With this:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     StructWithoutIncompleteArray struct =
>>>>>>>>>>>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>>>>>>>>>>>     Array<Integer> data = 
>>>>>>>>>>>>> s.allocateArray(NativeTypes.INT32, 10);
>>>>>>>>>>>>> 
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The code looks similar - and a client can, after the 
>>>>>>>>>>>>> allocation, use
>>>>>>>>>>>>> the array and the struct at will. But there's a subtle 
>>>>>>>>>>>>> difference
>>>>>>>>>>>>> between the two: in the first snippet, the array is 
>>>>>>>>>>>>> allocated
>>>>>>>>>>>>> immediately after the struct bits - that's how allocation 
>>>>>>>>>>>>> happened. In
>>>>>>>>>>>>> the second snippet there's no guarantee that the array will 
>>>>>>>>>>>>> be after
>>>>>>>>>>>>> the struct; in fact, the native scope might have run out of 
>>>>>>>>>>>>> space with
>>>>>>>>>>>>> the struct and needed to allocate a new slab of native 
>>>>>>>>>>>>> memory via
>>>>>>>>>>>>> unsafe before the array allocation happens.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Which seems to suggest that the right way of approaching 
>>>>>>>>>>>>> the problem,
>>>>>>>>>>>>> even if more verbose, is the first one.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maurizio