[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member

Jorn Vernee jbvernee at xs4all.nl
Tue Feb 26 15:00:29 UTC 2019


Well, right now they are used in 2 places; in RecordLayoutComputer, 
because we still need to detect a flexible array and emit a layout 
reference, and in the warning visitor.

Jorn

Maurizio Cimadamore schreef op 2019-02-26 15:56:
> One minor comment - don't the methods you moved from
> RecordLayoutComputer to Utils actually belong to the new visitor?
> 
> Maurizio
> 
> On 26/02/2019 14:53, Maurizio Cimadamore wrote:
>> Looks good - in the future, should we add more checks we can rename 
>> the visitor to some more general name.
>> 
>> Maurizio
>> 
>> On 26/02/2019 14:33, Jorn Vernee wrote:
>>> Uhh... I just realized the MultiplexingVisitor I made is totally 
>>> pointless... because Trees implement the visitor pattern. (probably 
>>> time to take a break :-) )
>>> 
>>> Updated webrev: 
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.06/
>>> 
>>> Jorn
>>> 
>>> Jorn Vernee schreef op 2019-02-26 14:20:
>>>> After some private discussion, here is the updated webrev that that
>>>> implements the check as an actual visitor instead.
>>>> 
>>>> Updated webrev:
>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.05/
>>>> 
>>>> Jorn
>>>> 
>>>> Maurizio Cimadamore schreef op 2019-02-26 02:24:
>>>>> Thanks this looks much simpler.
>>>>> 
>>>>> And also points at another possible simplification - why don't we
>>>>> generate such warning in the parser? After all, is the parser that
>>>>> knows when a StructTree will be created, so it is in an ideal
>>>>> situation to intercept that event and log a warning accordingly 
>>>>> (and,
>>>>> the parser already has logging support). Btw, I know that the 
>>>>> actual
>>>>> StructTree creation occurs in TreeMaker - we could either add 
>>>>> context
>>>>> to TreeMaker, or just peek for a struct cursor in Parser.
>>>>> 
>>>>> But overall, adding the check on earlier visitors might be better 
>>>>> than
>>>>> using peek().
>>>>> 
>>>>> Maurizio
>>>>> 
>>>>> On 26/02/2019 00:30, Jorn Vernee wrote:
>>>>>> Hi Maurizio,
>>>>>> 
>>>>>> I think you're right. I replaced the exception with a warning 
>>>>>> since the 2 are similar, but throwing an exception/warning at that 
>>>>>> point is probably wrong in the first place, like you say.
>>>>>> 
>>>>>> Based on your suggestion, I've implemented the warning separately 
>>>>>> as a .peek() operation on the stream of Trees.
>>>>>> 
>>>>>> Updated webrev: 
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.04/
>>>>>> 
>>>>>> Jorn
>>>>>> 
>>>>>> Maurizio Cimadamore schreef op 2019-02-26 00:34:
>>>>>>> Hi Jorn,
>>>>>>> while in principle making context immutable is the right to do on
>>>>>>> paper, I believe that we probably ended up curing a symptom and 
>>>>>>> not
>>>>>>> the disease :-)
>>>>>>> 
>>>>>>> Which means: I find it very strange to see that a layout computer 
>>>>>>> -
>>>>>>> that is, a function which takes a clang Cursor and emits a Layout 
>>>>>>> is
>>>>>>> now suddenly starting to give side-effects. Given I've worked on 
>>>>>>> javac
>>>>>>> for many years, I can assure you that this will almost surely 
>>>>>>> lead to
>>>>>>> problems down the road. For instance, if this apparently harmless
>>>>>>> stateless function is called twice on the same cursor, we would 
>>>>>>> get
>>>>>>> two warnings; another issue is that the code will fail to scale 
>>>>>>> if new
>>>>>>> warnings will need to be added - e.g. if record layout computer 
>>>>>>> needs
>>>>>>> to check for other problematic stuff. And I'm not even mentioning
>>>>>>> adding logic to enable/disable specific classes of warnings, 
>>>>>>> which is
>>>>>>> another thing that might happen, at some point.
>>>>>>> 
>>>>>>> What works in these contexts, is to separate the logic that does 
>>>>>>> the
>>>>>>> stateless computation (e.g. layout computation, in this case), 
>>>>>>> from
>>>>>>> the logic that issues the warning. For instance, we could have a
>>>>>>> separate visitor in jextract which looks at the clang cursor, 
>>>>>>> looking
>>>>>>> for things that are suspicious or not well supported (such as
>>>>>>> incomplete arrays) and report a warning _only once_.
>>>>>>> 
>>>>>>> This visitor will probably write itself, and, longer term would 
>>>>>>> be a
>>>>>>> much more maintainable option than inlining the check in the 
>>>>>>> layout
>>>>>>> computer function itself.
>>>>>>> 
>>>>>>> After which, the decision as to whether make the context mutable 
>>>>>>> or
>>>>>>> not is largely orthogonal - we could go there, although it's not 
>>>>>>> clear
>>>>>>> if the extra visitor I'm proposing would significantly switch the
>>>>>>> balance one way or another (probably not, as we have plenty other
>>>>>>> visitors which do logging).
>>>>>>> 
>>>>>>> Maurizio
>>>>>>> 
>>>>>>> 
>>>>>>> On 25/02/2019 23:16, Jorn Vernee wrote:
>>>>>>>> I've split this into 2 to try and make reviewing easier.
>>>>>>>> 
>>>>>>>> Here's the part that makes Context immutable (this required 
>>>>>>>> quite a bit of refactoring in Main):
>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/context/webrev.00/ 
>>>>>>>> Here's the original part that returns a layout reference when 
>>>>>>>> encountering a type with a flexible array (applies on top of the 
>>>>>>>> first one):
>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.03/ 
>>>>>>>> Hope that helps,
>>>>>>>> Jorn
>>>>>>>> 
>>>>>>>> Jorn Vernee schreef op 2019-02-25 23:06:
>>>>>>>>> I talked a bit with Sundar;
>>>>>>>>> 
>>>>>>>>> We can not create a global static method to return the Context 
>>>>>>>>> since
>>>>>>>>> there could be more than 1 Context, due to multiple instances 
>>>>>>>>> of
>>>>>>>>> Jextract being created through the ToolProvider interface. But, 
>>>>>>>>> as
>>>>>>>>> long as Context is Immutable, passing it down to other code 
>>>>>>>>> should be
>>>>>>>>> OK.
>>>>>>>>> 
>>>>>>>>> I have made Context immutable by adding a Context.Builder, and
>>>>>>>>> creating the Context in a separate method in Main 
>>>>>>>>> (createContext).
>>>>>>>>> Some places in that code were prematurely exiting and returning 
>>>>>>>>> an
>>>>>>>>> error code. This is replaced by throwing a custom exception 
>>>>>>>>> instead.
>>>>>>>>> 
>>>>>>>>> Update webrev:
>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/ 
>>>>>>>>> Thanks,
>>>>>>>>> Jorn
>>>>>>>>> 
>>>>>>>>> Jorn Vernee schreef op 2019-02-16 15:30:
>>>>>>>>>> In other words; we can't get the layout for a type with a 
>>>>>>>>>> flexible
>>>>>>>>>> array from libclang at all. A good way to resolve that is 
>>>>>>>>>> probably to
>>>>>>>>>> submit a patch for it to libclang. I've done a little bit of
>>>>>>>>>> searching, and there doesn't seem to be a bug report in their 
>>>>>>>>>> bug
>>>>>>>>>> database for it either:
>>>>>>>>>> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array 
>>>>>>>>>> so we
>>>>>>>>>> could submit a bug about it as well. But, currently you need 
>>>>>>>>>> to
>>>>>>>>>> manually request an account in order to post bugs, so this 
>>>>>>>>>> might take
>>>>>>>>>> a while. I've also send an email just now to the cfe-dev 
>>>>>>>>>> mailing list,
>>>>>>>>>> maybe someone there can offer some help.
>>>>>>>>>> 
>>>>>>>>>> In the meantime we can work around structs with flexible 
>>>>>>>>>> arrays.
>>>>>>>>>> Currently an exception is thrown, but this makes the whole 
>>>>>>>>>> extracted
>>>>>>>>>> interface unusable, so emitting an undefined layout as a 
>>>>>>>>>> default value
>>>>>>>>>> and printing a warning seems like a better option to me until 
>>>>>>>>>> the
>>>>>>>>>> issue is fixed in libclang.
>>>>>>>>>> 
>>>>>>>>>> There's also the idea of introducing a jextract option to 
>>>>>>>>>> manually
>>>>>>>>>> override a descriptor, e.g. `--patch-descriptor
>>>>>>>>>> struct:MyStruct=[i32(x)i32(y)]`. We could use something like 
>>>>>>>>>> that to
>>>>>>>>>> manually provide the layout descriptor for structs with 
>>>>>>>>>> flexible array
>>>>>>>>>> members in the meantime.
>>>>>>>>>> 
>>>>>>>>>> Jorn
>>>>>>>>>> 
>>>>>>>>>> Jorn Vernee schreef op 2019-02-16 09:07:
>>>>>>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>>>>>>> instead of undefined layout, I prefer to have limited 
>>>>>>>>>>>>> support, we can either strip that field or generate a 
>>>>>>>>>>>>> 0-length array for now. >> For jextract, C only allow such 
>>>>>>>>>>>>> field at end of struct, and sizeof() operator simply ignore 
>>>>>>>>>>>>> that trailing array field. This should give us a good match 
>>>>>>>>>>>>> as first step.
>>>>>>>>>>>> Agree
>>>>>>>>>>> 
>>>>>>>>>>> Sure, that would be preferable. But, as discussed before [1], 
>>>>>>>>>>> libclang
>>>>>>>>>>> does not handle incomplete arrays properly, that's why it was 
>>>>>>>>>>> changed
>>>>>>>>>>> to emit an exception in the first place.
>>>>>>>>>>> 
>>>>>>>>>>> It is not possible to filter out structs with any jextract 
>>>>>>>>>>> option, so
>>>>>>>>>>> if you have an incomplete array in a header file that is 
>>>>>>>>>>> otherwise
>>>>>>>>>>> usable, you're out of luck.
>>>>>>>>>>> 
>>>>>>>>>>> Jorn
>>>>>>>>>>> 
>>>>>>>>>>> [1] :
>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html 
>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>>>>>>>>>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>>>>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>>>>>>> instead of undefined layout, I prefer to have limited 
>>>>>>>>>>>>> support, we can either strip that field or generate a 
>>>>>>>>>>>>> 0-length array for now. For jextract, C only allow such 
>>>>>>>>>>>>> field at end of struct, and sizeof() operator simply ignore 
>>>>>>>>>>>>> that trailing array field. This should give us a good match 
>>>>>>>>>>>>> as first step.
>>>>>>>>>>>> Agree
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Move on to more general support, where incomplete array can 
>>>>>>>>>>>>> be in-between layouts. Before that, we probably need to 
>>>>>>>>>>>>> validate some assumption,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Any incomplete array must have length specified in the same 
>>>>>>>>>>>>> struct before the incomplete array. I believe this will 
>>>>>>>>>>>>> pretty much cover most cases if not all.
>>>>>>>>>>>> 
>>>>>>>>>>>> To reinforce this point, I believe most compilers even give 
>>>>>>>>>>>> error if
>>>>>>>>>>>> the incomplete array is the only member of the struct, or if 
>>>>>>>>>>>> it's
>>>>>>>>>>>> followed by other stuff:
>>>>>>>>>>>> 
>>>>>>>>>>>> $ cat testInc.h
>>>>>>>>>>>> 
>>>>>>>>>>>> struct A {
>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>> };
>>>>>>>>>>>> 
>>>>>>>>>>>> struct B {
>>>>>>>>>>>>    int l;
>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>> };
>>>>>>>>>>>> 
>>>>>>>>>>>> struct C {
>>>>>>>>>>>>    int l;
>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>    int l2;
>>>>>>>>>>>> };
>>>>>>>>>>>> 
>>>>>>>>>>>> $ gcc -c testInc.h
>>>>>>>>>>>> 
>>>>>>>>>>>> testInc.h:2:8: error: flexible array member in a struct with 
>>>>>>>>>>>> no named members
>>>>>>>>>>>>     int arr[];
>>>>>>>>>>>>         ^~~
>>>>>>>>>>>> testInc.h:12:8: error: flexible array member not at end of 
>>>>>>>>>>>> struct
>>>>>>>>>>>>     int arr[];
>>>>>>>>>>>>         ^~~
>>>>>>>>>>>> 
>>>>>>>>>>>> $ clang -c testInc.h
>>>>>>>>>>>> testInc.h:2:8: error: flexible array member 'arr' not 
>>>>>>>>>>>> allowed in otherwise empty
>>>>>>>>>>>>       struct
>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>        ^
>>>>>>>>>>>> testInc.h:12:8: error: flexible array member 'arr' with type 
>>>>>>>>>>>> 'int []' is not at
>>>>>>>>>>>>       the end of struct
>>>>>>>>>>>>    int arr[];
>>>>>>>>>>>>        ^
>>>>>>>>>>>> testInc.h:13:8: note: next field declaration is here
>>>>>>>>>>>>    int l2;
>>>>>>>>>>>>        ^
>>>>>>>>>>>> 2 errors generated.
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> With that, I think following should work well enough,
>>>>>>>>>>>> 
>>>>>>>>>>>> What you describe is what I've dubbed 'dependent layout' 
>>>>>>>>>>>> approach -
>>>>>>>>>>>> e.g. have one or more values in a struct provide more info 
>>>>>>>>>>>> for certain
>>>>>>>>>>>> layout elements in same struct. This is fairly frequent 
>>>>>>>>>>>> business with
>>>>>>>>>>>> message protocols - almost all representation for 
>>>>>>>>>>>> variable-sized data
>>>>>>>>>>>> is expressed as length + data array (sometimes compressed, 
>>>>>>>>>>>> as in
>>>>>>>>>>>> protobuf's VarInt).
>>>>>>>>>>>> 
>>>>>>>>>>>> I agree that's where we need to land, longer term. Short 
>>>>>>>>>>>> term it feels
>>>>>>>>>>>> like the best move would be to just strip the array. 
>>>>>>>>>>>> Creating a
>>>>>>>>>>>> 0-length array might be a  move with subtle consequences: 
>>>>>>>>>>>> the array
>>>>>>>>>>>> occurs within a region with some boundaries (e.g. a struct 
>>>>>>>>>>>> region).
>>>>>>>>>>>> The boundaries of the enclosing region are usually computed 
>>>>>>>>>>>> using
>>>>>>>>>>>> sizeof(enclosing type). Meaning that the enclosing region 
>>>>>>>>>>>> won't be
>>>>>>>>>>>> 'big enough' to host anything but a zero-length array. If we 
>>>>>>>>>>>> cast the
>>>>>>>>>>>> array to something else, what we get back is an array whose 
>>>>>>>>>>>> boundaries
>>>>>>>>>>>> would exceed those of the enclosing struct - so if you try 
>>>>>>>>>>>> e.g. to
>>>>>>>>>>>> write to the array, the operation would fail.
>>>>>>>>>>>> 
>>>>>>>>>>>> To do this stuff properly in Panama you would need to 
>>>>>>>>>>>> allocate a
>>>>>>>>>>>> bigger chunk of memory, of the desired size (pretty much as 
>>>>>>>>>>>> you would
>>>>>>>>>>>> in C), and then cast the memory to the struct type - now you 
>>>>>>>>>>>> have a
>>>>>>>>>>>> memory region that is big enough to do the struct + the 
>>>>>>>>>>>> array.
>>>>>>>>>>>> 
>>>>>>>>>>>> The alternative, which does look simpler, is to just 
>>>>>>>>>>>> allocate a struct
>>>>>>>>>>>> (with array stripped) followed by an array of desired size 
>>>>>>>>>>>> in the same
>>>>>>>>>>>> scope - e.g. compare this:
>>>>>>>>>>>> 
>>>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>>> 
>>>>>>>>>>>>     Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>>>>>>>>>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>>>>>>>>>>     Pointer<StructWithIncompleteArray> pstruct =
>>>>>>>>>>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class)); 
>>>>>>>>>>>>     Array<Integer> data = 
>>>>>>>>>>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>>>>>>>>>>> 
>>>>>>>>>>>> }
>>>>>>>>>>>> 
>>>>>>>>>>>> With this:
>>>>>>>>>>>> 
>>>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>>> 
>>>>>>>>>>>>     StructWithoutIncompleteArray struct =
>>>>>>>>>>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>>>>>>>>>>     Array<Integer> data = s.allocateArray(NativeTypes.INT32, 
>>>>>>>>>>>> 10);
>>>>>>>>>>>> 
>>>>>>>>>>>> }
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> The code looks similar - and a client can, after the 
>>>>>>>>>>>> allocation, use
>>>>>>>>>>>> the array and the struct at will. But there's a subtle 
>>>>>>>>>>>> difference
>>>>>>>>>>>> between the two: in the first snippet, the array is 
>>>>>>>>>>>> allocated
>>>>>>>>>>>> immediately after the struct bits - that's how allocation 
>>>>>>>>>>>> happened. In
>>>>>>>>>>>> the second snippet there's no guarantee that the array will 
>>>>>>>>>>>> be after
>>>>>>>>>>>> the struct; in fact, the native scope might have run out of 
>>>>>>>>>>>> space with
>>>>>>>>>>>> the struct and needed to allocate a new slab of native 
>>>>>>>>>>>> memory via
>>>>>>>>>>>> unsafe before the array allocation happens.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Which seems to suggest that the right way of approaching the 
>>>>>>>>>>>> problem,
>>>>>>>>>>>> even if more verbose, is the first one.
>>>>>>>>>>>> 
>>>>>>>>>>>> Maurizio


More information about the panama-dev mailing list