[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Feb 26 15:10:51 UTC 2019
On 26/02/2019 15:02, Jorn Vernee wrote:
> So, it seemed good to put them somewhere in the middle, e.g. Utils
>
sorry, I saw the code moving and did not notice it was still used. Good.
Maurizio
> Jorn
>
> Jorn Vernee schreef op 2019-02-26 16:00:
>> Well, right now they are used in 2 places; in RecordLayoutComputer,
>> because we still need to detect a flexible array and emit a layout
>> reference, and in the warning visitor.
>>
>> Jorn
>>
>> Maurizio Cimadamore schreef op 2019-02-26 15:56:
>>> One minor comment - don't the methods you moved from
>>> RecordLayoutComputer to Utils actually belong to the new visitor?
>>>
>>> Maurizio
>>>
>>> On 26/02/2019 14:53, Maurizio Cimadamore wrote:
>>>> Looks good - in the future, should we add more checks we can rename
>>>> the visitor to some more general name.
>>>>
>>>> Maurizio
>>>>
>>>> On 26/02/2019 14:33, Jorn Vernee wrote:
>>>>> Uhh... I just realized the MultiplexingVisitor I made is totally
>>>>> pointless... because Trees implement the visitor pattern.
>>>>> (probably time to take a break :-) )
>>>>>
>>>>> Updated webrev:
>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.06/
>>>>>
>>>>> Jorn
>>>>>
>>>>> Jorn Vernee schreef op 2019-02-26 14:20:
>>>>>> After some private discussion, here is the updated webrev that that
>>>>>> implements the check as an actual visitor instead.
>>>>>>
>>>>>> Updated webrev:
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.05/
>>>>>>
>>>>>>
>>>>>> Jorn
>>>>>>
>>>>>> Maurizio Cimadamore schreef op 2019-02-26 02:24:
>>>>>>> Thanks this looks much simpler.
>>>>>>>
>>>>>>> And also points at another possible simplification - why don't we
>>>>>>> generate such warning in the parser? After all, is the parser that
>>>>>>> knows when a StructTree will be created, so it is in an ideal
>>>>>>> situation to intercept that event and log a warning accordingly
>>>>>>> (and,
>>>>>>> the parser already has logging support). Btw, I know that the
>>>>>>> actual
>>>>>>> StructTree creation occurs in TreeMaker - we could either add
>>>>>>> context
>>>>>>> to TreeMaker, or just peek for a struct cursor in Parser.
>>>>>>>
>>>>>>> But overall, adding the check on earlier visitors might be
>>>>>>> better than
>>>>>>> using peek().
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>> On 26/02/2019 00:30, Jorn Vernee wrote:
>>>>>>>> Hi Maurizio,
>>>>>>>>
>>>>>>>> I think you're right. I replaced the exception with a warning
>>>>>>>> since the 2 are similar, but throwing an exception/warning at
>>>>>>>> that point is probably wrong in the first place, like you say.
>>>>>>>>
>>>>>>>> Based on your suggestion, I've implemented the warning
>>>>>>>> separately as a .peek() operation on the stream of Trees.
>>>>>>>>
>>>>>>>> Updated webrev:
>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.04/
>>>>>>>>
>>>>>>>>
>>>>>>>> Jorn
>>>>>>>>
>>>>>>>> Maurizio Cimadamore schreef op 2019-02-26 00:34:
>>>>>>>>> Hi Jorn,
>>>>>>>>> while in principle making context immutable is the right to do on
>>>>>>>>> paper, I believe that we probably ended up curing a symptom
>>>>>>>>> and not
>>>>>>>>> the disease :-)
>>>>>>>>>
>>>>>>>>> Which means: I find it very strange to see that a layout
>>>>>>>>> computer -
>>>>>>>>> that is, a function which takes a clang Cursor and emits a
>>>>>>>>> Layout is
>>>>>>>>> now suddenly starting to give side-effects. Given I've worked
>>>>>>>>> on javac
>>>>>>>>> for many years, I can assure you that this will almost surely
>>>>>>>>> lead to
>>>>>>>>> problems down the road. For instance, if this apparently harmless
>>>>>>>>> stateless function is called twice on the same cursor, we
>>>>>>>>> would get
>>>>>>>>> two warnings; another issue is that the code will fail to
>>>>>>>>> scale if new
>>>>>>>>> warnings will need to be added - e.g. if record layout
>>>>>>>>> computer needs
>>>>>>>>> to check for other problematic stuff. And I'm not even mentioning
>>>>>>>>> adding logic to enable/disable specific classes of warnings,
>>>>>>>>> which is
>>>>>>>>> another thing that might happen, at some point.
>>>>>>>>>
>>>>>>>>> What works in these contexts, is to separate the logic that
>>>>>>>>> does the
>>>>>>>>> stateless computation (e.g. layout computation, in this case),
>>>>>>>>> from
>>>>>>>>> the logic that issues the warning. For instance, we could have a
>>>>>>>>> separate visitor in jextract which looks at the clang cursor,
>>>>>>>>> looking
>>>>>>>>> for things that are suspicious or not well supported (such as
>>>>>>>>> incomplete arrays) and report a warning _only once_.
>>>>>>>>>
>>>>>>>>> This visitor will probably write itself, and, longer term
>>>>>>>>> would be a
>>>>>>>>> much more maintainable option than inlining the check in the
>>>>>>>>> layout
>>>>>>>>> computer function itself.
>>>>>>>>>
>>>>>>>>> After which, the decision as to whether make the context
>>>>>>>>> mutable or
>>>>>>>>> not is largely orthogonal - we could go there, although it's
>>>>>>>>> not clear
>>>>>>>>> if the extra visitor I'm proposing would significantly switch the
>>>>>>>>> balance one way or another (probably not, as we have plenty other
>>>>>>>>> visitors which do logging).
>>>>>>>>>
>>>>>>>>> Maurizio
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 25/02/2019 23:16, Jorn Vernee wrote:
>>>>>>>>>> I've split this into 2 to try and make reviewing easier.
>>>>>>>>>>
>>>>>>>>>> Here's the part that makes Context immutable (this required
>>>>>>>>>> quite a bit of refactoring in Main):
>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/context/webrev.00/
>>>>>>>>>> Here's the original part that returns a layout reference when
>>>>>>>>>> encountering a type with a flexible array (applies on top of
>>>>>>>>>> the first one):
>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.03/
>>>>>>>>>> Hope that helps,
>>>>>>>>>> Jorn
>>>>>>>>>>
>>>>>>>>>> Jorn Vernee schreef op 2019-02-25 23:06:
>>>>>>>>>>> I talked a bit with Sundar;
>>>>>>>>>>>
>>>>>>>>>>> We can not create a global static method to return the
>>>>>>>>>>> Context since
>>>>>>>>>>> there could be more than 1 Context, due to multiple
>>>>>>>>>>> instances of
>>>>>>>>>>> Jextract being created through the ToolProvider interface.
>>>>>>>>>>> But, as
>>>>>>>>>>> long as Context is Immutable, passing it down to other code
>>>>>>>>>>> should be
>>>>>>>>>>> OK.
>>>>>>>>>>>
>>>>>>>>>>> I have made Context immutable by adding a Context.Builder, and
>>>>>>>>>>> creating the Context in a separate method in Main
>>>>>>>>>>> (createContext).
>>>>>>>>>>> Some places in that code were prematurely exiting and
>>>>>>>>>>> returning an
>>>>>>>>>>> error code. This is replaced by throwing a custom exception
>>>>>>>>>>> instead.
>>>>>>>>>>>
>>>>>>>>>>> Update webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jorn
>>>>>>>>>>>
>>>>>>>>>>> Jorn Vernee schreef op 2019-02-16 15:30:
>>>>>>>>>>>> In other words; we can't get the layout for a type with a
>>>>>>>>>>>> flexible
>>>>>>>>>>>> array from libclang at all. A good way to resolve that is
>>>>>>>>>>>> probably to
>>>>>>>>>>>> submit a patch for it to libclang. I've done a little bit of
>>>>>>>>>>>> searching, and there doesn't seem to be a bug report in
>>>>>>>>>>>> their bug
>>>>>>>>>>>> database for it either:
>>>>>>>>>>>> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array
>>>>>>>>>>>> so we
>>>>>>>>>>>> could submit a bug about it as well. But, currently you
>>>>>>>>>>>> need to
>>>>>>>>>>>> manually request an account in order to post bugs, so this
>>>>>>>>>>>> might take
>>>>>>>>>>>> a while. I've also send an email just now to the cfe-dev
>>>>>>>>>>>> mailing list,
>>>>>>>>>>>> maybe someone there can offer some help.
>>>>>>>>>>>>
>>>>>>>>>>>> In the meantime we can work around structs with flexible
>>>>>>>>>>>> arrays.
>>>>>>>>>>>> Currently an exception is thrown, but this makes the whole
>>>>>>>>>>>> extracted
>>>>>>>>>>>> interface unusable, so emitting an undefined layout as a
>>>>>>>>>>>> default value
>>>>>>>>>>>> and printing a warning seems like a better option to me
>>>>>>>>>>>> until the
>>>>>>>>>>>> issue is fixed in libclang.
>>>>>>>>>>>>
>>>>>>>>>>>> There's also the idea of introducing a jextract option to
>>>>>>>>>>>> manually
>>>>>>>>>>>> override a descriptor, e.g. `--patch-descriptor
>>>>>>>>>>>> struct:MyStruct=[i32(x)i32(y)]`. We could use something
>>>>>>>>>>>> like that to
>>>>>>>>>>>> manually provide the layout descriptor for structs with
>>>>>>>>>>>> flexible array
>>>>>>>>>>>> members in the meantime.
>>>>>>>>>>>>
>>>>>>>>>>>> Jorn
>>>>>>>>>>>>
>>>>>>>>>>>> Jorn Vernee schreef op 2019-02-16 09:07:
>>>>>>>>>>>>>>> Now to the real discussion about incomplete array
>>>>>>>>>>>>>>> support, instead of undefined layout, I prefer to have
>>>>>>>>>>>>>>> limited support, we can either strip that field or
>>>>>>>>>>>>>>> generate a 0-length array for now. >> For jextract, C
>>>>>>>>>>>>>>> only allow such field at end of struct, and sizeof()
>>>>>>>>>>>>>>> operator simply ignore that trailing array field. This
>>>>>>>>>>>>>>> should give us a good match as first step.
>>>>>>>>>>>>>> Agree
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sure, that would be preferable. But, as discussed before
>>>>>>>>>>>>> [1], libclang
>>>>>>>>>>>>> does not handle incomplete arrays properly, that's why it
>>>>>>>>>>>>> was changed
>>>>>>>>>>>>> to emit an exception in the first place.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is not possible to filter out structs with any jextract
>>>>>>>>>>>>> option, so
>>>>>>>>>>>>> if you have an incomplete array in a header file that is
>>>>>>>>>>>>> otherwise
>>>>>>>>>>>>> usable, you're out of luck.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorn
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] :
>>>>>>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html
>>>>>>>>>>>>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>>>>>>>>>>>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>>>>>>>>>>>>> Now to the real discussion about incomplete array
>>>>>>>>>>>>>>> support, instead of undefined layout, I prefer to have
>>>>>>>>>>>>>>> limited support, we can either strip that field or
>>>>>>>>>>>>>>> generate a 0-length array for now. For jextract, C only
>>>>>>>>>>>>>>> allow such field at end of struct, and sizeof() operator
>>>>>>>>>>>>>>> simply ignore that trailing array field. This should
>>>>>>>>>>>>>>> give us a good match as first step.
>>>>>>>>>>>>>> Agree
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Move on to more general support, where incomplete array
>>>>>>>>>>>>>>> can be in-between layouts. Before that, we probably need
>>>>>>>>>>>>>>> to validate some assumption,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any incomplete array must have length specified in the
>>>>>>>>>>>>>>> same struct before the incomplete array. I believe this
>>>>>>>>>>>>>>> will pretty much cover most cases if not all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To reinforce this point, I believe most compilers even
>>>>>>>>>>>>>> give error if
>>>>>>>>>>>>>> the incomplete array is the only member of the struct, or
>>>>>>>>>>>>>> if it's
>>>>>>>>>>>>>> followed by other stuff:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ cat testInc.h
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> struct A {
>>>>>>>>>>>>>> int arr[];
>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> struct B {
>>>>>>>>>>>>>> int l;
>>>>>>>>>>>>>> int arr[];
>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> struct C {
>>>>>>>>>>>>>> int l;
>>>>>>>>>>>>>> int arr[];
>>>>>>>>>>>>>> int l2;
>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ gcc -c testInc.h
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> testInc.h:2:8: error: flexible array member in a struct
>>>>>>>>>>>>>> with no named members
>>>>>>>>>>>>>> int arr[];
>>>>>>>>>>>>>> ^~~
>>>>>>>>>>>>>> testInc.h:12:8: error: flexible array member not at end
>>>>>>>>>>>>>> of struct
>>>>>>>>>>>>>> int arr[];
>>>>>>>>>>>>>> ^~~
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ clang -c testInc.h
>>>>>>>>>>>>>> testInc.h:2:8: error: flexible array member 'arr' not
>>>>>>>>>>>>>> allowed in otherwise empty
>>>>>>>>>>>>>> struct
>>>>>>>>>>>>>> int arr[];
>>>>>>>>>>>>>> ^
>>>>>>>>>>>>>> testInc.h:12:8: error: flexible array member 'arr' with
>>>>>>>>>>>>>> type 'int []' is not at
>>>>>>>>>>>>>> the end of struct
>>>>>>>>>>>>>> int arr[];
>>>>>>>>>>>>>> ^
>>>>>>>>>>>>>> testInc.h:13:8: note: next field declaration is here
>>>>>>>>>>>>>> int l2;
>>>>>>>>>>>>>> ^
>>>>>>>>>>>>>> 2 errors generated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With that, I think following should work well enough,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What you describe is what I've dubbed 'dependent layout'
>>>>>>>>>>>>>> approach -
>>>>>>>>>>>>>> e.g. have one or more values in a struct provide more
>>>>>>>>>>>>>> info for certain
>>>>>>>>>>>>>> layout elements in same struct. This is fairly frequent
>>>>>>>>>>>>>> business with
>>>>>>>>>>>>>> message protocols - almost all representation for
>>>>>>>>>>>>>> variable-sized data
>>>>>>>>>>>>>> is expressed as length + data array (sometimes
>>>>>>>>>>>>>> compressed, as in
>>>>>>>>>>>>>> protobuf's VarInt).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree that's where we need to land, longer term. Short
>>>>>>>>>>>>>> term it feels
>>>>>>>>>>>>>> like the best move would be to just strip the array.
>>>>>>>>>>>>>> Creating a
>>>>>>>>>>>>>> 0-length array might be a move with subtle consequences:
>>>>>>>>>>>>>> the array
>>>>>>>>>>>>>> occurs within a region with some boundaries (e.g. a
>>>>>>>>>>>>>> struct region).
>>>>>>>>>>>>>> The boundaries of the enclosing region are usually
>>>>>>>>>>>>>> computed using
>>>>>>>>>>>>>> sizeof(enclosing type). Meaning that the enclosing region
>>>>>>>>>>>>>> won't be
>>>>>>>>>>>>>> 'big enough' to host anything but a zero-length array. If
>>>>>>>>>>>>>> we cast the
>>>>>>>>>>>>>> array to something else, what we get back is an array
>>>>>>>>>>>>>> whose boundaries
>>>>>>>>>>>>>> would exceed those of the enclosing struct - so if you
>>>>>>>>>>>>>> try e.g. to
>>>>>>>>>>>>>> write to the array, the operation would fail.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To do this stuff properly in Panama you would need to
>>>>>>>>>>>>>> allocate a
>>>>>>>>>>>>>> bigger chunk of memory, of the desired size (pretty much
>>>>>>>>>>>>>> as you would
>>>>>>>>>>>>>> in C), and then cast the memory to the struct type - now
>>>>>>>>>>>>>> you have a
>>>>>>>>>>>>>> memory region that is big enough to do the struct + the
>>>>>>>>>>>>>> array.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The alternative, which does look simpler, is to just
>>>>>>>>>>>>>> allocate a struct
>>>>>>>>>>>>>> (with array stripped) followed by an array of desired
>>>>>>>>>>>>>> size in the same
>>>>>>>>>>>>>> scope - e.g. compare this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>>>>>>>>>>>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>>>>>>>>>>>> Pointer<StructWithIncompleteArray> pstruct =
>>>>>>>>>>>>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class));
>>>>>>>>>>>>>> Array<Integer> data =
>>>>>>>>>>>>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> StructWithoutIncompleteArray struct =
>>>>>>>>>>>>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>>>>>>>>>>>> Array<Integer> data =
>>>>>>>>>>>>>> s.allocateArray(NativeTypes.INT32, 10);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The code looks similar - and a client can, after the
>>>>>>>>>>>>>> allocation, use
>>>>>>>>>>>>>> the array and the struct at will. But there's a subtle
>>>>>>>>>>>>>> difference
>>>>>>>>>>>>>> between the two: in the first snippet, the array is
>>>>>>>>>>>>>> allocated
>>>>>>>>>>>>>> immediately after the struct bits - that's how allocation
>>>>>>>>>>>>>> happened. In
>>>>>>>>>>>>>> the second snippet there's no guarantee that the array
>>>>>>>>>>>>>> will be after
>>>>>>>>>>>>>> the struct; in fact, the native scope might have run out
>>>>>>>>>>>>>> of space with
>>>>>>>>>>>>>> the struct and needed to allocate a new slab of native
>>>>>>>>>>>>>> memory via
>>>>>>>>>>>>>> unsafe before the array allocation happens.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which seems to suggest that the right way of approaching
>>>>>>>>>>>>>> the problem,
>>>>>>>>>>>>>> even if more verbose, is the first one.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maurizio
More information about the panama-dev
mailing list