[foreign] RFR 8218052: Don't throw an exception when encountering a type with a flexible array member

Jorn Vernee jbvernee at xs4all.nl
Tue Feb 26 13:39:34 UTC 2019


That sounds like an interesting option. So for ToolProvider/command line 
we'd have:

     String[] args -> Context.Builder -> Context -> run jextract

And for the API we'd have:

     Context.Builder -> Context -> run jextract

Basically cutting out the String[] step.

In the last patch I only used the Context to do the actual jextract run. 
After that, there are several strategies for outputting the result. A 
user could also choose to use the result directly. So as a straw man:

```
     interface JextractAPI {
         static JextractAPI newInstance();

         Context.Builder contextBuilder();
         JextractResult runJextract(Context context);

         void writeClassFiles(Path dir, JextractResult);
         void writeJar(Path jar, JextractResult);
         void writeJmod(Path jmod, JextractResult)
     }
```

Something like that?

Either way, making Context immutable and adding a Context.Builder seems 
like a good start, so I'll make a separate RFR for that patch.

Jorn

Maurizio Cimadamore schreef op 2019-02-26 12:00:
> On 26/02/2019 04:06, Sundararajan Athijegannathan wrote:
>> Actually, I like Context.Builder and immutable Context - regardless of 
>> whether we pass it around widely or not. Currently it is partially 
>> immutable with few fields being final & rest non-final + mutable 
>> collection. Perhaps that part alone as a separate patch?
> 
> I'm open to that - I just wanted to point out what, to me, seemed like
> the bigger issue.
> 
> I we go down the builder path, I wonder if, what we call
> ContextBuilder isn't part of a bigger JextractToolBuilder API -
> similar to what has been done in jshell:
> 
> https://docs.oracle.com/javase/10/docs/api/jdk/jshell/tool/JavaShellToolBuilder.html
> 
> Maurizio
> 
>> 
>> -Sundar
>> 
>> On 26/02/19, 6:00 AM, Jorn Vernee wrote:
>>> Hi Maurizio,
>>> 
>>> I think you're right. I replaced the exception with a warning since 
>>> the 2 are similar, but throwing an exception/warning at that point is 
>>> probably wrong in the first place, like you say.
>>> 
>>> Based on your suggestion, I've implemented the warning separately as 
>>> a .peek() operation on the stream of Trees.
>>> 
>>> Updated webrev: 
>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.04/
>>> 
>>> Jorn
>>> 
>>> Maurizio Cimadamore schreef op 2019-02-26 00:34:
>>>> Hi Jorn,
>>>> while in principle making context immutable is the right to do on
>>>> paper, I believe that we probably ended up curing a symptom and not
>>>> the disease :-)
>>>> 
>>>> Which means: I find it very strange to see that a layout computer -
>>>> that is, a function which takes a clang Cursor and emits a Layout is
>>>> now suddenly starting to give side-effects. Given I've worked on 
>>>> javac
>>>> for many years, I can assure you that this will almost surely lead 
>>>> to
>>>> problems down the road. For instance, if this apparently harmless
>>>> stateless function is called twice on the same cursor, we would get
>>>> two warnings; another issue is that the code will fail to scale if 
>>>> new
>>>> warnings will need to be added - e.g. if record layout computer 
>>>> needs
>>>> to check for other problematic stuff. And I'm not even mentioning
>>>> adding logic to enable/disable specific classes of warnings, which 
>>>> is
>>>> another thing that might happen, at some point.
>>>> 
>>>> What works in these contexts, is to separate the logic that does the
>>>> stateless computation (e.g. layout computation, in this case), from
>>>> the logic that issues the warning. For instance, we could have a
>>>> separate visitor in jextract which looks at the clang cursor, 
>>>> looking
>>>> for things that are suspicious or not well supported (such as
>>>> incomplete arrays) and report a warning _only once_.
>>>> 
>>>> This visitor will probably write itself, and, longer term would be a
>>>> much more maintainable option than inlining the check in the layout
>>>> computer function itself.
>>>> 
>>>> After which, the decision as to whether make the context mutable or
>>>> not is largely orthogonal - we could go there, although it's not 
>>>> clear
>>>> if the extra visitor I'm proposing would significantly switch the
>>>> balance one way or another (probably not, as we have plenty other
>>>> visitors which do logging).
>>>> 
>>>> Maurizio
>>>> 
>>>> 
>>>> On 25/02/2019 23:16, Jorn Vernee wrote:
>>>>> I've split this into 2 to try and make reviewing easier.
>>>>> 
>>>>> Here's the part that makes Context immutable (this required quite a 
>>>>> bit of refactoring in Main):
>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/context/webrev.00/
>>>>> 
>>>>> Here's the original part that returns a layout reference when 
>>>>> encountering a type with a flexible array (applies on top of the 
>>>>> first one):
>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.03/
>>>>> 
>>>>> Hope that helps,
>>>>> Jorn
>>>>> 
>>>>> Jorn Vernee schreef op 2019-02-25 23:06:
>>>>>> I talked a bit with Sundar;
>>>>>> 
>>>>>> We can not create a global static method to return the Context 
>>>>>> since
>>>>>> there could be more than 1 Context, due to multiple instances of
>>>>>> Jextract being created through the ToolProvider interface. But, as
>>>>>> long as Context is Immutable, passing it down to other code should 
>>>>>> be
>>>>>> OK.
>>>>>> 
>>>>>> I have made Context immutable by adding a Context.Builder, and
>>>>>> creating the Context in a separate method in Main (createContext).
>>>>>> Some places in that code were prematurely exiting and returning an
>>>>>> error code. This is replaced by throwing a custom exception 
>>>>>> instead.
>>>>>> 
>>>>>> Update webrev:
>>>>>> http://cr.openjdk.java.net/~jvernee/panama/webrevs/8218052/webrev.02/
>>>>>> 
>>>>>> Thanks,
>>>>>> Jorn
>>>>>> 
>>>>>> Jorn Vernee schreef op 2019-02-16 15:30:
>>>>>>> In other words; we can't get the layout for a type with a 
>>>>>>> flexible
>>>>>>> array from libclang at all. A good way to resolve that is 
>>>>>>> probably to
>>>>>>> submit a patch for it to libclang. I've done a little bit of
>>>>>>> searching, and there doesn't seem to be a bug report in their bug
>>>>>>> database for it either:
>>>>>>> https://bugs.llvm.org/buglist.cgi?quicksearch=flexible%20array so 
>>>>>>> we
>>>>>>> could submit a bug about it as well. But, currently you need to
>>>>>>> manually request an account in order to post bugs, so this might 
>>>>>>> take
>>>>>>> a while. I've also send an email just now to the cfe-dev mailing 
>>>>>>> list,
>>>>>>> maybe someone there can offer some help.
>>>>>>> 
>>>>>>> In the meantime we can work around structs with flexible arrays.
>>>>>>> Currently an exception is thrown, but this makes the whole 
>>>>>>> extracted
>>>>>>> interface unusable, so emitting an undefined layout as a default 
>>>>>>> value
>>>>>>> and printing a warning seems like a better option to me until the
>>>>>>> issue is fixed in libclang.
>>>>>>> 
>>>>>>> There's also the idea of introducing a jextract option to 
>>>>>>> manually
>>>>>>> override a descriptor, e.g. `--patch-descriptor
>>>>>>> struct:MyStruct=[i32(x)i32(y)]`. We could use something like that 
>>>>>>> to
>>>>>>> manually provide the layout descriptor for structs with flexible 
>>>>>>> array
>>>>>>> members in the meantime.
>>>>>>> 
>>>>>>> Jorn
>>>>>>> 
>>>>>>> Jorn Vernee schreef op 2019-02-16 09:07:
>>>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>>>> instead of undefined layout, I prefer to have limited support, 
>>>>>>>>>> we can either strip that field or generate a 0-length array 
>>>>>>>>>> for now. >> For jextract, C only allow such field at end of 
>>>>>>>>>> struct, and sizeof() operator simply ignore that trailing 
>>>>>>>>>> array field. This should give us a good match as first step.
>>>>>>>>> Agree
>>>>>>>> 
>>>>>>>> Sure, that would be preferable. But, as discussed before [1], 
>>>>>>>> libclang
>>>>>>>> does not handle incomplete arrays properly, that's why it was 
>>>>>>>> changed
>>>>>>>> to emit an exception in the first place.
>>>>>>>> 
>>>>>>>> It is not possible to filter out structs with any jextract 
>>>>>>>> option, so
>>>>>>>> if you have an incomplete array in a header file that is 
>>>>>>>> otherwise
>>>>>>>> usable, you're out of luck.
>>>>>>>> 
>>>>>>>> Jorn
>>>>>>>> 
>>>>>>>> [1] :
>>>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-January/003975.html 
>>>>>>>> Maurizio Cimadamore schreef op 2019-02-16 01:29:
>>>>>>>>> On 16/02/2019 00:00, Henry Jen wrote:
>>>>>>>>>> Now to the real discussion about incomplete array support, 
>>>>>>>>>> instead of undefined layout, I prefer to have limited support, 
>>>>>>>>>> we can either strip that field or generate a 0-length array 
>>>>>>>>>> for now. For jextract, C only allow such field at end of 
>>>>>>>>>> struct, and sizeof() operator simply ignore that trailing 
>>>>>>>>>> array field. This should give us a good match as first step.
>>>>>>>>> Agree
>>>>>>>>>> 
>>>>>>>>>> Move on to more general support, where incomplete array can be 
>>>>>>>>>> in-between layouts. Before that, we probably need to validate 
>>>>>>>>>> some assumption,
>>>>>>>>>> 
>>>>>>>>>> Any incomplete array must have length specified in the same 
>>>>>>>>>> struct before the incomplete array. I believe this will pretty 
>>>>>>>>>> much cover most cases if not all.
>>>>>>>>> 
>>>>>>>>> To reinforce this point, I believe most compilers even give 
>>>>>>>>> error if
>>>>>>>>> the incomplete array is the only member of the struct, or if 
>>>>>>>>> it's
>>>>>>>>> followed by other stuff:
>>>>>>>>> 
>>>>>>>>> $ cat testInc.h
>>>>>>>>> 
>>>>>>>>> struct A {
>>>>>>>>>    int arr[];
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> struct B {
>>>>>>>>>    int l;
>>>>>>>>>    int arr[];
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> struct C {
>>>>>>>>>    int l;
>>>>>>>>>    int arr[];
>>>>>>>>>    int l2;
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> $ gcc -c testInc.h
>>>>>>>>> 
>>>>>>>>> testInc.h:2:8: error: flexible array member in a struct with no 
>>>>>>>>> named members
>>>>>>>>>     int arr[];
>>>>>>>>>         ^~~
>>>>>>>>> testInc.h:12:8: error: flexible array member not at end of 
>>>>>>>>> struct
>>>>>>>>>     int arr[];
>>>>>>>>>         ^~~
>>>>>>>>> 
>>>>>>>>> $ clang -c testInc.h
>>>>>>>>> testInc.h:2:8: error: flexible array member 'arr' not allowed 
>>>>>>>>> in otherwise empty
>>>>>>>>>       struct
>>>>>>>>>    int arr[];
>>>>>>>>>        ^
>>>>>>>>> testInc.h:12:8: error: flexible array member 'arr' with type 
>>>>>>>>> 'int []' is not at
>>>>>>>>>       the end of struct
>>>>>>>>>    int arr[];
>>>>>>>>>        ^
>>>>>>>>> testInc.h:13:8: note: next field declaration is here
>>>>>>>>>    int l2;
>>>>>>>>>        ^
>>>>>>>>> 2 errors generated.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> With that, I think following should work well enough,
>>>>>>>>> 
>>>>>>>>> What you describe is what I've dubbed 'dependent layout' 
>>>>>>>>> approach -
>>>>>>>>> e.g. have one or more values in a struct provide more info for 
>>>>>>>>> certain
>>>>>>>>> layout elements in same struct. This is fairly frequent 
>>>>>>>>> business with
>>>>>>>>> message protocols - almost all representation for 
>>>>>>>>> variable-sized data
>>>>>>>>> is expressed as length + data array (sometimes compressed, as 
>>>>>>>>> in
>>>>>>>>> protobuf's VarInt).
>>>>>>>>> 
>>>>>>>>> I agree that's where we need to land, longer term. Short term 
>>>>>>>>> it feels
>>>>>>>>> like the best move would be to just strip the array. Creating a
>>>>>>>>> 0-length array might be a  move with subtle consequences: the 
>>>>>>>>> array
>>>>>>>>> occurs within a region with some boundaries (e.g. a struct 
>>>>>>>>> region).
>>>>>>>>> The boundaries of the enclosing region are usually computed 
>>>>>>>>> using
>>>>>>>>> sizeof(enclosing type). Meaning that the enclosing region won't 
>>>>>>>>> be
>>>>>>>>> 'big enough' to host anything but a zero-length array. If we 
>>>>>>>>> cast the
>>>>>>>>> array to something else, what we get back is an array whose 
>>>>>>>>> boundaries
>>>>>>>>> would exceed those of the enclosing struct - so if you try e.g. 
>>>>>>>>> to
>>>>>>>>> write to the array, the operation would fail.
>>>>>>>>> 
>>>>>>>>> To do this stuff properly in Panama you would need to allocate 
>>>>>>>>> a
>>>>>>>>> bigger chunk of memory, of the desired size (pretty much as you 
>>>>>>>>> would
>>>>>>>>> in C), and then cast the memory to the struct type - now you 
>>>>>>>>> have a
>>>>>>>>> memory region that is big enough to do the struct + the array.
>>>>>>>>> 
>>>>>>>>> The alternative, which does look simpler, is to just allocate a 
>>>>>>>>> struct
>>>>>>>>> (with array stripped) followed by an array of desired size in 
>>>>>>>>> the same
>>>>>>>>> scope - e.g. compare this:
>>>>>>>>> 
>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>> 
>>>>>>>>>     Pointer<Byte> slab = s.allocateArray(NativeTypes.UINT8,
>>>>>>>>> Struct.sizeOf(StructWithIncompleteArray.class) + 40);
>>>>>>>>>     Pointer<StructWithIncompleteArray> pstruct =
>>>>>>>>> slab.cast(LayoutType.ofStruct(StructWithIncompleteArray.class));
>>>>>>>>>     Array<Integer> data = 
>>>>>>>>> pstruct.data$get().cast(NativeTypes.INT32.array(10));
>>>>>>>>> 
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> With this:
>>>>>>>>> 
>>>>>>>>> try (Scope s : Scope.globalScope().fork()) {
>>>>>>>>> 
>>>>>>>>>     StructWithoutIncompleteArray struct =
>>>>>>>>> s.allocateStruct(StructWithoutIncompleteArray.class);
>>>>>>>>>     Array<Integer> data = s.allocateArray(NativeTypes.INT32, 
>>>>>>>>> 10);
>>>>>>>>> 
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> The code looks similar - and a client can, after the 
>>>>>>>>> allocation, use
>>>>>>>>> the array and the struct at will. But there's a subtle 
>>>>>>>>> difference
>>>>>>>>> between the two: in the first snippet, the array is allocated
>>>>>>>>> immediately after the struct bits - that's how allocation 
>>>>>>>>> happened. In
>>>>>>>>> the second snippet there's no guarantee that the array will be 
>>>>>>>>> after
>>>>>>>>> the struct; in fact, the native scope might have run out of 
>>>>>>>>> space with
>>>>>>>>> the struct and needed to allocate a new slab of native memory 
>>>>>>>>> via
>>>>>>>>> unsafe before the array allocation happens.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Which seems to suggest that the right way of approaching the 
>>>>>>>>> problem,
>>>>>>>>> even if more verbose, is the first one.
>>>>>>>>> 
>>>>>>>>> Maurizio


More information about the panama-dev mailing list