[nicl] RFR: Undefined struct and void*

Tue May 29 11:25:55 UTC 2018

I get that `struct S1*` and `struct S2*` are different types in the C 
language and that it would be nice for the Java type-system to reflect 
that when working with Panama.

That said, I can't help but feel that behind what looks like an innocent 
jextract refinement, there's a request for a new piece of functionality, 
namely, binder support for *abstract* structs.

That is, it has been the assumption up to now that the binder knows the 
layout of a given struct. Now we're saying that some structs do not have 
a layout, or that their layout is unspecified. Note: this is very 
different from saying that such structs have an empty layout, as I've 
seen suggested at other times in related discussions.

If the layout of a struct S is unspecified, a bunch of things are simply 
not possible - they all boil down to the fact that you can't compute 
`sizeof(S)` (in fact the C compiler will prevent you from doing so). As 
such, it would be impossible, in Panama, to allocate an instance of S, 
allocate an array of S, offset a pointer to S, and so forth; all 
size-related operations are simply banned.

If, on the other hand, S is simply an empty struct, all these operations 
are allowed - size is simply zero.
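
To make the difference concrete, here is a minimal sketch in plain Java. It 
is not the binder API: `StructLayouts` and all of its methods are 
hypothetical names, and the only assumption is that an undefined struct has 
no size at all while an empty struct has size zero.

```java
import java.util.OptionalLong;

// Hypothetical model, not part of the Panama binder: a struct layout either
// knows its size (possibly zero) or has no size because it was only
// forward-declared.
final class StructLayouts {
    private final String name;
    private final OptionalLong size; // empty => layout is unspecified

    private StructLayouts(String name, OptionalLong size) {
        this.name = name;
        this.size = size;
    }

    static StructLayouts defined(String name, long sizeInBytes) {
        return new StructLayouts(name, OptionalLong.of(sizeInBytes));
    }

    static StructLayouts undefined(String name) {
        return new StructLayouts(name, OptionalLong.empty());
    }

    // Every size-dependent operation goes through sizeof(), so an undefined
    // struct rejects all of them in one place.
    long sizeof() {
        return size.orElseThrow(() -> new UnsupportedOperationException(
                "sizeof(" + name + "): layout is unspecified"));
    }

    // Bytes needed for an array of `count` elements.
    long arraySize(long count) {
        return Math.multiplyExact(sizeof(), count);
    }

    // Address of element `index`, starting from `base`.
    long offset(long base, long index) {
        return Math.addExact(base, Math.multiplyExact(sizeof(), index));
    }

    public static void main(String[] args) {
        StructLayouts empty = defined("EmptyStruct", 0);
        System.out.println(empty.arraySize(10)); // allowed: size is zero

        StructLayouts forward = undefined("S");
        try {
            forward.arraySize(10);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage()); // size-related operation is banned
        }
    }
}
```
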

This means that there needs to be a metadata representation for undefined 
structs that is different from that of empty structs. If an empty struct 
can be denoted as this:

```
@NativeStruct("[]")
interface EmptyStruct { }
```

How should an undefined struct be denoted? Note that in picking the 
answer we must guarantee that all the above properties are maintained 
(e.g. no size-dependent operation is allowed).

I believe a satisfying answer, given the model put forward in [1], would 
be to use unresolved layouts - as follows:

```
@NativeStruct("$(forward)")
interface Forward { }
```

Here we are using an unresolved layout as the struct layout. Unresolved 
layouts have an important property: all size-dependent operations 
occurring on an unresolved layout trigger _resolution_; resolution means 
that the binder has to find a concrete declaration for the symbolic name 
"forward"; if jextract has not seen a definition for this struct, then 
the binder doesn't know how to resolve, meaning that any attempt to get 
the size of `$(forward)` will result in an exception.

This solves our problem nicely: not only are size-dependent operations 
outlawed (as a C programmer would expect); on top of that, should the 
binder come across a _definition_ for the "forward" struct, as follows:

```
@NativeStruct("[i32(get=i)](forward)")
interface ForwardImpl {
    int i();
}
```

the binder would be able to resolve the layout of the Forward interface 
to the one defined in ForwardImpl - meaning that clients could also 
start using Forward safely. In other words, this mechanism allows for 
some kind of 'late binding' which could come in handy in certain use 
cases (e.g. a client using multiple jextract artifacts at the same time).
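
The resolution step can be sketched as a name-to-definition lookup. This is 
only an illustration under stated assumptions: `LayoutResolver`, `define` 
and `sizeOfUnresolved` are hypothetical names, and a layout is reduced to 
its size in bytes; the real binder would carry a full layout description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of late binding for unresolved layouts such as
// "$(forward)". Not the binder's actual implementation.
final class LayoutResolver {
    // symbolic name -> size in bytes of the concrete definition, once seen
    private final Map<String, Long> definitions = new HashMap<>();

    // Called when a definition such as "[i32(get=i)](forward)" is seen:
    // the name is "forward" and the size is 4 bytes.
    void define(String name, long sizeInBytes) {
        definitions.put(name, sizeInBytes);
    }

    // A size-dependent operation on "$(name)" triggers resolution; if no
    // definition has been recorded, resolution fails with an exception.
    long sizeOfUnresolved(String name) {
        Long size = definitions.get(name);
        if (size == null) {
            throw new IllegalStateException(
                    "cannot resolve layout $(" + name + ")");
        }
        return size;
    }

    public static void main(String[] args) {
        LayoutResolver resolver = new LayoutResolver();
        try {
            resolver.sizeOfUnresolved("forward");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // no definition seen yet
        }
        resolver.define("forward", 4); // e.g. another artifact defines it
        System.out.println(resolver.sizeOfUnresolved("forward"));
    }
}
```

The lookup happens at use time rather than at declaration time, which is 
what lets a definition from a second jextract artifact make `Forward` 
usable after the fact.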

And, finally, the fact that all undefined references to 'forward' indeed 
have the same layout (`$(forward)`) could also let the binder convert 
from one forward-decl interface to another.
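
Such a check could be as simple as comparing layout strings. The sketch 
below assumes a hypothetical `@Layout` annotation standing in for 
`@NativeStruct`; the `FooForward`, `BarForward` and `Other` interfaces and 
the `convertible` method are likewise made up for illustration.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical sketch: two forward-declared interfaces are convertible when
// they carry the same unresolved layout string.
final class ForwardConversion {
    // Stand-in for the binder's @NativeStruct; retained at runtime so that
    // the check can read it reflectively.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Layout { String value(); }

    // Same C struct, forward-declared in two different extracted headers.
    @Layout("$(forward)") interface FooForward { }
    @Layout("$(forward)") interface BarForward { }

    // A different forward-declared struct.
    @Layout("$(other)") interface Other { }

    static boolean convertible(Class<?> from, Class<?> to) {
        Layout a = from.getAnnotation(Layout.class);
        Layout b = to.getAnnotation(Layout.class);
        return a != null && b != null && a.value().equals(b.value());
    }

    public static void main(String[] args) {
        System.out.println(convertible(FooForward.class, BarForward.class));
        System.out.println(convertible(FooForward.class, Other.class));
    }
}
```
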

[1] - http://cr.openjdk.java.net/~mcimadamore/panama/panama-binder-v3.html

Maurizio

On 29/05/18 05:55, John Rose wrote:
> On May 24, 2018, at 5:12 AM, Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>>
>> In terms of the new implementation in the foreign branch - these 
>> pointers do not have any LayoutType associated with them - so they're 
>> much more similar to void* than the C syntax would suggest. I agree 
>> that many C APIs do use this style in what Sundar calls poor-man OO 
>> style (:-)), and that having a 'real' interface name improves 
>> readability somewhat - but I also think that _not_ using real 
>> interface names seems a more honest approach which more accurately 
>> reflects what's under the hood.
>
> We have to reflect the semantics of the source language (C in this case)
> as well as "what's under the hood".  The "poor man OO" style amounts to
> using forward-declared struct types as named abstract types.  In the C
> programming experience, "struct S1*" and "struct S2*" are distinct types
> which cannot be converted to each other silently, and they are used
> as place-holders for struct types which may be hidden inside the library,
> or even types which are never defined.  In the latter case, the library
> casts the machine word of a "struct S1" to some other type.  In either
> case, the forward-declared type functions as an existential type, whose
> contract is "keep this type distinct from other types, even though I won't
> tell you what it contains".
>
> Throwing away type information by flattening to void* is not in the
> Panama philosophy, which seeks to retain as much as possible of
> the experience of the foreign type system when translating to the
> Java carrier types.  We can choose to do this as a conscious trade-off,
> as in the case where we were weighing a parameter-free "Pointer"
> type against a richer "Pointer<T>" type, but in general we have to
> try hard to find a way to represent distinctions from the source language
> in the extracted Java APIs.  If we can't represent them in the Java
> static type system, we must at least record them in the runtime
> metadata, so we can perform runtime checks.
>
> An empty Java interface (as Henry proposes) is probably the best way
> to choose a carrier type to represent such an existential type in C.
> It supports both static and dynamic checks in the Java APIs.
> Erasing the static type of "struct S1*" to "Pointer<Void>" probably
> also entails dropping the runtime distinction between "void" and
> "S1", which removes from Java API an important aspect of type
> safety that was designed into the C API.
>
> Since C allows the type to be declared multiple times in separate
> header files, it is necessary for jextract to issue multiple empty
> interfaces, one per jextract task (which may collect the contents
> of several headers).  The runtime has to record the fact that all
> of those empty interfaces carry the same foreign type.
>
> If in fact a jextract run encounters a struct body for such a type,
> it can generate a non-empty Java interface to carry the struct.
> In the worst case, there may be several identical bodies for
> the same struct, imported several times.  All of the Java interfaces
> extracted from the same struct type, whether forward-declared
> or actually defined, must be recorded by the binder runtime
> as the same foreign type.
>
> This implies, of course, that various interfaces (empty or not)
> must be able to convert between each other, using a runtime
> check to determine that the various instances of the struct
> type are, in fact, the same struct type.  A simple string check
> is reasonable:  You can safely convert Pointer<foo_h.S1>
> to Pointer<bar_h.S1> if the simple-name of the interface
> (empty or not) is the same, and if both interfaces are for
> structs, and both interfaces were extracted for the same
> library (a loose notion, at present).  If the binder was able
> to see all of the interfaces which translate S1, then perhaps
> there is an object which implements both foo_h.S1 and
> bar_h.S1.  In that happy case, no new instance is needed,
> just a checkcast.  In some cases, if the binder can't see all the
> types when it starts building pointers, then it may have to
> re-wrap the same machine address under different metadata.
>
> You can also safely convert from Pointer<foo_h.S1> to
> Pointer<Void>, but not vice versa.
>
> — John