More detail than I had intended on Layout description language.

Mon Nov 10 21:57:33 UTC 2014

On 2014-10-31, at 1:21 PM, Angela Lin <angela_lin at ca.ibm.com> wrote:
> Then there are these divergent rabbit holes to follow:
> i) the tiny language specification  <-- Let's call this the "layout
> descriptor language"?
> ii) the runtime system
> 
> There might be a 3rd part that glues between the first two:
> iii) generated interfaces that are provided to the Java programmer

At this end, Henry Jen is working on a tool based on libclang that takes
header files and creates appropriate Java interfaces.  I think that the Java-
level inputs are 
(1) those interfaces
(2) combined with a layout-descriptor (written in LDL) that tells which
interface methods are getters, which are setters, and what the
sizes/locations/alignments of the various fields are.  I think the
output of the tool is still TBD.

In my experiments, I used a little string encoding, along the lines of
“foo, bar[11], baz:4,nyb:4”
and I used naming conventions and typing conventions from the supplied
interface to infer the C types.  I think this was mostly wrong.
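
To make the "screwball string inputs" worry concrete, here is a minimal Java sketch of a parser for that kind of string.  The grammar is an assumption, not something settled in this thread: a name, optionally followed by "[n]" (read here as an array length), optionally followed by ":w" (read here as a bit width).  The class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LayoutStringSketch {
    /** One parsed entry; arrayLength and bitWidth are -1 when absent. */
    public static final class Field {
        public final String name;
        public final long arrayLength;
        public final int bitWidth;

        public Field(String name, long arrayLength, int bitWidth) {
            this.name = name;
            this.arrayLength = arrayLength;
            this.bitWidth = bitWidth;
        }
    }

    // Assumed grammar: identifier, optional "[n]", optional ":w".
    private static final Pattern ENTRY =
        Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)(?:\\[(\\d+)\\])?(?::(\\d+))?");

    public static List<Field> parse(String layout) {
        List<Field> fields = new ArrayList<>();
        // Limit -1 keeps empty trailing entries so "foo," is rejected too.
        for (String raw : layout.split(",", -1)) {
            Matcher m = ENTRY.matcher(raw.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("bad layout entry: '" + raw + "'");
            }
            long arrayLength = m.group(2) == null ? -1 : Long.parseLong(m.group(2));
            int bitWidth = m.group(3) == null ? -1 : Integer.parseInt(m.group(3));
            fields.add(new Field(m.group(1), arrayLength, bitWidth));
        }
        return fields;
    }
}
```

Even this small grammar needs explicit rejection of empty entries and stray characters, and it still says nothing about C types, which is part of why the string form looks fragile.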

What I think now:

The language should not be one long string (but you could talk me back
into this; I am basing this purely on the difficulty of ruling out screwball
string inputs).  I’m torn between that and the structure and non-ambiguity of defining
a class to describe fields and passing in an array, along the lines of (notice
how this is not Java or C, so it must not be source code…)

 NativeFieldDescriptor { // AKA NFD 
    String field_accessor_name;
    … stuff describing size/align that I am having trouble with ...
    long[] array_dimensions default=long[0] or maybe null
 }

The “stuff … I am having trouble with” is difficult for two reasons, and I think maybe
Henry will have an opinion based on what comes out of this tool, or you will have
an opinion based on your experience.

Reason #1 is that I am trying to figure out where to set the “friendly to human
programmers” dial.  I’ve become less and less worried about this, because of
reason #2 and related issues, and because we could surely define a “friendly
layer” if we needed one.

Reason #2 is that I’m trying to figure out how to deal gracefully with the endianness
issue on a platform that works very hard to hide such issues.  It seems to me that
for a given “little language” input, the behavior of the Java side (and the bytecodes
that they generate) should not depend on the endianness of the underlying platform.

The test case I’ve been playing with to try to figure out the right way to talk about
this is

struct c {
     unsigned short x:1;
     unsigned short y:7;
};

I think that the little-language generated for that has to depend on the endianness of
the underlying platform, because it generates different results at the Java side.  On a
little-endian machine, storing 1 into x and y would be something along the lines of

storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003

but on a bigendian machine

storeShort(address, ((1&1)<<15) + ((127&1)<<8)) // store 0x8100

So, since the Java-side behavior should be different, I think that the little language inputs
(the output of the libclang-based tool) should be different depending on platform; the
offsets will need to be described in ways that conform to Java’s expectations.
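
The two stores above can be reproduced with a small Java sketch.  The bit positions are taken from the example in this message (x at bit 0 and y at bits 1..7 for little-endian; x at bit 15 and y at bits 8..14 for big-endian); actual bitfield allocation is up to the C implementation, and the class and method names are hypothetical.

```java
public class BitfieldPackSketch {
    /** Places the low `width` bits of `value` at bit position `shift`. */
    public static int place(int value, int width, int shift) {
        int mask = (1 << width) - 1;
        return (value & mask) << shift;
    }

    /** Assumed little-endian layout: x is bit 0, y is bits 1..7. */
    public static short packLittleEndian(int x, int y) {
        return (short) (place(x, 1, 0) | place(y, 7, 1));
    }

    /** Assumed big-endian layout: x is bit 15, y is bits 8..14. */
    public static short packBigEndian(int x, int y) {
        return (short) (place(x, 1, 15) | place(y, 7, 8));
    }

    public static void main(String[] args) {
        // Storing 1 into both x and y, as in the example above.
        System.out.printf("LE: 0x%04X%n", packLittleEndian(1, 1) & 0xFFFF); // LE: 0x0003
        System.out.printf("BE: 0x%04X%n", packBigEndian(1, 1) & 0xFFFF);    // BE: 0x8100
    }
}
```

The only thing that differs between the two pack methods is the pair of shift amounts, which is exactly the information the platform-specific layout description would have to carry.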

One thing that might not be obvious is that the little language will need to specify the
size of the bit container into which bitfields are stored, because the translation between
pairs of byte offsets and short offsets differs on LE and BE machines.

For example, the little endian storeByte equivalent of

storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003

is

storeByte(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x__03
storeByte(address+1, 0) // store 0x00__

but big endian is

storeByte(address, 0) // store 0x00__
storeByte(address+1, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x__03

So a simple offset and size formulation is not adequate; we need to know the
underlying container size for a given bit address, unless we boil everything
down to byte stores and then hack our compiler to reconsolidate adjacent stores.
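
The byte-versus-short point can be checked from plain Java using java.nio.ByteBuffer with an explicit ByteOrder, standing in for native memory.  This is only an illustration of where the bytes of one 16-bit store land; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ContainerSizeSketch {
    /** Returns the two bytes a 16-bit store of `value` leaves in memory. */
    public static byte[] shortStoreBytes(short value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(2).order(order);
        buf.putShort(0, value);
        return new byte[] { buf.get(0), buf.get(1) };
    }

    public static void main(String[] args) {
        byte[] le = shortStoreBytes((short) 0x0003, ByteOrder.LITTLE_ENDIAN);
        byte[] be = shortStoreBytes((short) 0x0003, ByteOrder.BIG_ENDIAN);
        System.out.printf("LE: %02X %02X%n", le[0], le[1]); // LE: 03 00
        System.out.printf("BE: %02X %02X%n", be[0], be[1]); // BE: 00 03
    }
}
```

The same short value puts its low-order bits at byte offset 0 in one case and byte offset 1 in the other, so a bit address only has meaning relative to a known container size.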

Again, input from Henry would be helpful here — this container-size
information may already be present because of how C deals with bitfields
in general (a char-typed bitfield may not span a byte boundary, but a short-typed
bitfield can, as long as it does not also span a short boundary).  Another way to think of container
size is “alignment”: a struct of short-typed bitfields will have size and alignment 2
even if the bitfields all fit in a single byte.  (I don’t think we want to support full
generality here because we could end up in places where we need to understand
the underlying endianness; if we use byte loads to obtain an int, to where do
we shuffle the respective bytes?)

Is this wrong-headed?  Do we like where the “same behavior on the Java side”
rule leads us?

------------------

Other opinions I have acquired recently based on mistakes already made:

I think it is okay if we assume that the names in the LDL field descriptors
should match methods defined in the corresponding interface in relatively
obvious ways.  I think there are three accessors to consider, and maybe
we should just name them separately (and lack of a name means we
don’t want that access mode).  So 

 NativeFieldDescriptor { // AKA NFD 
    String field_getter_name;
    String field_setter_name;
    String field_reference_name;

    … stuff describing size/align that I am having trouble with …

    long[] array_dimensions default=long[0] or maybe null
 }

Getters and setters would probably default to returning something like a “value”,
and reference generators would of course create proxy objects for fields.

I suspect (given the direction that I’ve gone specifying bitfield size and width and alignment)
that signedness or unsignedness of integer fields should just be a boolean in the NFD,
and as long as the type fits into the interface-method signature we use it with no
error signaled.  That still leaves the integer-size-for-unsigned issue to be resolved,
but moves it to the tool generating LDL and interfaces; i.e. we just removed a little
bit of policy from the layout engine, and that might be fine.
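
Purely as a sketch of where this might land, here is one way the descriptor could look in real Java.  The three accessor names, the signedness boolean and the array dimensions come from the discussion above; containerBytes, bitOffset and bitWidth are hypothetical stand-ins for the still-unresolved size/align part, and extract() is an illustrative helper, not a proposed API.

```java
public final class NativeFieldDescriptorSketch {
    // A null accessor name means that access mode is not wanted.
    public final String getterName;
    public final String setterName;
    public final String referenceName;
    // Hypothetical stand-ins for the unresolved size/align description.
    public final int containerBytes;  // storage unit size: 1, 2, 4 or 8
    public final int bitOffset;       // shift within the container, as Java sees it
    public final int bitWidth;        // number of bits in the field
    public final boolean signed;      // sign-extend on load when true
    public final long[] arrayDimensions;

    public NativeFieldDescriptorSketch(String getterName, String setterName,
            String referenceName, int containerBytes, int bitOffset,
            int bitWidth, boolean signed, long[] arrayDimensions) {
        if (containerBytes != 1 && containerBytes != 2
                && containerBytes != 4 && containerBytes != 8) {
            throw new IllegalArgumentException("container must be 1, 2, 4 or 8 bytes");
        }
        if (bitWidth < 1 || bitOffset < 0
                || bitOffset + bitWidth > containerBytes * 8) {
            throw new IllegalArgumentException("field does not fit in its container");
        }
        this.getterName = getterName;
        this.setterName = setterName;
        this.referenceName = referenceName;
        this.containerBytes = containerBytes;
        this.bitOffset = bitOffset;
        this.bitWidth = bitWidth;
        this.signed = signed;
        this.arrayDimensions =
            arrayDimensions == null ? new long[0] : arrayDimensions.clone();
    }

    /** Extracts this field from an already-loaded container value. */
    public long extract(long container) {
        long raw = container >>> bitOffset;
        if (bitWidth == 64) {
            return raw;
        }
        long mask = (1L << bitWidth) - 1;
        raw &= mask;
        if (signed && (raw & (1L << (bitWidth - 1))) != 0) {
            raw |= ~mask; // sign-extend from the field's top bit
        }
        return raw;
    }
}
```

With descriptors like these, the y field of struct c would be (containerBytes 2, bitOffset 1, bitWidth 7) in the little-endian description and (containerBytes 2, bitOffset 8, bitWidth 7) in the big-endian one, and the Java-side extraction code is the same in both cases.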

David