More detail than I had intended on Layout description language.

Tue Nov 18 17:31:44 UTC 2014

Hi

> The language should not be one long string (but you could talk me back
> into this, I am basing this purely on the difficulty of ruling out screwball
> string inputs).  I'm torn between the structure and non-ambiguity of defining
> a class to describe fields and passing in an array, along the lines of (notice
> how this is not Java or C, so it must not be source code?)

We are open to a class-based LD format. The downside to this is that the
LDL is no longer a compact representation. Our approach combines the two
Java-level inputs by placing LDL annotations in the generated interfaces.
This is something we can explore further, but for the remainder of this
email I will continue to use the string notation.

> Reason #2 is that I'm trying to figure out how to deal gracefully with the endianness
> issue on a platform that works very hard to hide such issues. It seems to me that
> for a given "little language" input, the behavior of the Java side (and the bytecodes
> that they generate) should not depend on the endianness of the underlying platform.
...
> Is this wrong-headed?  Do we like where the "same behavior on the Java side"
> rule leads us?

By default we assume that the layout data has the same endianness as the
execution environment. We think this is what end-users would expect.
Automatic endian conversion would be nice to have, but we don't want to
bake that cost into the JDK. Also, nothing stops us from being explicit
in the LDL about the endianness. We are certainly open to more discussion
on this topic.
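
As a rough illustration of the two options above, here is a minimal Java
sketch that reads a 32-bit field either in the platform's native byte order
or in an explicitly stated order. It uses only java.nio.ByteBuffer and
java.nio.ByteOrder. The class and method names (EndianSketch, readNative,
readAs) are made up for this example and are not part of the prototype.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of the prototype.
final class EndianSketch {
    // Default behaviour: interpret the bytes in the byte order of the
    // platform the JVM is running on.
    static int readNative(byte[] data, int byteOffset) {
        return ByteBuffer.wrap(data).order(ByteOrder.nativeOrder()).getInt(byteOffset);
    }

    // Alternative: the byte order is stated explicitly (for example by the
    // layout description), so the result is the same on every platform.
    static int readAs(byte[] data, int byteOffset, ByteOrder order) {
        return ByteBuffer.wrap(data).order(order).getInt(byteOffset);
    }
}
```

With the bytes 01 02 03 04, readAs yields 0x01020304 for big-endian and
0x04030201 for little-endian, while readNative returns whichever of the two
matches the host.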

Regarding "... stuff describing size/align that I am having trouble
with ...", we think the LDL should represent exactly how the memory is laid
out. It should not make any assumptions about alignment. I will go over
some features of our prototype LDL. Hopefully that will clear up some
things.

The basic grammar notation of the LD format is the following.

"Qualified Name[ ':' Qualified Super-Class Name]"
    { ',' "[Field Name]{'['Number of Elements']'} ':' (Size | 'pointer' | 'Layout' Name)" }

where:
- 'Qualified Name' is the fully qualified name of the layout being described
    - for example "com.ibm.shapes.Square"
- 'Qualified Super-Class Name' is the fully qualified name of the super class
- 'Field Name' is the name of the field defined in the layout
    - if the field name is omitted no accessor is generated for it
- 'Number of Elements' specifies the number of elements in an array dimension (example later)
    - LDL assumes multidimensional arrays are arranged in row-major order
- 'Size' is the size of the field in bits
    - for arrays this is the size of a single element of the leaf type
    - special keyword 'pointer' is used for pointer fields (example later)
- 'Name' specifies name of a defined layout (example later)
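
To make the field part of the grammar more tangible, the following is a
small Java sketch that splits one field descriptor string into its name,
array dimensions, and type part. It is only an illustration under the
grammar given above: the class FieldDescriptor and its parse method are
hypothetical names, not the prototype's actual parser, and error handling
is deliberately minimal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a single field descriptor; illustration only.
final class FieldDescriptor {
    final String name;         // empty when the field name is omitted
    final List<Integer> dims;  // array dimensions, outermost first; empty for scalars
    final String type;         // e.g. "32", "pointer", or "Layout Point2D"

    private FieldDescriptor(String name, List<Integer> dims, String type) {
        this.name = name;
        this.dims = dims;
        this.type = type;
    }

    static FieldDescriptor parse(String text) {
        int colon = text.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' in " + text);
        }
        String left = text.substring(0, colon).trim();
        String type = text.substring(colon + 1).trim();
        if (type.isEmpty()) {
            throw new IllegalArgumentException("missing size or type in " + text);
        }
        int bracket = left.indexOf('[');
        String name = bracket < 0 ? left : left.substring(0, bracket);
        List<Integer> dims = new ArrayList<>();
        int pos = bracket;
        // Consume each "[N]" group in turn.
        while (pos >= 0 && pos < left.length()) {
            int close = left.indexOf(']', pos);
            if (left.charAt(pos) != '[' || close < 0) {
                throw new IllegalArgumentException("bad array dimension in " + text);
            }
            dims.add(Integer.parseInt(left.substring(pos + 1, close)));
            pos = close + 1;
        }
        return new FieldDescriptor(name, dims, type);
    }
}
```

For instance, FieldDescriptor.parse("y[2][10]:32") gives the name "y", the
dimensions 2 and 10, and the type "32"; parsing ":24" gives an empty name,
which corresponds to a field with no accessor.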

The following is a basic example where we have a layout 'Point2D' with two
fields, 'x' and 'y'. The corresponding LD is shown below with the
appropriate size for the fields. The order in which the fields appear in
the Layout is preserved in the LD. As you have probably noticed, the field
type is not carried over to the LD. Each layout field is treated as a
typeless collection of bits.

Example 1:

struct Point2D {
    uint32_t x;
    uint32_t y;
}
LD: "Point2D", "x:32", "y:32"

One can define a layout with array fields by using the following notation.

Example 2:

struct A {
    uint32_t x;
    uint32_t y[10]; //this is an array of fields
}
LD:  "A", "x:32", "y[10]:32"

Here is another example with a multi-dimensional array field.

Example 3:

struct B {
    uint32_t x;
    uint32_t y[2][10]; //this is a 2-d array
}
LD:  "B", "x:32", "y[2][10]:32"
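
Since the LDL assumes row-major order, the bit offset of an individual
array element follows directly from the dimensions and the element size.
The Java sketch below shows that arithmetic; RowMajorSketch and
elementBitOffset are hypothetical names used only for this illustration.

```java
// Hypothetical helper showing row-major element addressing; illustration only.
final class RowMajorSketch {
    // Returns the bit offset of the element at 'indices' inside an array
    // field that starts at 'fieldBitOffset' within the layout.
    static long elementBitOffset(long fieldBitOffset, int[] dims, int[] indices, int elementBits) {
        if (dims.length != indices.length) {
            throw new IllegalArgumentException("one index per dimension is required");
        }
        long linear = 0;
        for (int d = 0; d < dims.length; d++) {
            if (indices[d] < 0 || indices[d] >= dims[d]) {
                throw new IndexOutOfBoundsException("index " + indices[d] + " in dimension " + d);
            }
            // Row-major: the last index varies fastest.
            linear = linear * dims[d] + indices[d];
        }
        return fieldBitOffset + linear * elementBits;
    }
}
```

In layout B the field 'y' starts at bit 32 (after 'x'), so y[1][3] is
element 1 * 10 + 3 = 13 and sits at bit 32 + 13 * 32 = 448.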

The language assumes no implicit padding, and no start alignment for
allocators of a type. This is a pain point for native programmers, as the
compiler may add padding and different compilers may add different padding.
The language assumes that fields are laid out in order and in a packed
representation. We will not automatically fix up field alignment or
padding.

To illustrate this point, the following example shows a layout composed of
two fields: the first a 'uint8_t' and the second a 'uint32_t'. On a
32-bit platform a compiler may add 3 bytes of padding to align the second
field. The padding field has to be written explicitly in the LD, but its
name can be omitted.

Example 4:

struct C
{
    uint8_t data1;
    uint32_t data2;
}
LD: "C", "data1:8", "data2:32"

On 32-bit this would be compiled as:

struct C
{
    uint8_t data1;
    uint8_t padding1[3];
    uint32_t data2;
}
LD: "C", "data1:8", ":24", "data2:32"
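
The effect of the explicit padding field can be checked with a few lines of
Java. The sketch below computes packed bit offsets from the field sizes in
declaration order, with no implicit padding, which is the rule described
above. PackedOffsets is a hypothetical name used for illustration.

```java
// Hypothetical helper: packed layout, fields in order, no implicit padding.
final class PackedOffsets {
    // Takes the field sizes in bits, in declaration order, and returns the
    // bit offset of each field.
    static long[] offsets(int[] sizesInBits) {
        long[] result = new long[sizesInBits.length];
        long next = 0;
        for (int i = 0; i < sizesInBits.length; i++) {
            result[i] = next;
            next += sizesInBits[i];
        }
        return result;
    }
}
```

For "data1:8", "data2:32" the offsets are 0 and 8, so 'data2' would be read
from the wrong place if the compiler padded the struct. For "data1:8",
":24", "data2:32" the offsets are 0, 8 and 32, which matches the padded
struct.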

Native pointer types are platform specific. To address this a new keyword,
'pointer', is introduced. The size of this type depends on the JVM
architecture (32-bit or 64-bit). The next example illustrates the use of
pointers in a linked list node.

Example 5:

struct Node {
    uint32_t data;
    struct Node *next;
}
LD: "Node", "data:32", "next:pointer"

This solution has a drawback: any alignment padding must be written
explicitly in the LD, so the description differs depending on the JVM
architecture. The next example illustrates this. We have chosen this
solution as a starting point, but it is an issue that requires more
consideration.

Example 6:

On 64-bit, Node would be compiled as:
struct Node {
    uint32_t data;
    uint8_t padding1[4];
    struct Node *next;
}
LD: "Node", "data:32", ":32", "next:pointer"
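
To show how the 'pointer' keyword changes the total size of a layout, here
is a small Java sketch that sums the field sizes of a packed layout, with
'pointer' resolved to a width passed in by the caller. How the JVM would
actually determine that width is not covered here; PointerSizeSketch and
totalBits are hypothetical names.

```java
// Hypothetical helper: total size of a packed layout, illustration only.
final class PointerSizeSketch {
    // 'fieldTypes' holds the part after ':' of each field, either a size in
    // bits such as "32" or the keyword "pointer". 'pointerBits' is the
    // pointer width assumed for the target JVM (32 or 64).
    static long totalBits(String[] fieldTypes, int pointerBits) {
        long total = 0;
        for (String type : fieldTypes) {
            total += type.equals("pointer") ? pointerBits : Integer.parseInt(type);
        }
        return total;
    }
}
```

With a 32-bit pointer, "data:32", "next:pointer" adds up to 64 bits. With a
64-bit pointer, the padded form "data:32", ":32", "next:pointer" adds up to
128 bits, which is why the two architectures need different descriptions.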

Another issue to consider is floating-point types, where the native
floating-point representation may not match Java's. Generated accessors for
Layouts with float fields should give you access to the raw bits but not
automatically convert the data into a Java floating-point value.
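
A sketch of what this separation could look like in Java: the accessor
returns the raw 32 bits, and the caller converts them explicitly only when
it knows the native format is IEEE 754 single precision. The names
RawFloatSketch, getRawBits and asIeeeFloat are hypothetical; only
ByteBuffer.getInt and Float.intBitsToFloat are real JDK calls.

```java
import java.nio.ByteBuffer;

// Hypothetical accessor shape for a 32-bit float field; illustration only.
final class RawFloatSketch {
    // What a generated accessor might return: the raw bits, unconverted.
    static int getRawBits(ByteBuffer layout, int byteOffset) {
        return layout.getInt(byteOffset);
    }

    // Explicit, caller-side conversion. Only meaningful if the native
    // representation is IEEE 754 binary32, which is Java's float format.
    static float asIeeeFloat(int rawBits) {
        return Float.intBitsToFloat(rawBits);
    }
}
```

For example, the raw bits 0x3FC00000 convert to 1.5f under IEEE 754, but
the accessor itself makes no such assumption.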

The following example introduces a concept that has not been discussed
before: nesting of layouts. This poses some interesting challenges as it
introduces dependencies between layouts. If this is something we wish to
support, the following is a syntax that can be used to describe it.

Example 7:

struct Line2D {
    Point2D start;
    Point2D end;
}
LD: "Line2D", "start:Layout Point2D", "end:Layout Point2D"

Lastly, we have an example that displays layout inheritance. If this is
something that we wish to support, here is how we could do it.

Example 8:

struct Point3D : Point2D {
     uint32_t z;
}
LD: "Point3D : Point2D", "z:32"
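
Both nesting and inheritance mean that a layout's size depends on other
layouts being known first. The Java sketch below illustrates that
dependency with a tiny registry that computes sizes in bits. It assumes,
purely for illustration, that super-class fields are laid out before the
sub-class fields; that ordering is not something the proposal has decided.
LayoutRegistrySketch and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry resolving layout sizes; illustration only.
final class LayoutRegistrySketch {
    private final Map<String, Long> sizes = new LinkedHashMap<>();

    // Registers a layout and returns its size in bits. 'superName' may be
    // null. Each entry of 'fieldTypes' is a size such as "32" or a nested
    // layout reference such as "Layout Point2D".
    long define(String name, String superName, String... fieldTypes) {
        // Assumption: inherited fields come first, so start from the super size.
        long bits = superName == null ? 0 : sizeOf(superName);
        for (String type : fieldTypes) {
            if (type.startsWith("Layout ")) {
                bits += sizeOf(type.substring("Layout ".length()).trim());
            } else {
                bits += Long.parseLong(type);
            }
        }
        sizes.put(name, bits);
        return bits;
    }

    // Fails if the referenced layout has not been defined yet, which is the
    // dependency problem mentioned above.
    long sizeOf(String name) {
        Long bits = sizes.get(name);
        if (bits == null) {
            throw new IllegalStateException("layout not defined yet: " + name);
        }
        return bits;
    }
}
```

Defining Point2D ("32", "32") first gives 64 bits; Line2D with two nested
Point2D fields then gives 128 bits, and Point3D extending Point2D with
"z:32" gives 96 bits. Defining Line2D before Point2D fails, which shows why
definition order or some form of lazy resolution would matter.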

The proposed LD format does a good job of representing how the layout
fields appear in memory. However, it does not convey any type information
about the fields. This makes it difficult for the end-user, as it requires
more effort on their part to add meaning to the fields.

The ideal solution is one that can represent these two streams of data,
either in an LDL or NFD format.

Type Data
- integral type
    - signed vs unsigned
- float type
- pointer type
- raw type
    - if we need to represent data in Java without any type info associated with it

Layout Data
- name
- size
- offset

Tobi