considering the LDL

Fri Jan 15 21:08:59 UTC 2016

Hi John

We haven't had time to analyze the entirety of your last post as thoroughly
as we would like, but here is what we think so far.

> So, should language-specific names or types be in the
> LDL?  Probably not, but (as you found) perhaps we need
> some sort of naming if only to provide attachment points
> to other metadata elsewhere.  What exactly are the names
> in a LD?  Must they map to some source language, or
> are they abstract link anchors to connect to other schemas?
> (If they are just anchors, can we drop the names and
> use just small integers?)

At the very least we need some kind of mapping from the names in native
structure definitions (and their members) to the names of interface methods.
Names are not essential to the LDL, since they don't provide any layout
information. However, it must be clear how the accessors map onto the LD.

The LDL of our current prototype makes this mapping very explicit, as seen
below, but perhaps this is a little extreme.

//struct Line {
//	struct Point start;
//	struct Point end;
//};
@LayoutDesc({"start:Point:8","end:Point:8"})
public interface Line extends Layout {
 public long sizeof();

 public abstract Point start();

 public abstract Point end();
}
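To make the name-based mapping concrete, here is a minimal sketch of how a
tool could parse "name:type:size" strings like the ones in the @LayoutDesc
annotation above and check them against an interface's accessors. The class
names (LineAccessors, MemberEntry, LayoutDescCheck) are hypothetical, and it
assumes, as in our prototype, that each accessor has the same name as its
layout entry and returns the entry's type.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for a generated Point interface and the accessor
// half of the generated Line interface (sizeof() and Layout are omitted).
interface Point {}

interface LineAccessors {
    Point start();

    Point end();
}

// One parsed "name:type:size" entry from a @LayoutDesc string.
record MemberEntry(String name, String type, long size) {}

final class LayoutDescCheck {
    // Parses entries such as "start:Point:8", keeping declaration order.
    static Map<String, MemberEntry> parse(String... entries) {
        Map<String, MemberEntry> members = new LinkedHashMap<>();
        for (String entry : entries) {
            String[] parts = entry.split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected name:type:size, got " + entry);
            }
            members.put(parts[0], new MemberEntry(parts[0], parts[1], Long.parseLong(parts[2])));
        }
        return members;
    }

    // Verifies the mapping in both directions: every accessor has an entry
    // with a matching type name, and every entry has a no-arg accessor.
    static void checkAccessors(Class<?> iface, Map<String, MemberEntry> members) {
        for (Method m : iface.getDeclaredMethods()) {
            MemberEntry e = members.get(m.getName());
            if (e == null) {
                throw new IllegalStateException("no layout entry for accessor " + m.getName());
            }
            if (!m.getReturnType().getSimpleName().equals(e.type())) {
                throw new IllegalStateException("type mismatch for accessor " + m.getName());
            }
        }
        for (String name : members.keySet()) {
            try {
                iface.getMethod(name);
            } catch (NoSuchMethodException ex) {
                throw new IllegalStateException("no accessor for layout entry " + name);
            }
        }
    }
}
```

Because the check is keyed on names, reordering the accessor methods in the
source does not affect it.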

Since the names 'Point', 'x', 'y', and 'jint' are already stored in the
generated interfaces, they could be omitted from the LD, leaving us with:

public interface Line extends Layout {
 public long sizeof();

 @MemberDesc({"8","0"}) //@MemberDesc({size, offset})
 public abstract Point start();

 @MemberDesc({"8","8"})
 public abstract Point end();
}

Dropping names would require the LD to also encode offsets, since we can no
longer rely on method order, which may change during refactoring. Even so,
this approach is susceptible to user error, because there are more pieces of
information to keep consistent when modifying a layout. For example, manually
adding a new field would require updating all of the subsequent offsets.
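The ripple effect is easy to see if offsets are derived mechanically from
declaration order. This is a minimal sketch assuming C-like placement (each
field at the next offset that satisfies its alignment, no tail padding); the
names OffsetDerivation, Field and Placed are hypothetical. A tool can redo
this computation for free, whereas every hand-edited offset is another chance
for a mistake.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetDerivation {
    // A member as a groveler might report it: name, size and alignment in bytes.
    record Field(String name, long size, long align) {}

    // A member with its derived offset, i.e. the {size, offset} pair an
    // offset-carrying LD would have to record.
    record Placed(String name, long size, long offset) {}

    // Rounds value up to the next multiple of align; align must be a power of two.
    static long alignUp(long value, long align) {
        return (value + align - 1) & -align;
    }

    // Places fields in declaration order, each at the first suitably aligned
    // offset after the previous field (C-like, without tail padding).
    static List<Placed> layout(List<Field> fields) {
        List<Placed> placed = new ArrayList<>();
        long offset = 0;
        for (Field f : fields) {
            offset = alignUp(offset, f.align());
            placed.add(new Placed(f.name(), f.size(), offset));
            offset += f.size();
        }
        return placed;
    }
}
```

For Line, with Point taken as two 4-byte jints (size 8, alignment 4), 'end'
sits at offset 8. Inserting a hypothetical one-byte 'color' field after
'start' moves 'end' to offset 12 (one byte for the field plus three bytes of
padding), which is exactly the kind of update that is easy to miss by hand.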

> Do we need separate concepts for on-heap and off-heap
> or Java and native storage (I hope not)?
>

No, I don't think these should be separate concepts, although they would
have separate security concerns. The API for creating an on-heap vs. an
off-heap layout would likely differ, but the interface for using them should
be the same.
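java.nio.ByteBuffer already illustrates this split: ByteBuffer.allocate and
ByteBuffer.allocateDirect create heap and off-heap storage respectively, but
both return the same ByteBuffer type. The sketch below uses that as a stand-in
for layout storage; PointAccess and BufferPoint are hypothetical names, not
part of our prototype, and the layout assumed is two native-order 4-byte ints.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical accessor interface; the same one serves both storage kinds.
interface PointAccess {
    int x();

    int y();

    void x(int v);

    void y(int v);
}

// Backs the accessor interface with any ByteBuffer, heap or direct.
final class BufferPoint implements PointAccess {
    private final ByteBuffer buf;
    private final int base;

    BufferPoint(ByteBuffer buf, int base) {
        this.buf = buf.order(ByteOrder.nativeOrder());
        this.base = base;
    }

    public int x() { return buf.getInt(base); }

    public int y() { return buf.getInt(base + 4); }

    public void x(int v) { buf.putInt(base, v); }

    public void y(int v) { buf.putInt(base + 4, v); }

    // Creation differs by storage kind; everything after this point does not.
    static PointAccess onHeap() { return new BufferPoint(ByteBuffer.allocate(8), 0); }

    static PointAccess offHeap() { return new BufferPoint(ByteBuffer.allocateDirect(8), 0); }
}
```

Code written against PointAccess cannot tell which factory produced its
argument; any security checks would naturally sit in the factories.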

> What about bit order?  I hope we don't need to put
> this in the LDL, but the above argument would seem
> to apply to bits also.  Bit-streaming formats tend to
> make arbitrary choices about bit order just like CPUs
> are arbitrary about byte order.  If we add the
> "nice-to-have" feature of bit slices, I think we will
> have to add bit-order notation also.
>

Agreed: being explicit about bit order is required if we want to support
bit slices. Our approach was to define a single bit-ordering convention (we
chose LSB) that is used in all cases. MSB sequences are simply expressed in
reverse order, which saves us from needing a special notation. However, this
still leaves atomicity issues to be dealt with. The commonly suggested use
case for layouts is network formats, which are often defined at the bit
level, so having a compelling answer here would certainly help layouts.
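A minimal sketch of the single-convention idea, assuming LSB-0 numbering
(bit 0 is the least significant bit) and fields no wider than 64 bits. The
helper names are hypothetical. A slice written in MSB-0 terms, as network
RFCs usually do, is converted to an LSB-0 start position rather than getting
its own notation. This says nothing about atomicity.

```java
final class BitSlices {
    // Extracts `width` bits starting at LSB-0 position `lsbStart` (bit 0 is the
    // least significant bit of `word`).
    static long slice(long word, int lsbStart, int width) {
        if (width < 1 || width > 64 || lsbStart < 0 || lsbStart + width > 64) {
            throw new IllegalArgumentException("slice out of range");
        }
        long shifted = word >>> lsbStart;
        return width == 64 ? shifted : shifted & ((1L << width) - 1);
    }

    // Re-expresses a slice given in MSB-0 numbering (bit 0 is the most
    // significant bit of a `fieldBits`-wide field) as an LSB-0 start position.
    static int msbToLsbStart(int msbStart, int width, int fieldBits) {
        return fieldBits - msbStart - width;
    }
}
```

For example, the first byte of an IPv4 header is commonly 0x45: the version
field is MSB-0 bits 0-3 (LSB-0 start 4, value 4) and the header length is
MSB-0 bits 4-7 (LSB-0 start 0, value 5).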

Even if Layouts don't support data access at the bit-slice level, the bit
ordering must still be specified so that Java can operate on the data
correctly after it is loaded.

> Given several layouts and a composition rule (union,
> struct, array, etc.) alignments can inform offset generation
> and an overall alignment for a combined layout.
> This is what C structs do.  That leads us to…
>
<snip>
> A layout combination operator would take two layouts
> and create a composite one, assigning new offsets
> for each element in its new position in the composite.
> When combined with alignment restrictions, we start
> to see a demand for padding.  The rules get suspiciously
> close to complex and arbitrary.
>
> I'd like to propose a simpler, less structured composition
> operator, a "nest" (taking a cue from "nested layout"
> in your draft).  A "nest" is a sequence of one or more
> layouts, each associated with a non-negative offset.
<snip>
> The nest becomes more interesting if you add more
> constraints on formation.  For example, the alignment
> of the parent must satisfy the alignment of all the children,
> in the obvious way.  Imposing this constraint will force
> C-like structs to "grow" padding, if their children have alignment
> constraints.  Or it will force the children to lose their constraints.
> There's no one right answer, so it has to be an option.
>
> I like the fact that there are no auto-padding rules in your
> draft.  But in order to allow code generators to trust alignments,
> I think we need rules for composition that preserve alignment
> information in the way sketched above.
>
> Basically, if you are putting together child layouts that have
> alignment, you need to get the offsets right.  If you make
> a mistake, the LDL constraints will tell you it is wrong, but
> they won't fix it up for you behind your back.
>

I think the common scenario involves running a groveler (jextract) on native
structure definitions and outputting an LD plus an interface. In this case
we can expect all compositions to have proper alignment (and padding), since
the compiler takes care of this. At the very least, our validation needs to
be compatible with native structure definitions, which may deliberately
misalign struct members: if the data really is misaligned, validation should
not reject it.
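One way to keep validation compatible with deliberately packed structs is to
report misalignment as a diagnostic instead of failing. This is a minimal
sketch under that assumption; AlignmentCheck and Member are hypothetical
names, and natural alignment is assumed to be supplied per member.

```java
import java.util.ArrayList;
import java.util.List;

final class AlignmentCheck {
    record Member(String name, long offset, long align) {}

    // Returns one message per member whose offset is not a multiple of its
    // natural alignment. Nothing is thrown: a packed native struct may be
    // misaligned on purpose, and the caller decides how to treat the result.
    static List<String> diagnose(List<Member> members) {
        List<String> out = new ArrayList<>();
        for (Member m : members) {
            if (m.offset() % m.align() != 0) {
                out.add(m.name() + " at offset " + m.offset()
                        + " is not " + m.align() + "-byte aligned");
            }
        }
        return out;
    }
}
```

For a packed struct { char tag; int value; }, 'value' sits at offset 1 and
yields one diagnostic, while the layout itself is still accepted.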

What level of validation are you proposing? Would violations be reported as
warnings or as exceptions?

> == types
>
> Are registers just bundles of bits and bytes, or do they have kinds?
> If they don't have kinds, then the LDL can remain silent about
> kinds also, since memory blocks don't have kinds either.
> I'd prefer it this way; leave it to the register allocator and
> code generator to pick register kinds.
>
> As noted before, we don't want to sign up to create a
> LTS (little type system) in the LDL.  That's a job for
> other notations like DWARF or C itself.
>

Agreed, and it is impossible to create a type system that is a superset of
every type in every language. However, for usability we need to associate
the data with some kind of Java entity; I assume this is what your 'carrier
type' is intended for.

It would also be useful to represent types that do not fit in any Java
primitive in a generic, language-independent way.
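As an illustration of the generic representation, a value wider than any
Java primitive, such as GCC's 16-byte __int128, can be described by just its
byte width, byte order and signedness, and decoded into java.math.BigInteger.
This is a minimal sketch; the WideInt name is hypothetical and is not meant
as a proposal for the actual carrier type.

```java
import java.math.BigInteger;

final class WideInt {
    // Decodes an integer of arbitrary byte width from raw memory bytes, given
    // the byte order and signedness an LD would have to record for it.
    static BigInteger decode(byte[] raw, boolean littleEndian, boolean signed) {
        if (raw.length == 0) {
            throw new IllegalArgumentException("empty value");
        }
        byte[] bigEndian = new byte[raw.length];
        for (int i = 0; i < raw.length; i++) {
            bigEndian[i] = littleEndian ? raw[raw.length - 1 - i] : raw[i];
        }
        // BigInteger expects big-endian bytes; signum 1 treats them as a magnitude,
        // while the single-argument constructor reads two's complement.
        return signed ? new BigInteger(bigEndian) : new BigInteger(1, bigEndian);
    }
}
```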

>
>    struct TwoArrays { int n1, n2; double a1[n1]; byte a2[n2]; };
>
> There are two problems here:  The offset of a2 is a complex
> expression, and the size of the whole is another expression.
>
> In a parameterized LDL formula, TwoArrays can be like this:
>
>   template<int $n1, int $n2>
>   struct TwoArrays { int n1, n2; double a1[$n1]; byte a2[$n2]; };
>
> A code generator would need to be passed $n1 and $n2.
> (Extra cleverness would allow it to fetch them from n1 and n2,
> optionally.)
> More generally, all the numeric parts of an LDL formula could
> be either constants or variables (where the set of variables
> is declared at the beginning of the formula, like a lambda
> or a template).  This seems overly general; I'm not sure what
> is the right reduced-strength design.
>
> In the end, it's clear we need to support a modest amount of
> runtime binding for offsets (of array elements), but we can't
> sign up to prove properties like "non-overlapping", or "tightly
> packed" or even "properly aligned", if a runtime-sized array
> has gotten in somewhere.  Probably at some point you have
> to say to the static code generator "trust me, you won't
> run off the end", or maybe "here are some numbers you can
> use to check my math".  Then efficient math-checking
> becomes the JIT's problem; that's how Java does it.
>

We have also been working on an approach to describing variable-sized arrays
in layouts. The direction we have taken involves providing API for both
authoring and consuming data. We would have API that allows one to supply
initial values for the array headers (n1 and n2), as well as API that binds
to a location where the header values are expected to already exist in
memory.
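A minimal sketch of the two entry points for the TwoArrays example, using a
ByteBuffer as stand-in storage. It assumes 4-byte ints, 8-byte doubles, native
byte order and no trailing padding; the method names create, bind and sizeFor
are hypothetical. The point is that the offset of a2 and the total size are
computed from the header values at runtime, and that binding checks the
header against the available memory before trusting it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Objects;

// Hypothetical accessor for the notional C-like struct
//   struct TwoArrays { int n1, n2; double a1[n1]; byte a2[n2]; };
final class TwoArrays {
    private static final int A1_OFFSET = 8; // after n1 and n2; already 8-byte aligned
    private final ByteBuffer buf;

    private TwoArrays(ByteBuffer buf) {
        this.buf = buf;
    }

    // Total size depends on the runtime header values.
    static long sizeFor(int n1, int n2) {
        return A1_OFFSET + 8L * n1 + n2;
    }

    // Authoring: the caller supplies the header values and storage is sized from them.
    static TwoArrays create(int n1, int n2) {
        if (n1 < 0 || n2 < 0) {
            throw new IllegalArgumentException("negative array length");
        }
        ByteBuffer b = ByteBuffer.allocate(Math.toIntExact(sizeFor(n1, n2)))
                .order(ByteOrder.nativeOrder());
        b.putInt(0, n1).putInt(4, n2);
        return new TwoArrays(b);
    }

    // Consuming: the header must already be in memory; check it against the buffer.
    static TwoArrays bind(ByteBuffer existing) {
        ByteBuffer b = existing.duplicate().order(ByteOrder.nativeOrder());
        if (b.limit() < A1_OFFSET) {
            throw new IllegalArgumentException("buffer too small for header");
        }
        int n1 = b.getInt(0);
        int n2 = b.getInt(4);
        if (n1 < 0 || n2 < 0 || sizeFor(n1, n2) > b.limit()) {
            throw new IllegalArgumentException("header describes more data than the buffer holds");
        }
        return new TwoArrays(b);
    }

    int n1() { return buf.getInt(0); }

    int n2() { return buf.getInt(4); }

    // The offset of a2 is only known once n1 has been read.
    int a2Offset() { return A1_OFFSET + 8 * n1(); }

    double a1(int i) { return buf.getDouble(A1_OFFSET + 8 * Objects.checkIndex(i, n1())); }

    void a1(int i, double v) { buf.putDouble(A1_OFFSET + 8 * Objects.checkIndex(i, n1()), v); }

    byte a2(int i) { return buf.get(a2Offset() + Objects.checkIndex(i, n2())); }

    void a2(int i, byte v) { buf.put(a2Offset() + Objects.checkIndex(i, n2()), v); }

    ByteBuffer buffer() { return buf; }
}
```

With n1 = 2 and n2 = 3, a2 starts at offset 24 and the whole struct occupies
27 bytes.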

While we can't guarantee the safety of native methods, we should guarantee
that the Java side always passes correct arguments (of the correct size and
type) to the method. Ideally, we should also attempt to catch out-of-bounds
memory writes by native methods, rather than allowing unpredictable memory
corruption.
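One well-known technique for catching such writes after the fact is a canary:
pad the region with guard bytes of a known pattern and check them when the
call returns. This is only a sketch of that swapped-in technique, not our
design; the native method is simulated by a hypothetical callback that, like
native code, receives the whole region plus a length and is free to overrun
it. It detects some overruns but cannot prevent them, and it misses writes
that skip past the guard or happen to store the pattern value.

```java
import java.nio.ByteBuffer;
import java.util.function.BiConsumer;

final class CanaryGuard {
    static final int GUARD = 16;
    static final byte PATTERN = (byte) 0xA5;

    // Runs `callee` (a stand-in for a native method taking a pointer and a
    // length) on a region of `size` bytes followed by GUARD canary bytes.
    // Returns true if the canaries are still intact afterwards.
    static boolean callGuarded(int size, BiConsumer<ByteBuffer, Integer> callee) {
        ByteBuffer region = ByteBuffer.allocateDirect(size + GUARD);
        for (int i = size; i < size + GUARD; i++) {
            region.put(i, PATTERN);
        }
        callee.accept(region, size);
        for (int i = size; i < size + GUARD; i++) {
            if (region.get(i) != PATTERN) {
                return false;
            }
        }
        return true;
    }
}
```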

We are currently working on a document covering this particular issue, but
it is not ready yet.

--Tobi