More detail than I had intended on Layout description language.

Fri Nov 21 22:34:41 UTC 2014

On 2014-11-19, at 12:32 PM, Angela Lin <angela_lin at ca.ibm.com> wrote:
> Tobi's email refers to our attempt to implement a portable "pointer" type in the LDL. It's an interesting option to consider, but I'm not sure we want to deal with the knock-on effects of supporting this. For example, how do we align the pointer field (in the LDL) if we don't know its size? What's the impact on the layout nesting or extension features?
> 
> If we decide that LDL is not platform-independent, then "pointer" doesn't seem as useful.
> 
> > Was I correct in seeing that it is proposed to attach LDL to 
> > interfaces with annotations?
> 
> Yes, we proposed this. However, we're certainly open to alternatives.
> 
> Since LDL is not platform-independent, if the LDLs are attached to the generated Java interfaces, then the Java interfaces are also not platform-independent. Is this good or bad?

Normally I’d say bad.

It might be interesting to try to see what happens with network protocols,
which are ultimately (at the byte-by-byte level) unambiguously specified.
That gets in the way of the platform-independent Java-behavior story,
because I can easily imagine that what you’d like to have happen in the generated
code is something like a loadInt, and on wrong-byte-order platforms, a byte-swap.
I was thinking “network protocols” because those are something that we really
do expect to see on more than one platform.

Alternatively, maybe we need a pair of intrinsics (potentially in sized versions) called
toBigEndian() and toLittleEndian() so the generated code could always look the same
but the behavior would vary….

Right, that will sail right past the language and corelibs guardians-of-the-faith.
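
A minimal sketch of the conditional-swap idea, in plain Java rather than as real
intrinsics. The names toBigEndian/toLittleEndian come from the suggestion above; the
class EndianSwaps and the exact signatures are assumptions made for illustration.
Only Integer.reverseBytes, Long.reverseBytes and ByteOrder.nativeOrder are existing
JDK calls.

```java
import java.nio.ByteOrder;

// Hypothetical helper class; not an existing JDK API.
final class EndianSwaps {
    private static final boolean NATIVE_IS_BIG =
        ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

    // Converts between a native-order int and its big-endian interpretation.
    // A byte swap is its own inverse, so the same call serves loads and stores.
    static int toBigEndian(int nativeValue) {
        return NATIVE_IS_BIG ? nativeValue : Integer.reverseBytes(nativeValue);
    }

    // Converts between a native-order int and its little-endian interpretation.
    static int toLittleEndian(int nativeValue) {
        return NATIVE_IS_BIG ? Integer.reverseBytes(nativeValue) : nativeValue;
    }

    // "Sized versions": the same pair for 64-bit quantities.
    static long toBigEndian(long nativeValue) {
        return NATIVE_IS_BIG ? nativeValue : Long.reverseBytes(nativeValue);
    }

    static long toLittleEndian(long nativeValue) {
        return NATIVE_IS_BIG ? Long.reverseBytes(nativeValue) : nativeValue;
    }
}
```

Generated code for a network-order field would then always read
"toBigEndian(loadInt(...))", and the swap would disappear on big-endian hardware.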

I think there’s some ugliness-conservation at work here.
Perhaps we could extend the Unsafe interface to include endianness-tagged loads and stores.
(This is an application of the Alice’s Restaurant Algorithm #1,
one big pile of garbage is better than two little piles of garbage).
Another advantage of this is that some processors (SPARC, I think) have
options for swapping the data as it is loaded, so it may be expedient anyhow
to push the byteswapping as close to the loads and stores as possible.
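
As a rough sketch of what endianness-tagged loads and stores could look like to the
generated code, here is a version that uses java.nio.ByteBuffer as a stand-in for
Unsafe. The class TaggedAccess and its method names are hypothetical; nothing like
them is being claimed to exist on Unsafe. The ByteBuffer calls (duplicate, order,
absolute getInt/putInt) are real.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for endianness-tagged Unsafe accessors.
final class TaggedAccess {
    // Loads a 32-bit value at `offset`, interpreting the bytes in `order`.
    // duplicate() leaves the caller's own byte-order setting untouched.
    static int loadInt(ByteBuffer mem, int offset, ByteOrder order) {
        return mem.duplicate().order(order).getInt(offset);
    }

    // Stores a 32-bit value at `offset`, laying the bytes out in `order`.
    static void storeInt(ByteBuffer mem, int offset, ByteOrder order, int value) {
        mem.duplicate().order(order).putInt(offset, value);
    }
}
```

The caller states the byte order of the layout once, at the access, and whether a
swap happens is decided below that line rather than in the generated proxy.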

Note that either of these options — introducing intrinsics with targeted platform-dependent
behavior, either conditional swaps or conditionally-swapped loads/stores — would at least
give us a prayer of saying that the same generated proxies would work across platforms
for network-protocol layouts.

Can we tag this as a “would be nice if possible”?

> > My snap reaction is that’s a good idea, even if it does perhaps push
> > the LDL back into text.
> > 
> > And I think (based on discussions at this end) that being able to treat these
> > things as native-resident references is priority #1, but I’m not 
> > sure that’s the only
> > priority.
> 
> Could you clarify the terminology "native-resident reference"?

I think I mean “what you guys want” meaning that on the Java side there
is (at minimum) an interface full of getters/setters and a generated proxy
class, which conspire to peek/poke at memory that is “native”, meaning
not on the Java heap (i.e., allocated by malloc or mmap or similar).

This is the bare minimum for smooth copy-less interoperation with native 
code in general, I think.
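
To make the shape concrete, here is a minimal sketch of such an interface and proxy
for a two-int struct. All names (Point, NativePoint, the offsets) are hypothetical,
and a direct ByteBuffer stands in for malloc'd or mmap'd memory; a real generated
proxy would presumably go through Unsafe or similar.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical interface of getters/setters, as a tool might generate it.
interface Point {
    int x();
    int y();
    void x(int v);
    void y(int v);
}

// Hypothetical generated proxy: holds no field data itself, only a window
// onto off-heap memory at a base offset.
final class NativePoint implements Point {
    static final int SIZE = 8;              // two 4-byte ints, no padding (assumed)
    private static final int X_OFFSET = 0;
    private static final int Y_OFFSET = 4;

    private final ByteBuffer mem;
    private final int base;

    NativePoint(ByteBuffer mem, int base) {
        // Native byte order, since native code reads the same bytes.
        this.mem = mem.duplicate().order(ByteOrder.nativeOrder());
        this.base = base;
    }

    @Override public int x() { return mem.getInt(base + X_OFFSET); }
    @Override public int y() { return mem.getInt(base + Y_OFFSET); }
    @Override public void x(int v) { mem.putInt(base + X_OFFSET, v); }
    @Override public void y(int v) { mem.putInt(base + Y_OFFSET, v); }
}
```

Two proxies built over the same buffer and offset see each other's writes, which is
exactly the aliasing behaviour a reference to native memory implies.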

For this, and for alignment, I think things get more interesting when you start
nesting things — for example, suppose you have

struct Point {
  int x, y;
};

struct Triangle {
  struct Point vertices[3];
};

Considering alignment, there is the natural propagation from fields to containers;
a container can never have smaller alignment than its fields (except in the case of
some exotic addressing schemes that I don’t think are normal for C programs).
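
The propagation rule can be sketched as a small calculation. This assumes
power-of-two alignments and the usual C convention that a struct's alignment is the
maximum of its fields' alignments and its size is padded to a multiple of that; the
class Layouts and its method names are made up for illustration.

```java
// Hypothetical helper for layout arithmetic; assumes power-of-two alignments.
final class Layouts {
    // A container is at least as strictly aligned as its strictest field.
    static int containerAlignment(int... fieldAlignments) {
        int result = 1;
        for (int a : fieldAlignments) {
            result = Math.max(result, a);
        }
        return result;
    }

    // Rounds offset up to the next multiple of a power-of-two alignment.
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    // Size of a C-style struct: place each field at its aligned offset,
    // then pad the tail so arrays of the struct stay aligned.
    static int structSize(int[] sizes, int[] alignments) {
        int offset = 0;
        for (int i = 0; i < sizes.length; i++) {
            offset = alignUp(offset, alignments[i]) + sizes[i];
        }
        return alignUp(offset, containerAlignment(alignments));
    }
}
```

For the Point above (two 4-byte ints) this gives alignment 4 and size 8, and an
array of Points inherits alignment 4 from its element type.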

If you are references-only, then it’s all aliased.  Suppose you wanted to rotate the
vertices of the triangle, say

  Triangle tri = someNativeThing.getTriangle();
  Array<Point> points = tri.vertices();
  Point t = ??I need to make a copy here?? (points.get(0));
  points.get(0).set(points.get(1));
  points.get(1).set(points.get(2));
  points.get(2).set(t);

So what happens to make that copy?   I think there are (at least) 3 choices
that are not even mutually exclusive:

1) Interface includes a clone() method, proxy allocates that clone
  (a) on the Java heap, using the standard Unsafe.get/set(null,…) hack
  (b) on the native heap

2) We have “values” that are different in that they are never aliased and not mutable.

3) It’s an interface, there’s nothing in particular stopping programmers from writing their own implementations.

1a) pro: will GC smoothly.
      con: cannot be passed to native code unless our wrappers are a little clever

1b) pro: can definitely pass to native code.
      con: there may be GC issues (will require finalizers to free temps?)

2) pro: will GC smoothly, will probably optimize better in Java, no aliasing issues.
    con: cannot pass to native code, multiplying entities.

3) pro: we get to be lazy and punt; we can be sure that we did not commit to a mistake.
    con: not like the total amount of work is saved by us being lazy and punting,
            programmers can screw this up.
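
For comparison, here is a sketch of the rotation with the temporary held in a
Java-heap implementation of the interface. It is closest to choice 3 (or 1a without
the Unsafe hack). Every name here (Point.set, HeapPoint, NativePoint, Rotate) is
hypothetical, and a direct ByteBuffer again stands in for native memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

interface Point {
    int x();
    int y();
    void x(int v);
    void y(int v);

    // Copies field values out of `other`; the two stay un-aliased afterwards.
    default void set(Point other) {
        x(other.x());
        y(other.y());
    }
}

// Plain Java-heap copy: collected by the GC, but not passable to native code.
final class HeapPoint implements Point {
    private int x;
    private int y;

    HeapPoint(Point src) { this.x = src.x(); this.y = src.y(); }

    @Override public int x() { return x; }
    @Override public int y() { return y; }
    @Override public void x(int v) { x = v; }
    @Override public void y(int v) { y = v; }
}

// Proxy over off-heap memory (layout: int x at +0, int y at +4, assumed).
final class NativePoint implements Point {
    private final ByteBuffer mem;
    private final int base;

    NativePoint(ByteBuffer mem, int base) {
        this.mem = mem.duplicate().order(ByteOrder.nativeOrder());
        this.base = base;
    }

    @Override public int x() { return mem.getInt(base); }
    @Override public int y() { return mem.getInt(base + 4); }
    @Override public void x(int v) { mem.putInt(base, v); }
    @Override public void y(int v) { mem.putInt(base + 4, v); }
}

final class Rotate {
    // Rotates three points in place; only the temporary is a heap copy.
    static void rotate(Point p0, Point p1, Point p2) {
        Point t = new HeapPoint(p0);   // the explicit copy the pseudo-code asks about
        p0.set(p1);
        p1.set(p2);
        p2.set(t);
    }
}
```

Without the copy, t would alias points.get(0) and the last assignment would write
the already-overwritten value back.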

There’s a slightly larger-picture question here, which is sort of “how far does this stuff
propagate into surrounding code before it gets a wrapper slapped on it”.

> > But I will look again.  (A bug with a deadline is currently grabbing most of my
> > attention, unfortunately).
> > 
> > Also, I think we might need alignment specification, especially 
> > considering possible
> > GPU applications.  It’s not necessarily for padding purposes, but it
> > is required to ensure
> > that memory is properly allocated and placed — for example, can I 
> > allocate a 64-bit quantity
> > on an 8-bit boundary?  How am I told that I cannot do this?
> 
> For start alignment, I think you're right. Again, I would propose starting with an explicit descriptor attribute, i.e. "the layout must start at an alignment of x".

I think that works, but we might want to consider how alignment specs interact with everything else.
Have you guys thought much about bitfields?
There are some peculiar interactions there with alignment and endianness.
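
On the "how am I told that I cannot do this?" question, one possible answer is a
check at the point where a proxy is bound to an address. A minimal sketch, assuming
the descriptor carries a power-of-two start alignment in bytes; the class and method
names are hypothetical.

```java
// Hypothetical check a proxy factory could run before binding to memory.
final class StartAlignment {
    // Returns the address unchanged if it satisfies the layout's start
    // alignment, and throws otherwise so the failure is immediate and visible.
    static long checkAligned(long address, long alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException(
                "alignment must be a positive power of two: " + alignment);
        }
        if ((address & (alignment - 1)) != 0) {
            throw new IllegalArgumentException(
                "address 0x" + Long.toHexString(address)
                    + " is not " + alignment + "-byte aligned");
        }
        return address;
    }
}
```

A 64-bit quantity with an 8-byte start alignment would pass at address 16 and be
rejected at address 9.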

The other point I wanted to toss in, is that once you put alignment into the mix
and not just as an “oh yeah, we need to think about alignment” it might change
how we talk about some of the other stuff.  If you look at C bitfields, there is very
much a model of “load a box this big, then shift the data towards the low order
bits some distance, then mask/sign-smear the part we don’t care about”.  If that
bleeds through into the groveler tool, maybe it bleeds through into the little language,
too.  Some of the memory model issues demand the “load/store/cas-a-box” model,
too: if you treat a bitfield as something that is split across a pair of byte loads, that
all gets nastier.
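
The box model can be sketched as follows for a 32-bit box. The class and method
names are hypothetical, and bit positions are counted from the low-order end of the
already-loaded box; which bits of the box a C compiler actually assigns to a given
bitfield depends on the ABI and endianness, which this sketch does not try to model.

```java
// Hypothetical helpers for the "load a box, shift, mask/sign-smear" model.
// Requires 1 <= width and shift + width <= 32.
final class BitfieldBox {
    private static int lowMask(int width) {
        return width == 32 ? -1 : (1 << width) - 1;
    }

    // Shift the field down to the low-order bits, then mask off the rest.
    static int extractUnsigned(int box, int shift, int width) {
        return (box >>> shift) & lowMask(width);
    }

    // Same, then smear the field's top bit through the upper bits.
    static int extractSigned(int box, int shift, int width) {
        int value = extractUnsigned(box, shift, width);
        int signBit = 1 << (width - 1);
        return (value ^ signBit) - signBit;
    }

    // Returns a new box with the field replaced. The caller stores (or CASes)
    // the whole box back, so neighbouring fields are rewritten unchanged.
    static int insert(int box, int shift, int width, int value) {
        int mask = lowMask(width) << shift;
        return (box & ~mask) | ((value << shift) & mask);
    }
}
```

Because insert works on the whole box, an update is one load plus one store or
compare-and-swap of the box, not two independent byte accesses.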

David