From david.r.chase at oracle.com  Thu Nov  6 21:58:45 2014
From: david.r.chase at oracle.com (David Chase)
Date: Thu, 6 Nov 2014 16:58:45 -0500
Subject: Reference SUB/SEP Value question
In-Reply-To: <OF8D24377F.28396A7E-ON85257D82.00538498-85257D82.005F51A1@ca.ibm.com>
References: <81818BB2-8C2D-4910-A83F-D03BB7B0AB49@oracle.com>
	<OF8D24377F.28396A7E-ON85257D82.00538498-85257D82.005F51A1@ca.ibm.com>
Message-ID: <4BF7F006-2846-4AC1-BA5E-DAB3DABECBF2@oracle.com>

Here's a first stab at clarifying the problem.
We might need to stab it some more before we are done.

>> 3) I spent some time experimenting with the relationship between a
>> so-called "Ref" type and a so-called
>> "Value" type, thought I had some answers, but after whacking on the
> 
> I don't quite understand 3).  Would it help to post some javadoc/test code
> snippets to illustrate?  If you haven't got this handy, we can scrounge
> something out of our prototypes.

I think there are two choices, refSUBvalue and refSEPvalue.

For SUB, you might have a hierarchy that looks like this:

  interface ComplexValue
     class Complex implements ComplexValue
  interface ComplexRef extends ComplexValue, Ref
     class ComplexGeneratedProxy implements ComplexRef

You can do this because a Reference can support all the
methods of Value by fetching the relevant fields etc, plus
it has some more methods for setting those fields.
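A minimal Java sketch of the SUB arrangement might look like the following. It assumes a Complex layout of two doubles. The accessor names re/im, the setters, the size() method on Ref, and the ByteBuffer standing in for native memory are all hypothetical and are not part of any existing API.

```java
import java.nio.ByteBuffer;

// Hypothetical SUB hierarchy: every Ref is also a Value.
// A ByteBuffer stands in for native memory (assumption for illustration).
interface Ref {
    long size();                          // hypothetical common Ref behaviour
}

interface ComplexValue {
    double re();
    double im();
}

// Plain Java value object.
final class Complex implements ComplexValue {
    private final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
    public double re() { return re; }
    public double im() { return im; }
}

// In SUB, the Ref interface extends the Value interface and adds setters.
interface ComplexRef extends ComplexValue, Ref {
    void setRe(double v);
    void setIm(double v);
}

// Stand-in for the runtime-generated proxy: getters read memory on every call.
final class ComplexGeneratedProxy implements ComplexRef {
    private final ByteBuffer mem;
    private final int base;
    ComplexGeneratedProxy(ByteBuffer mem, int base) { this.mem = mem; this.base = base; }
    public double re() { return mem.getDouble(base); }
    public double im() { return mem.getDouble(base + 8); }
    public void setRe(double v) { mem.putDouble(base, v); }
    public void setIm(double v) { mem.putDouble(base + 8, v); }
    public long size() { return 16; }
}
```

Because the proxy is itself a ComplexValue, code that holds it as a "value" sees every later write made through the proxy. This is the aliasing hazard raised in the comparison that follows.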

For SEP, you might have a hierarchy that looks like this:

  interface ComplexValue
     class Complex implements ComplexValue

  interface ComplexRef extends Ref {
     ComplexValue get(); // canonical connection between Ref and Value
  }
     class ComplexGeneratedProxy implements ComplexRef
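The SEP arrangement can be sketched with the same hypothetical names and ByteBuffer stand-in. Here get() copies the current fields into a fresh Complex. That is one possible implementation, not a requirement.

```java
import java.nio.ByteBuffer;

// Hypothetical SEP hierarchy: Ref and Value are unrelated types,
// connected only by get(). ByteBuffer stands in for native memory.
interface Ref {
    long size();
}

interface ComplexValue {
    double re();
    double im();
}

final class Complex implements ComplexValue {
    private final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
    public double re() { return re; }
    public double im() { return im; }
}

interface ComplexRef extends Ref {
    ComplexValue get();                   // canonical connection between Ref and Value
    void setRe(double v);
    void setIm(double v);
}

final class ComplexGeneratedProxy implements ComplexRef {
    private final ByteBuffer mem;
    private final int base;
    ComplexGeneratedProxy(ByteBuffer mem, int base) { this.mem = mem; this.base = base; }
    // Copies the current field values into a new Complex (the "second allocation").
    public ComplexValue get() { return new Complex(mem.getDouble(base), mem.getDouble(base + 8)); }
    public void setRe(double v) { mem.putDouble(base, v); }
    public void setIm(double v) { mem.putDouble(base + 8, v); }
    public long size() { return 16; }
}
```

The snapshot returned by get() does not change when the underlying memory is written later. The cost is one extra, possibly short-lived, allocation per get().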

The main difference is that in the refSUBvalue world, every ref is
implicitly also a value, and that might be convenient, but I think it
has two problems: it tricks people into shooting themselves in the foot
with aliasing, and ComplexValue is likely to be an interface with at
least two implementations, hence not falling-off-a-log easy to optimize
with ClassHierarchyAnalysis.

In the world where they are separate, the native programmer will
be forced to notice the difference between refs and values, but
ComplexRef (in particular) is very likely to be a single-implementation
interface (the proxy class) and thus will be easy to optimize.
On the other hand, getting your hands on a value will require
a second memory allocation (but perhaps it will be a short-lived
memory allocation amenable to escape analysis).

I think SEP also helps us avoid some overloaded-method-selection
surprises.

TBD are the exact choices for field names in Ref and Value interfaces
when they are separate; I am inclined to give Refs getter methods
that have the same name and signature as those in the Value interface
(hence, an implicit dereference; is this inconsistency okay?)

Did I at least explain what I think the issues are?
I need to test some coding in both models (I've tested "SUB")
to see how they compare.

David


From david.r.chase at oracle.com  Mon Nov 10 21:57:33 2014
From: david.r.chase at oracle.com (David Chase)
Date: Mon, 10 Nov 2014 16:57:33 -0500
Subject: More detail than I had intended on Layout description language.
Message-ID: <7F2A3A29-CDE1-4AC4-8B21-ACC7874B9A66@oracle.com>


On 2014-10-31, at 1:21 PM, Angela Lin <angela_lin at ca.ibm.com> wrote:
> Then there are these divergent rabbit holes to follow:
> i) the tiny language specification  <-- Let's call this the "layout
> descriptor language"?
> ii) the runtime system
> 
> There might be a 3rd part that glues between the first two:
> iii) generated interfaces that are provided to the Java programmer

At this end, Henry Jen is working on a tool based on libclang that takes
header files and creates appropriate Java interfaces.  I think that the Java-
level inputs are 
(1) those interfaces
(2) combined with a layout-descriptor (written in LDL) that tells which
interface methods are getters, which are setters, and what the
sizes/locations/alignments of the various fields are.  I think the
output of the tool is still TBD.

In my experiments, I used a little string encoding, along the lines of
"foo, bar[11], baz:4,nyb:4"
and I used naming conventions and typing conventions from the supplied
interface to infer the C types.  I think this was mostly wrong.

What I think now:

The language should not be one long string (but you could talk me back
into this; I am basing this purely on the difficulty of ruling out screwball
string inputs).  I'm torn: I like the structure and non-ambiguity of defining
a class to describe fields and passing in an array of them, along the lines of (notice
how this is not Java or C, so it must not be source code?)

 NativeFieldDescriptor { // AKA NFD 
    String field_accessor_name;
    ... stuff describing size/align that I am having trouble with ...
    long[] array_dimensions default=long[0] or maybe null
 }

The "stuff ... I am having trouble with" is difficult for two reasons, and I think maybe
Henry will have an opinion based on what comes out of this tool, or you will have
an opinion based on your experience.

Reason #1 is that I am trying to figure out where to set the "friendly to human
programmers" dial.  I've become less and less worried about this, because of
reason #2 and related issues, and because we could surely define a "friendly
layer" if we needed one.

Reason #2 is that I'm trying to figure out how to deal gracefully with the endianness
issue on a platform that works very hard to hide such issues.  It seems to me that
for a given "little language" input, the behavior of the Java side (and the bytecodes
that it generates) should not depend on the endianness of the underlying platform.

The test case I've been playing with to try to figure out the right way to talk about
this is

struct c {
     unsigned short x:1;
     unsigned short y:7;
};

I think that the little language generated for that has to depend on the endianness of
the underlying platform, because it produces different results on the Java side.  On a
little-endian machine, storing 1 into both x and y would be something along the lines of

storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003

but on a big-endian machine

storeShort(address, ((1&1)<<15) + ((127&1)<<8)) // store 0x8100
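Both storeShort values can be reproduced with a small helper. This sketch assumes the usual ABI convention: little-endian targets allocate bitfields starting from the least significant bit of the container, and big-endian targets start from the most significant bit. BitfieldPacking and its methods are hypothetical.

```java
// Hypothetical helper reproducing the storeShort values for
// struct c { unsigned short x:1; unsigned short y:7; }.
// Assumes LE targets fill the container from its least significant bit
// and BE targets from its most significant bit.
final class BitfieldPacking {
    static final int CONTAINER_BITS = 16;   // unsigned short container

    // Places `value`, masked to `width` bits, at allocation position `bitPos`
    // (0 = first bit allocated) within the 16-bit container.
    static int place(int value, int bitPos, int width, boolean bigEndian) {
        int mask = (1 << width) - 1;
        int shift = bigEndian ? CONTAINER_BITS - bitPos - width : bitPos;
        return (value & mask) << shift;
    }

    // x is allocated first (1 bit), then y (7 bits).
    static int storeXY(int x, int y, boolean bigEndian) {
        return place(x, 0, 1, bigEndian) | place(y, 1, 7, bigEndian);
    }
}
```

With x = y = 1, this yields 0x0003 on the little-endian path and 0x8100 on the big-endian path, matching the two lines above.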

So, since the Java-side behavior should be different, I think that the little language inputs
(the output of the libclang-based tool) should be different depending on platform; the
offsets will need to be described in ways that conform to Java's expectations.

One thing that might not be obvious is that the little language will need to specify the
size of the bit container into which bitfields are stored, because the translation between
pairs of byte offsets and short offsets differs on LE and BE machines.

For example, the little endian storeByte equivalent of

storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003

is

storeByte(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x__03
storeByte(address+1, 0) // store 0x00__

but big endian is

storeByte(address, 0) // store 0x00__
storeByte(address+1, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x__03
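The byte-level decomposition can be checked with java.nio.ByteBuffer, which supports either byte order explicitly. ByteBuffer is only a stand-in for native memory here, and ShortAsBytes is a hypothetical helper.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Shows how a single 16-bit store decomposes into bytes at
// address and address+1 under a given byte order.
final class ShortAsBytes {
    static byte[] storeShort(short value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(2).order(order);
        buf.putShort(0, value);   // absolute put at offset 0
        return buf.array();       // [byte at address, byte at address+1]
    }
}
```

Storing 0x0003 gives bytes {03, 00} under LITTLE_ENDIAN and {00, 03} under BIG_ENDIAN. This is why code that emits byte stores needs to know the 2-byte container size before it can choose the right byte offset.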

So a simple offset and size formulation is not adequate; we need to know the
underlying container size for a given bit address, unless we blow everything
down to byte stores and then hack our compiler to reconsolidate adjacent stores.

Again, input from Henry would be helpful here: this container-size
information may already be present because of how C deals with bitfields
in general (a char-typed bitfield may not span a byte boundary, but a short-typed
bitfield can, as long as it does not also span a short boundary).  Another way to think of container
size is "alignment": a struct of short-typed bitfields will have size and alignment 2
even if the bitfields all fit in a single byte.  (I don't think we want to support full
generality here, because we could end up in places where we need to understand
the underlying endianness; if we use byte loads to obtain an int, to where do
we shuffle the respective bytes?)

Is this wrong-headed?  Do we like where the "same behavior on the Java side"
rule leads us?

------------------

Other opinions I have acquired recently based on mistakes already made:

I think it is okay if we assume that the names in the LDL field descriptors
should match methods defined in the corresponding interface in relatively
obvious ways.  I think there are three accessors to consider, and maybe
we should just name them separately (and lack of a name means we
don't want that access mode).  So 

 NativeFieldDescriptor { // AKA NFD 
    String field_getter_name;
    String field_setter_name;
    String field_reference_name;

    ... stuff describing size/align that I am having trouble with ...

    long[] array_dimensions default=long[0] or maybe null
 }

getters and setters would probably default to returning something like a "value",
and reference generators would of course create proxy objects for fields.

I suspect (given the direction that I've gone specifying bitfield size and width and alignment)
that signedness or unsignedness of integer fields should just be a boolean in the NFD,
and as long as the type fits into the interface-method signature we use it with no
error signaled.  That still leaves the integer-size-for-unsigned issue to be resolved,
but moves it to the tool generating LDL and interfaces; i.e. we just removed a little
bit of policy from the layout engine, and that might be fine.
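A hypothetical Java rendering of this descriptor might look like the record below, which also folds in the signedness flag suggested above. The size/alignment part is deliberately left out because it is still unresolved. The camelCase names mirror field_getter_name and the other fields, and all names are illustrative.

```java
import java.util.Optional;

// Hypothetical Java rendering of the NFD sketched above. The size/alignment
// description is still undecided in the discussion, so it is left out here.
record NativeFieldDescriptor(
        String getterName,      // field_getter_name; null means no getter
        String setterName,      // field_setter_name; null means no setter
        String referenceName,   // field_reference_name; null means no ref accessor
        long[] arrayDimensions, // empty for a scalar field
        boolean signed) {       // signedness flag for integer fields

    NativeFieldDescriptor {
        // Normalize "no dimensions" to an empty array and copy defensively.
        arrayDimensions = arrayDimensions == null ? new long[0] : arrayDimensions.clone();
    }

    Optional<String> getter()    { return Optional.ofNullable(getterName); }
    Optional<String> setter()    { return Optional.ofNullable(setterName); }
    Optional<String> reference() { return Optional.ofNullable(referenceName); }
    boolean isArray()            { return arrayDimensions.length > 0; }
}
```

A missing accessor name simply yields an empty Optional, which matches the rule that "lack of a name means we don't want that access mode."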

David


From henry.jen at oracle.com  Tue Nov 11 01:35:09 2014
From: henry.jen at oracle.com (Henry Jen)
Date: Mon, 10 Nov 2014 17:35:09 -0800
Subject: More detail than I had intended on Layout description language.
In-Reply-To: <7F2A3A29-CDE1-4AC4-8B21-ACC7874B9A66@oracle.com>
References: <7F2A3A29-CDE1-4AC4-8B21-ACC7874B9A66@oracle.com>
Message-ID: <546167CD.8000403@oracle.com>

On 11/10/2014 01:57 PM, David Chase wrote:
>
> On 2014-10-31, at 1:21 PM, Angela Lin <angela_lin at ca.ibm.com> wrote:
>> Then there are these divergent rabbit holes to follow:
>> i) the tiny language specification  <-- Let's call this the "layout
>> descriptor language"?
>> ii) the runtime system
>>
>> There might be a 3rd part that glues between the first two:
>> iii) generated interfaces that are provided to the Java programmer
>
> The test case I've been playing with to try to figure out the right way to talk about
> this is
>
> struct c {
>       unsigned short x:1;
>       unsigned short y:7;
> };
>
> I think that the little language generated for that has to depend on the endianness of
> the underlying platform, because it produces different results on the Java side.  On a
> little-endian machine, storing 1 into both x and y would be something along the lines of
>
> storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003
>
> but on a big-endian machine
>
> storeShort(address, ((1&1)<<15) + ((127&1)<<8)) // store 0x8100
>
> So, since the Java-side behavior should be different, I think that the little language inputs
> (the output of the libclang-based tool) should be different depending on platform; the
> offsets will need to be described in ways that conform to Java's expectations.
>
> One thing that might not be obvious is that the little language will need to specify the
> size of the bit container into which bitfields are stored, because the translation between
> pairs of byte offsets and short offsets differs on LE and BE machines.
>
> For example, the little endian storeByte equivalent of
>
> storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003
>
> is
>
> storeByte(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x__03
> storeByte(address+1, 0) // store 0x00__
>
> but big endian is
>
> storeByte(address, 0) // store 0x00__
> storeByte(address+1, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x__03
>
> So a simple offset and size formulation is not adequate; we need to know the
> underlying container size for a given bit address, unless we blow everything
> down to byte stores and then hack our compiler to reconsolidate adjacent stores.
>
> Again, input from Henry would be helpful here: this container-size
> information may already be present because of how C deals with bitfields
> in general (a char-typed bitfield may not span a byte boundary, but a short-typed
> bitfield can, as long as it does not also span a short boundary).  Another way to think of container
> size is "alignment": a struct of short-typed bitfields will have size and alignment 2
> even if the bitfields all fit in a single byte.  (I don't think we want to support full
> generality here, because we could end up in places where we need to understand
> the underlying endianness; if we use byte loads to obtain an int, to where do
> we shuffle the respective bytes?)
>

We need to consider when/where the libclang-based tool is used, since that 
dictates whether we can rely on libclang for such information. The 
information is available for a target platform given a header file.

It is possible to build some sort of abstraction layer to describe 
layout that can cover general cases; we will see how the experiment goes.

Cheers,
Henry


From angela_lin at ca.ibm.com  Wed Nov 12 14:49:39 2014
From: angela_lin at ca.ibm.com (Angela Lin)
Date: Wed, 12 Nov 2014 09:49:39 -0500
Subject: Reference SUB/SEP Value question
In-Reply-To: <4BF7F006-2846-4AC1-BA5E-DAB3DABECBF2@oracle.com>
References: <81818BB2-8C2D-4910-A83F-D03BB7B0AB49@oracle.com>
	<OF8D24377F.28396A7E-ON85257D82.00538498-85257D82.005F51A1@ca.ibm.com>
	<4BF7F006-2846-4AC1-BA5E-DAB3DABECBF2@oracle.com>
Message-ID: <OF30FEDDB9.545DE2A2-ON85257D8D.007820FB-85257D8E.00517342@ca.ibm.com>


Have I got this straight?

- ComplexGeneratedProxy is the runtime-generated thing that knows how to
pull structured data out of some memory.
- ComplexValue is the interface generated by the groveller. This is the
type that the Java programmer can use directly?
- Complex is the layout that is returned from the layout factory.  This is
also generated at runtime?
- Ref supplies base behaviour for all layouts.

Why the need for ComplexRef and ComplexValue to be separate entities?  What
are the methods of each?

Thanks,
Angela


From david.r.chase at oracle.com  Wed Nov 12 19:44:17 2014
From: david.r.chase at oracle.com (David Chase)
Date: Wed, 12 Nov 2014 14:44:17 -0500
Subject: Reference SUB/SEP Value question
In-Reply-To: <OF30FEDDB9.545DE2A2-ON85257D8D.007820FB-85257D8E.00517342@ca.ibm.com>
References: <81818BB2-8C2D-4910-A83F-D03BB7B0AB49@oracle.com>
	<OF8D24377F.28396A7E-ON85257D82.00538498-85257D82.005F51A1@ca.ibm.com>
	<4BF7F006-2846-4AC1-BA5E-DAB3DABECBF2@oracle.com>
	<OF30FEDDB9.545DE2A2-ON85257D8D.007820FB-85257D8E.00517342@ca.ibm.com>
Message-ID: <AE51C2EE-F3DC-496D-B11E-E674622C05D6@oracle.com>


On 2014-11-12, at 9:49 AM, Angela Lin <angela_lin at ca.ibm.com> wrote:

> Have I got this straight?
> 
> - ComplexGeneratedProxy is the runtime-generated thing that knows how to pull structured data out of some memory.
> - ComplexValue is the interface generated by the groveller. This is the type that the Java programmer can use directly?

I think the Groveler creates both ComplexValue and ComplexRef, plus a little language description of the layout.
Maybe that's wrong; perhaps the little language should be encoded into static methods of the ComplexRef type.
(Another option, besides strings or data structures).

I'm not 100% sure of the need for ComplexValue.  I know we need ComplexRef and the generated proxy class.

> - Complex is the layout that is returned from the layout factory.  This is also generated at runtime?

I think that the layout factory returns a ComplexGeneratedProxy which is always a ComplexRef.

> - Ref supplies base behaviour for all layouts.

Yes; a lot of it is common to the unsafe machinery at the bottom, like "address", "size", and "align".

> Why the need for ComplexRef and ComplexValue to be separate entities?  What are the methods of each?

ComplexRef has more methods, in particular it has setter methods and may have ref methods to its own fields.
ComplexValue only has getters.

We were discussing this last night, and I don't feel certain of my opinions,
and that's why I laid out both "SEP" and "SUB" hierarchies.

The problem I'm trying to sort out is the tradeoff among too darn many entities,
what we know programmers will want to do with them,
and the downward-aimed aliasing firearms that we hand out to programmers.
Making FooRef subtype FooValue guarantees value-aliasing problems; separating
them allows the implementation to create them in the obvious way when you ask
a FooRef for its FooValue, but does not require it.

It may be the case that we want to make a cleaner distinction between those things that
are "values" and those that are "references", and not have anything be both.
So for example, maybe we regard Complex as a "value", in which case there is
no way to separately set the two fields within the layout: you can get them,
you can get the whole Complex as a "value", but you cannot set them separately.

-> this implies a somewhat different ComplexGeneratedProxy, since it will have no setters.
-> and maybe we need a name for the "value" type that you get from dereferencing such
a thing.

But for (say) an IP header, you probably do want to monkey with some of the fields in place,
so it would be an IPHeaderRef (need to see the setter methods) and the IPHeaderGeneratedProxy
would also implement those.  Maybe we don't ever need to regard the IPHeaderRef as if it were
a "value".

Have I stated my problem clearly enough?

David




From atobia at ca.ibm.com  Tue Nov 18 17:31:44 2014
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Tue, 18 Nov 2014 12:31:44 -0500
Subject: More detail than I had intended on Layout description
	language.
In-Reply-To: <OF529CFDB2.5C4E927B-ON85257D8D.00508BF7-85257D8D.00508DF5@LocalDomain>
References: <OF529CFDB2.5C4E927B-ON85257D8D.00508BF7-85257D8D.00508DF5@LocalDomain>
Message-ID: <OF4A2468CE.623DCA66-ON85257D94.005FC26C-85257D94.006049E1@ca.ibm.com>


Hi

> The language should not be one long string (but you could talk me back
> into this, I am basing this purely on the difficulty of ruling out screwball
> string inputs).  I'm torn between the structure and non-ambiguity of defining
> a class to describe fields and passing in an array, along the lines of (notice
> how this is not Java or C, so it must not be source code?)

We are open to a class-based LD format. The downside to this is that the
LDL is no longer a compact representation. Our approach combines the two
Java-level inputs by placing LDL annotations in the generated interfaces.
This is something we can explore further, but for the remainder of this
email I will continue to use the string notation.

> Reason #2 is that I'm trying to figure out how to deal gracefully with the endianness
> issue on a platform that works very hard to hide such issues. It seems to me that
> for a given "little language" input, the behavior of the Java side (and the bytecodes
> that they generate) should not depend on the endianness of the underlying platform.
...
> Is this wrong-headed?  Do we like where the "same behavior on the Java side"
> rule leads us?
> rule leads us?

By default we assume that the layout data is the same endian as the
execution environment. We think this is what end-users would expect.
Automatic endian conversion would be nice to have, but we don't want to
bake that cost into the JDK. Also, nothing stops us from being explicit
in the LDL about the endianness. We are certainly open to more discussion
on this topic.

Regarding "... stuff describing size/align that I am having trouble
with ...", we think the LDL should represent how the memory is laid out
exactly. It should not make any assumptions about alignment. I will go over
some features of our prototype LDL. Hopefully that will clear up some
things.

The basic grammar notation of the LD format is the following:

    "Qualified Name[ ':' Qualified Super-Class Name]"
        { ',' "[Field Name]{'['Number of Elements']'} ':' (Size | 'pointer' | 'Layout' Name)" }

where:
- 'Qualified Name' is the fully qualified name of the layout being
described
    - for example "com.ibm.shapes.Square"
- 'Qualified Super-Class Name' is the fully qualified name of the super class
- 'Field Name' is the name of the field defined in the layout
    - if the field name is omitted no accessor is generated for it
- 'Number of Elements' specifies the number of elements in an array dimension (example later)
    - LDL assumes multidimensional arrays are arranged in row-major order
- 'Size' is the size of the field in bits
    - for arrays this is the size of a single element of the leaf type
    - special keyword 'pointer' is used for pointer fields (example later)
- 'Name' specifies the name of a defined layout (example later)
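The hypothetical parser below sketches how entries in this notation could be read. It handles a single field entry only and does not cover the layout-name header. The Field record, the Kind enum, and the regular expression are assumptions based on the grammar above, not the prototype's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for one LD field entry, e.g. "x:32", "y[2][10]:32",
// ":24", "next:pointer", "start:Layout Point2D".
final class LdFieldParser {
    enum Kind { BITS, POINTER, LAYOUT }

    // name is null for unnamed (padding) fields; bits is 0 unless kind == BITS.
    record Field(String name, List<Integer> dims, Kind kind, int bits, String layoutName) {}

    // optional name, zero or more [N], ':', then a size, 'pointer', or 'Layout Name'
    private static final Pattern FIELD = Pattern.compile(
            "\\s*([A-Za-z_]\\w*)?((?:\\[\\d+\\])*)\\s*:\\s*(?:(\\d+)|(pointer)|Layout\\s+([\\w.]+))\\s*");
    private static final Pattern DIM = Pattern.compile("\\[(\\d+)\\]");

    static Field parse(String entry) {
        Matcher m = FIELD.matcher(entry);
        if (!m.matches()) throw new IllegalArgumentException("bad field: " + entry);
        List<Integer> dims = new ArrayList<>();
        Matcher d = DIM.matcher(m.group(2));
        while (d.find()) dims.add(Integer.parseInt(d.group(1)));   // row-major order
        String name = m.group(1);
        if (m.group(3) != null) return new Field(name, dims, Kind.BITS, Integer.parseInt(m.group(3)), null);
        if (m.group(4) != null) return new Field(name, dims, Kind.POINTER, 0, null);
        return new Field(name, dims, Kind.LAYOUT, 0, m.group(5));
    }
}
```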

The following is a basic example where we have a layout 'Point2D' with two
fields, 'x' and 'y'. The corresponding LD is shown below with the
appropriate size for the fields. The order in which the fields appear in
the Layout is preserved in the LD. As you have probably noticed, the field
type is not carried over to the LD. Each layout field is treated as a
typeless collection of bits.

Example 1:

struct Point2D {
    uint32_t x;
    uint32_t y;
};
LD: "Point2D", "x:32", "y:32"

One can define a layout with array fields by using the following notation.

Example 2:

struct A {
    uint32_t x;
    uint32_t y[10]; //this is an array of fields
};
LD:  "A", "x:32", "y[10]:32"

Here is another example with a multi-dimensional array field.

Example 3:

struct B {
    uint32_t x;
    uint32_t y[2][10]; //this is a 2-d array
};
LD:  "B", "x:32", "y[2][10]:32"
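Because the LDL assumes row-major order, the bit offset of an element such as y[1][3] in struct B follows from the dimensions alone. Here is a minimal sketch with hypothetical names:

```java
// Bit offset of an element in a row-major multidimensional array field.
final class RowMajor {
    // fieldStartBits: offset of the array field; dims: e.g. {2, 10}; index: e.g. {1, 3}
    static long elementOffsetBits(long fieldStartBits, long elementBits, long[] dims, long[] index) {
        if (dims.length != index.length) throw new IllegalArgumentException("rank mismatch");
        long linear = 0;
        for (int i = 0; i < dims.length; i++) {
            if (index[i] < 0 || index[i] >= dims[i]) throw new IndexOutOfBoundsException("index " + i);
            linear = linear * dims[i] + index[i];   // row-major flattening
        }
        return fieldStartBits + linear * elementBits;
    }
}
```

For struct B, y starts at bit 32 (after x), so y[1][3] is element 13 and lies at bit 32 + 13 * 32 = 448.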

The language assumes no implicit padding or start alignment for allocators
of a type. This is a pain point for native programmers as the compiler may
add padding, and different compilers may add different padding. The
language assumes that fields are laid out in order and in a packed
representation. We will not automatically fix-up field alignment or
padding.

To illustrate this point, the following example shows a layout composed of
two fields: the first a 'uint8_t', and the second a 'uint32_t'. On a
32-bit platform a compiler may add 3 bytes of padding to align the second
field. The padding field has to be explicitly notated in the LD, but its
name can be omitted.

Example 4:

struct C
{
    uint8_t data1;
    uint32_t data2;
};
LD: "C", "data1:8", "data2:32"

On a 32-bit platform, this would be compiled as:

struct C
{
    uint8_t data1;
    uint8_t padding1[3];
    uint32_t data2;
};
LD: "C", "data1:8", ":24", "data2:32"
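Under the packed, in-order rule, field offsets are running sums of the declared sizes. In Example 4, the explicit padding entry is what moves data2 from bit 8 to bit 32. This minimal sketch handles only numeric "name:bits" entries, and PackedLayout is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Packed layout: fields in declaration order, no implicit padding or alignment.
final class PackedLayout {
    record Result(Map<String, Integer> offsetBits, int totalBits) {}

    static Result compute(String... fields) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int offset = 0;
        for (String f : fields) {
            int colon = f.indexOf(':');
            if (colon < 0) throw new IllegalArgumentException("missing ':' in " + f);
            String name = f.substring(0, colon).trim();
            int bits = Integer.parseInt(f.substring(colon + 1).trim());
            if (!name.isEmpty()) offsets.put(name, offset);   // unnamed = padding, no accessor
            offset += bits;                                    // nothing is inserted between fields
        }
        return new Result(offsets, offset);
    }
}
```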

Native pointer types are platform specific. To address this a new keyword,
'pointer', is introduced. The size of this type depends on the JVM
architecture (32- or 64-bit). The next example illustrates the use of pointers
in a linked list node.

Example 5:

struct Node {
    uint32_t data;
    struct Node *next;
};
LD: "Node", "data:32", "next:pointer"

This solution does have its drawbacks, as it requires alignment padding to
be specified explicitly in the LDL, differently for each JVM architecture. The next
example illustrates this. We have chosen this solution as a starting point
but it is an issue that requires more consideration.

Example 6:

On a 64-bit platform, this would be compiled as:
struct Node {
    uint32_t data;
    uint8_t padding1[4];
    struct Node *next;
};
LD: "Node", "data:32", ":32", "next:pointer"
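To make the drawback concrete, the sketch below resolves 'pointer' to a platform-dependent size and then checks natural alignment. It assumes that a field's natural alignment equals its size. That holds for pointers on common ABIs but is not universal. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Resolves "pointer" to the target's pointer size, lays fields out packed,
// and reports named fields whose offset is not a multiple of their size.
final class PointerLayout {
    static int sizeBits(String spec, int pointerBits) {
        return spec.equals("pointer") ? pointerBits : Integer.parseInt(spec);
    }

    static List<String> misaligned(int pointerBits, String... fields) {
        List<String> bad = new ArrayList<>();
        int offset = 0;
        for (String f : fields) {
            int colon = f.indexOf(':');
            String name = f.substring(0, colon);
            int bits = sizeBits(f.substring(colon + 1), pointerBits);
            if (!name.isEmpty() && offset % bits != 0) bad.add(name);  // assumed natural alignment
            offset += bits;
        }
        return bad;
    }
}
```

With 32-bit pointers, "data:32", "next:pointer" is already aligned. With 64-bit pointers, next lands at bit 32 and is flagged unless the ":32" padding entry from Example 6 is present.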

Another issue to consider is floating-point types, where the native
floating-point representation may not match Java's. Generated accessors for
Layouts with float fields should give you access to the raw bits but not
automatically convert the data into a Java floating-point value.
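Here is a sketch of what "raw bits, no automatic conversion" could mean for a 32-bit float field. The accessor returns an int, and the caller opts into Float.intBitsToFloat only when it knows the native format is IEEE 754 binary32. RawFloat is a hypothetical name, and ByteBuffer stands in for native memory.

```java
import java.nio.ByteBuffer;

// Accessor returns a float field's bits untouched; conversion is a separate,
// explicit step that is only valid if the native format is IEEE 754 binary32.
final class RawFloat {
    static int rawBits(ByteBuffer mem, int byteOffset) {
        return mem.getInt(byteOffset);          // no conversion, just the bits
    }

    static float asJavaFloat(int rawBits) {
        return Float.intBitsToFloat(rawBits);   // caller-chosen interpretation
    }
}
```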

The following example introduces a concept that was not previously
discussed: the nesting of layouts. This poses some interesting
challenges as it introduces dependencies between layouts. If this is
something we wish to support, the following is a syntax that can be used to
describe it.

Example 7:

struct Line2D {
    Point2D start;
    Point2D end;
};
LD: "Line2D", "start:Layout Point2D", "end:Layout Point2D"

Lastly, we have an example that displays layout inheritance. If this is
something that we wish to support, here is how we could do it.

Example 8:

struct Point3D : Point2D {
     uint32_t z;
};
LD: "Point3D : Point2D", "z:32"
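The sketch below shows the dependency problem concretely. It computes sizes for layouts that nest or inherit other layouts by looking up the referenced layouts in a registry. It assumes that inherited fields are laid out first and that there are no cycles. LayoutRegistry is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Resolves sizes of layouts that nest ("start:Layout Point2D") or inherit
// ("Point3D : Point2D") other layouts. Every referenced layout must be
// defined before sizeBits is called; there is no cycle detection.
final class LayoutRegistry {
    private final Map<String, String> superOf = new HashMap<>();
    private final Map<String, List<String>> fieldsOf = new HashMap<>();

    void define(String header, String... fields) {
        String[] parts = header.split(":");
        String name = parts[0].trim();
        if (parts.length > 1) superOf.put(name, parts[1].trim());
        fieldsOf.put(name, List.of(fields));
    }

    int sizeBits(String layout) {
        List<String> fields = fieldsOf.get(layout);
        if (fields == null) throw new IllegalArgumentException("unknown layout " + layout);
        // Assumed: superclass fields come first, as in simple single inheritance.
        int total = superOf.containsKey(layout) ? sizeBits(superOf.get(layout)) : 0;
        for (String f : fields) {
            String spec = f.substring(f.indexOf(':') + 1).trim();
            total += spec.startsWith("Layout ")
                    ? sizeBits(spec.substring("Layout ".length()).trim())   // nested layout
                    : Integer.parseInt(spec);                               // plain bits
        }
        return total;
    }
}
```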

The proposed LD format does a good job at representing how the layout
fields appear in memory. But, it does not convey any type information about
the fields. This makes it difficult for the end-user as it requires more
effort on their part to add meaning to the fields.

The ideal solution is one that can represent these two streams of data,
either in an LDL or NFD format.
Type Data
- integral type
    - signed vs unsigned
- float type
- pointer type
- raw type
    - if we need to represent data in Java without any type info associated with it

Layout Data
- name
- size
- offset
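One way to keep the two streams separate is a pair of small records, sketched below with hypothetical names. The check that only integral types carry signedness is an assumption drawn from the list above.

```java
// Type data says how to interpret a field's bits; layout data says where they live.
final class FieldInfo {
    enum TypeKind { INTEGRAL, FLOAT, POINTER, RAW }

    record TypeData(TypeKind kind, boolean signed) {
        TypeData {
            if (signed && kind != TypeKind.INTEGRAL)
                throw new IllegalArgumentException("signedness applies only to integral types");
        }
    }

    record LayoutData(String name, int sizeBits, int offsetBits) {}

    record Field(LayoutData layout, TypeData type) {}
}
```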

Tobi

From angela_lin at ca.ibm.com  Tue Nov 18 22:31:00 2014
From: angela_lin at ca.ibm.com (Angela Lin)
Date: Tue, 18 Nov 2014 17:31:00 -0500
Subject: More detail than I had intended on Layout
	description	language.
In-Reply-To: <OF4A2468CE.623DCA66-ON85257D94.005FC26C-85257D94.006049E1@ca.ibm.com>
References: <OF529CFDB2.5C4E927B-ON85257D8D.00508BF7-85257D8D.00508DF5@LocalDomain>
	<OF4A2468CE.623DCA66-ON85257D94.005FC26C-85257D94.006049E1@ca.ibm.com>
Message-ID: <OF6E773A02.E89417C9-ON85257D94.007A0C94-85257D94.007BB024@ca.ibm.com>


Some aspects I'd like to highlight from Tobi's email:

1. LDL/NFD, by default, should describe exact memory layout on the
execution platform. In particular:
- Fields must have explicitly specified width. So, if the LDL/NFD models a
C struct that contains a void*, the size of the void* has to be known
up-front.
- No automatic padding or alignment.
- Fields either have explicitly specified offset, or are laid out in the
order of specification.
- Native endian.

This implies that the descriptor for, say, a C struct, may not be portable
across all platforms. If the descriptor models a data structure that is
shared with a native library, then the descriptor is tied to the particular
native library binary. I think this is consistent with David's earlier
statement:
> So, since the Java-side behavior should be different, I think that
> the little language inputs (the output of the libclang-based tool)
> should be different depending on platform; the offsets will need to
> be described in ways that conform to Java's expectations.

User-friendly descriptor options could certainly be added, as long as we
agree on the default behaviour.

2. Layouts should offer access to the raw field bits by value and
reference, without conversion to a Java type (primitive int, float, or
otherwise). We'd like to be able to pass a field to a library that knows
how to handle the non-Java data type, and we'd like to avoid marshalling
costs.

3. The descriptor should provide a way to specify how the raw field bits
should be interpreted as a Java entity. This could be as simple as casting
the field bits to a suitably sized Java integer type.
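For point 3, casting the field bits to a suitably sized Java integer type amounts to zero-extension or sign-extension of the raw bits. In this minimal sketch, width and signedness are parameters standing in for whatever the descriptor ends up carrying. RawBits is hypothetical.

```java
// Interprets the low `width` bits (1..64) of a raw field value as a Java long.
final class RawBits {
    // Zero-extends: keeps the low `width` bits, clears the rest.
    static long asUnsigned(long raw, int width) {
        return width == 64 ? raw : raw & ((1L << width) - 1);
    }

    // Sign-extends: moves the field's sign bit to bit 63, then shifts back arithmetically.
    static long asSigned(long raw, int width) {
        int shift = 64 - width;
        return (raw << shift) >> shift;
    }
}
```

The same 8-bit pattern 0xFF reads as 255 when unsigned and -1 when signed, which is exactly the information a signedness flag in the descriptor would need to supply.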

A more structured descriptor language does provide more flexibility for
doing the above.

Angela

From david.r.chase at oracle.com  Tue Nov 18 23:28:35 2014
From: david.r.chase at oracle.com (David Chase)
Date: Tue, 18 Nov 2014 18:28:35 -0500
Subject: More detail than I had intended on Layout description language.
In-Reply-To: <OF6E773A02.E89417C9-ON85257D94.007A0C94-85257D94.007BB024@ca.ibm.com>
References: <OF529CFDB2.5C4E927B-ON85257D8D.00508BF7-85257D8D.00508DF5@LocalDomain>
	<OF4A2468CE.623DCA66-ON85257D94.005FC26C-85257D94.006049E1@ca.ibm.com>
	<OF6E773A02.E89417C9-ON85257D94.007A0C94-85257D94.007BB024@ca.ibm.com>
Message-ID: <DBE7EDB6-CF00-4796-8F60-4F432E3016A2@oracle.com>


On 2014-11-18, at 5:31 PM, Angela Lin <angela_lin at ca.ibm.com> wrote:

> 
> Some aspects I'd like to highlight from Tobi's email:
> 
> 1. LDL/NFD, by default, should describe exact memory layout on the
> execution platform. In particular:
> - Fields must have explicitly specified width. So, if the LDL/NFD models a
> C struct that contains a void*, the size of the void* has to be known
> up-front.
> - No automatic padding or alignment.
> - Fields either have explicitly specified offset, or are laid out in the
> order of specification.
> - Native endian.
> 
> This implies that the descriptor for, say, a C struct, may not be portable
> across all platforms. If the descriptor models a data structure that is
> shared with a native library, then the descriptor is tied to the particular
> native library binary. I think this is consistent with David's earlier
> statement:
>> So, since the Java-side behavior should be different, I think that
>> the little language inputs (the output of the libclang-based tool)
>> should be different depending on platform; the offsets will need to
>> be described in ways that make them conform to Java's expectations.
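The quoted rules can be illustrated with a small Java sketch. The rules are explicit widths, no automatic padding, explicit or in-order offsets, and native endian. The `Field`/`Placed` records and the `place` method are hypothetical and are not part of any proposed LDL. In the example, a `char` followed by an 8-byte pointer lands at offset 1, because nothing inserts padding. To match what a C compiler typically does on LP64, the descriptor would have to give the pointer's offset 8 explicitly.

```java
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class ExactLayout {
    // Hypothetical descriptor entry: explicit width in bytes, and an explicit
    // offset, or -1 meaning "immediately after the previous field".
    record Field(String name, int width, int offset) {}

    // Where a field actually ends up.
    record Placed(String name, int offset, int width) {}

    // Honors explicit offsets; otherwise places each field right after the
    // previous one. No padding or alignment is ever inserted automatically.
    static List<Placed> place(List<Field> fields) {
        List<Placed> out = new ArrayList<>();
        int next = 0;
        for (Field f : fields) {
            int off = f.offset() >= 0 ? f.offset() : next;
            out.add(new Placed(f.name(), off, f.width()));
            next = off + f.width();
        }
        return out;
    }

    public static void main(String[] args) {
        // C struct { char tag; void *p; } described for a platform where the
        // descriptor author knows void* is 8 bytes: not portable by construction.
        List<Placed> layout = place(List.of(
                new Field("tag", 1, -1),
                new Field("p", 8, -1)));
        // Multi-byte fields would be read in the platform's native byte order.
        System.out.println(layout + " in " + ByteOrder.nativeOrder() + " order");
    }
}
```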

I think what you are saying is consistent, but I didn't get that from Tobi's email,
so I'll reread it more carefully.  The language surrounding "pointer" in particular
made me think that he was approaching it differently.

Was I correct in seeing that it is proposed to attach LDL to interfaces with annotations?
My snap reaction is that's a good idea, even if it does perhaps push the LDL back into text.

And I think (based on discussions at this end) that being able to treat these
things as native-resident references is priority #1, but I'm not sure that it's the only
priority.

But I will look again.  (A bug with a deadline is currently grabbing most of my
attention, unfortunately).

Also, I think we might need alignment specification, especially considering possible
GPU applications.  It's not necessarily for padding purposes, but it is required to ensure
that memory is properly allocated and placed; for example, can I allocate a 64-bit quantity
on an 8-bit boundary?  How am I told that I cannot do this?
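Here is a sketch of the check being asked for, in Java. Alignments are expressed in bytes, so a 64-bit quantity needs 8-byte alignment, and an "8-bit boundary" is any byte address. `checkPlacement` is a hypothetical hook that shows one way a caller could be told a placement is not allowed. The bit arithmetic assumes power-of-two alignments.

```java
public class AlignmentCheck {
    static void requirePowerOfTwo(long alignment) {
        if (alignment <= 0 || Long.bitCount(alignment) != 1)
            throw new IllegalArgumentException("alignment must be a power of two: " + alignment);
    }

    // True if address is a multiple of alignment.
    static boolean isAligned(long address, long alignment) {
        requirePowerOfTwo(alignment);
        return (address & (alignment - 1)) == 0;
    }

    // Smallest address >= the given one that satisfies the alignment.
    static long alignUp(long address, long alignment) {
        requirePowerOfTwo(alignment);
        return (address + alignment - 1) & -alignment;
    }

    // Hypothetical placement check: report the problem instead of silently
    // producing a misaligned 64-bit access.
    static void checkPlacement(long address, long requiredAlignment) {
        if (!isAligned(address, requiredAlignment))
            throw new IllegalArgumentException(String.format(
                    "address 0x%x is not %d-byte aligned; next valid address is 0x%x",
                    address, requiredAlignment, alignUp(address, requiredAlignment)));
    }

    public static void main(String[] args) {
        System.out.println(isAligned(0x1001, 8));         // false: byte boundary only
        System.out.println(alignUp(0x1001, 8) == 0x1008); // true
        checkPlacement(0x1008, 8);                        // passes silently
    }
}
```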

David


From angela_lin at ca.ibm.com  Wed Nov 19 17:32:09 2014
From: angela_lin at ca.ibm.com (Angela Lin)
Date: Wed, 19 Nov 2014 12:32:09 -0500
Subject: More detail than I had intended on Layout description language.
In-Reply-To: <DBE7EDB6-CF00-4796-8F60-4F432E3016A2@oracle.com>
References: <OF529CFDB2.5C4E927B-ON85257D8D.00508BF7-85257D8D.00508DF5@LocalDomain>
	<OF4A2468CE.623DCA66-ON85257D94.005FC26C-85257D94.006049E1@ca.ibm.com>
	<OF6E773A02.E89417C9-ON85257D94.007A0C94-85257D94.007BB024@ca.ibm.com>
	<DBE7EDB6-CF00-4796-8F60-4F432E3016A2@oracle.com>
Message-ID: <OFF81978F7.E6D73E09-ON85257D95.004F54EE-85257D95.006053D7@ca.ibm.com>


David Chase <david.r.chase at oracle.com> wrote on 11/18/2014 06:28:35 PM:

> From: David Chase <david.r.chase at oracle.com>
> To: Angela Lin/Ottawa/IBM at IBMCA
> Cc: panama-spec-experts at openjdk.java.net, IBM Panama Spec Group
> <IBM_Panama_Spec_Group%IBMCA at ca.ibm.com>
> Date: 11/18/2014 06:28 PM
> Subject: Re: More detail than I had intended on Layout description language.
>
>
> On 2014-11-18, at 5:31 PM, Angela Lin <angela_lin at ca.ibm.com> wrote:
>
> >
> > Some aspects I'd like to highlight from Tobi's email:
> >
> > 1. LDL/NFD, by default, should describe exact memory layout on the
> > execution platform. In particular:
> > - Fields must have explicitly specified width. So, if the LDL/NFD models a
> > C struct that contains a void*, the size of the void* has to be known
> > up-front.
> > - No automatic padding or alignment.
> > - Fields either have explicitly specified offset, or are laid out in the
> > order of specification.
> > - Native endian.
> >
> > This implies that the descriptor for, say, a C struct, may not be portable
> > across all platforms. If the descriptor models a data structure that is
> > shared with a native library, then the descriptor is tied to the particular
> > native library binary. I think this is consistent with David's earlier
> > statement:
> >> So, since the Java-side behavior should be different, I think that
> >> the little language inputs (the output of the libclang-based tool)
> >> should be different depending on platform; the offsets will need to
> >> be described in ways that make them conform to Java's expectations.
>
> I think what you are saying is consistent, but I didn't get that
> from Tobi's email, so I'll reread it more carefully.  The language
> surrounding "pointer" in particular made me think that he was
> approaching it differently.

Tobi's email refers to our attempt to implement a portable "pointer" type
in the LDL. It's an interesting option to consider, but I'm not sure we
want to deal with the knock-on effects of supporting this. For example, how
do we align the pointer field (in the LDL) if we don't know its size?
What's the impact on the layout nesting or extension features?
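To see why an unsized "pointer" complicates alignment, the sketch below computes C-style natural offsets for a hypothetical `struct { char c; void *p; int i; }` under two pointer sizes. It assumes each field's alignment equals its size. That holds on common ILP32/LP64 ABIs but is not guaranteed. Under that assumption, every offset from the pointer onward, and the overall size, depends on the unknown width.

```java
import java.util.Arrays;

public class PointerAlignment {
    // Round v up to a multiple of a (a must be a power of two).
    static long alignUp(long v, long a) {
        return (v + a - 1) & -a;
    }

    // C-style natural layout of struct { char c; void *p; int i; }, assuming
    // each field's alignment equals its size. Returns the three field offsets
    // followed by the padded struct size.
    static long[] layout(int pointerSize) {
        long[] sizes = {1, pointerSize, 4};
        long[] result = new long[sizes.length + 1];
        long off = 0, maxAlign = 1;
        for (int k = 0; k < sizes.length; k++) {
            off = alignUp(off, sizes[k]);
            result[k] = off;
            off += sizes[k];
            maxAlign = Math.max(maxAlign, sizes[k]);
        }
        result[sizes.length] = alignUp(off, maxAlign);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(layout(4))); // [0, 4, 8, 12]
        System.out.println(Arrays.toString(layout(8))); // [0, 8, 16, 24]
    }
}
```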

If we decide that LDL is not platform-independent, then "pointer" doesn't
seem as useful.

> Was I correct in seeing that it is proposed to attach LDL to
> interfaces with annotations?

Yes, we proposed this. However, we're certainly open to alternatives.
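As an illustration of attaching LDL to interfaces with annotations, here is a Java sketch. It uses a hypothetical `@LayoutDescriptor` annotation, which is not an existing API, and its layout text uses a made-up placeholder syntax. Recording the target platform next to the layout makes explicit that the annotated interface is itself platform-specific. That is the portability question Angela raises about the generated interfaces.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AnnotatedLayout {
    // Hypothetical annotation carrying layout text; retained so a proxy
    // generator could read it at run time.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface LayoutDescriptor {
        String value();     // the layout text (placeholder syntax, not a real LDL)
        String platform();  // the platform this layout was generated for
    }

    @LayoutDescriptor(value = "x:0:4 y:4:4", platform = "x86_64-linux")
    interface Point {
        int x();
        int y();
    }

    public static void main(String[] args) {
        LayoutDescriptor d = Point.class.getAnnotation(LayoutDescriptor.class);
        System.out.println(d.platform() + ": " + d.value()); // x86_64-linux: x:0:4 y:4:4
    }
}
```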

Since LDL is not platform-independent, if the LDLs are attached to the
generated Java interfaces, then the Java interfaces are also not
platform-independent. Is this good or bad?

> My snap reaction is that's a good idea, even if it does perhaps push
> the LDL back into text.
>
> And I think (based on discussions at this end) that being able to treat
> these things as native-resident references is priority #1, but I'm not
> sure that it's the only priority.

Could you clarify the terminology "native-resident reference"?

> But I will look again.  (A bug with a deadline is currently grabbing most
> of my attention, unfortunately).
>
> Also, I think we might need alignment specification, especially
> considering possible GPU applications.  It's not necessarily for
> padding purposes, but it is required to ensure that memory is properly
> allocated and placed; for example, can I allocate a 64-bit quantity
> on an 8-bit boundary?  How am I told that I cannot do this?

For start alignment, I think you're right. Again, I would propose starting
with an explicit descriptor attribute, i.e. "the layout must start at an
alignment of x".
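One way to honor an explicit start-alignment attribute at allocation time is sketched below in Java. The `Descriptor` record is hypothetical. `ByteBuffer.allocateDirect`, `alignedSlice` and `alignmentOffset` are standard `java.nio` methods; the latter two exist since Java 9. The buffer is over-allocated and then sliced so that its first byte sits on the requested boundary.

```java
import java.nio.ByteBuffer;

public class StartAlignment {
    // Hypothetical descriptor carrying an explicit start-alignment attribute:
    // "the layout must start at an alignment of startAlignment bytes".
    record Descriptor(int size, int startAlignment) {}

    // Allocate off-heap storage for one instance whose first byte honors the attribute.
    static ByteBuffer allocate(Descriptor d) {
        int a = d.startAlignment();
        int padded = (d.size() + a - 1) / a * a;                    // size rounded up to a multiple of a
        ByteBuffer raw = ByteBuffer.allocateDirect(padded + a - 1); // room to slide forward
        ByteBuffer aligned = raw.alignedSlice(a);                   // starts at an a-aligned address
        aligned.limit(d.size());
        return aligned.slice();                                     // exactly size bytes
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocate(new Descriptor(24, 64));
        System.out.println(buf.capacity() + " " + buf.alignmentOffset(0, 64)); // 24 0
    }
}
```

Rounding the size up to the alignment before over-allocating matters because `alignedSlice` also rounds the limit down to an aligned boundary. Without that rounding, the aligned window could come out shorter than the layout.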

Angela

From david.r.chase at oracle.com  Fri Nov 21 22:34:41 2014
From: david.r.chase at oracle.com (David Chase)
Date: Fri, 21 Nov 2014 17:34:41 -0500
Subject: More detail than I had intended on Layout description language.
In-Reply-To: <OFF81978F7.E6D73E09-ON85257D95.004F54EE-85257D95.006053D7@ca.ibm.com>
References: <OF529CFDB2.5C4E927B-ON85257D8D.00508BF7-85257D8D.00508DF5@LocalDomain>
	<OF4A2468CE.623DCA66-ON85257D94.005FC26C-85257D94.006049E1@ca.ibm.com>
	<OF6E773A02.E89417C9-ON85257D94.007A0C94-85257D94.007BB024@ca.ibm.com>
	<DBE7EDB6-CF00-4796-8F60-4F432E3016A2@oracle.com>
	<OFF81978F7.E6D73E09-ON85257D95.004F54EE-85257D95.006053D7@ca.ibm.com>
Message-ID: <B2FAE0B3-79A1-47A3-B9C9-38DE6148ECF2@oracle.com>


On 2014-11-19, at 12:32 PM, Angela Lin <angela_lin at ca.ibm.com> wrote:
> Tobi's email refers to our attempt to implement a portable "pointer" type in the LDL. It's an interesting option to consider, but I'm not sure we want to deal with the knock-on effects of supporting this. For example, how do we align the pointer field (in the LDL) if we don't know its size? What's the impact on the layout nesting or extension features?
> 
> If we decide that LDL is not platform-independent, then "pointer" doesn't seem as useful.
> 
> > Was I correct in seeing that it is proposed to attach LDL to 
> > interfaces with annotations?
> 
> Yes, we proposed this. However, we're certainly open to alternatives.
> 
> Since LDL is not platform-independent, if the LDLs are attached to the generated Java interfaces, then the Java interfaces are also not platform-independent. Is this good or bad?

Normally I'd say bad.

It might be interesting to try to see what happens with network protocols,
which are ultimately (at the byte-by-byte level) unambiguously specified.
That gets in the way of the platform-independent Java-behavior story,
because I can easily imagine that what you'd like to have happen in the generated
code is something like a loadInt, and on wrong-byte-order platforms, a byte-swap.
I was thinking "network protocols" because those are something that we really
do expect to see on more than one platform.

Alternatively, maybe we need a pair of intrinsics (potentially in sized versions) called
toBigEndian() and toLittleEndian() so the generated code could always look the same
but the behavior would vary...

Right, that will sail right past the language and corelibs guardians-of-the-faith.
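The intrinsics themselves are easy to express in Java. The sketch below defines hypothetical `toBigEndian`/`toLittleEndian` helpers built on `ByteOrder.nativeOrder()` and `Integer.reverseBytes`. It also includes a "generated" accessor for a big-endian protocol field whose source is identical on every platform. The sketch does not show whether a JIT would turn the conditional swap into a single instruction.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianIntrinsics {
    static final boolean NATIVE_IS_BIG = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

    // Hypothetical intrinsics. Each swaps bytes only when the platform's order
    // differs from the named order, so each is its own inverse: it converts
    // native -> big-endian and big-endian -> native alike.
    static int toBigEndian(int v)    { return NATIVE_IS_BIG ? v : Integer.reverseBytes(v); }
    static int toLittleEndian(int v) { return NATIVE_IS_BIG ? Integer.reverseBytes(v) : v; }

    // "Generated" accessor for a 32-bit big-endian protocol field: the same
    // source on every platform; only the behavior of toBigEndian varies.
    static int readLength(ByteBuffer nativeOrderBuf, int offset) {
        int raw = nativeOrderBuf.getInt(offset); // stands in for a plain native loadInt
        return toBigEndian(raw);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {0x00, 0x00, 0x01, 0x02})
                                   .order(ByteOrder.nativeOrder());
        System.out.println(readLength(buf, 0)); // 258 on big- and little-endian hosts
    }
}
```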

I think there's some ugliness-conservation at work here.
Perhaps we could extend the Unsafe interface to include endianness-tagged loads and stores.
(This is an application of the Alice's Restaurant Algorithm #1:
one big pile of garbage is better than two little piles of garbage.)
Another advantage of this is that some processors (SPARC, I think) have
options for swapping the data as it is loaded, so it may be expedient anyhow
to push the byteswapping as close to the loads and stores as possible.
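Later JDKs (9 and up) provide something close to endianness-tagged loads and stores. `MethodHandles.byteArrayViewVarHandle` returns a `VarHandle` whose byte order is fixed when the handle is created. A minimal sketch, with a hypothetical 4-byte packet:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class TaggedLoads {
    // int views of a byte[] with a fixed byte order: the order is part of the
    // access itself rather than a separate swap in the caller's code.
    static final VarHandle INT_BE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
    static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    public static void main(String[] args) {
        byte[] packet = {0x00, 0x00, 0x01, 0x02};
        int be = (int) INT_BE.get(packet, 0);  // 258 on every platform
        int le = (int) INT_LE.get(packet, 0);  // 0x02010000 on every platform
        INT_BE.set(packet, 0, 0x0A0B0C0D);     // big-endian store: packet[0] becomes 0x0A
        System.out.printf("%d 0x%08x %02x%n", be, le, packet[0]); // 258 0x02010000 0a
    }
}
```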

Note that either of these options (introducing intrinsics with targeted platform-dependent
behavior, either conditional swaps or conditionally-swapped loads/stores) would at least
give us a prayer of saying that the same generated proxies would work across platforms
for network-protocol layouts.

Can we tag this as a "would be nice if possible"?

> > My snap reaction is that's a good idea, even if it does perhaps push
> > the LDL back into text.
> > 
> > And I think (based on discussions at this end) that being able to treat these
> > things as native-resident references is priority #1, but I'm not 
> > sure that it's the only priority.
> 
> Could you clarify the terminology "native-resident reference"? 

I think I mean "what you guys want", meaning that on the Java side there
is (at minimum) an interface full of getters/setters and a generated proxy
class, which conspire to peek/poke at memory that is "native", meaning
not on the Java heap (i.e., allocated by malloc or mmap or similar).

This is the bare minimum for smooth copy-less interoperation with native 
code in general, I think.
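Here is a minimal Java sketch of such a native-resident reference, under stated assumptions. `PointRef` and `PointProxy` are hypothetical names, and offsets 0 and 4 are assumed for `int x, y`. A direct `ByteBuffer` stands in for malloc'd memory; per the `ByteBuffer` documentation, its contents may reside outside the normal garbage-collected heap. The proxy holds only a location, so two proxies over the same memory see each other's writes without any copying.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NativeResident {
    // Getter/setter interface over a C struct Point { int x, y; }.
    interface PointRef {
        int x();
        void x(int v);
        int y();
        void y(int v);
    }

    // Stand-in for a generated proxy: it keeps no field values of its own,
    // it only peeks/pokes memory at a base offset.
    static final class PointProxy implements PointRef {
        static final int X = 0, Y = 4, SIZE = 8; // offsets assumed for this sketch
        private final ByteBuffer mem;
        private final int base;

        PointProxy(ByteBuffer mem, int base) { this.mem = mem; this.base = base; }

        public int x() { return mem.getInt(base + X); }
        public void x(int v) { mem.putInt(base + X, v); }
        public int y() { return mem.getInt(base + Y); }
        public void y(int v) { mem.putInt(base + Y, v); }
    }

    public static void main(String[] args) {
        ByteBuffer mem = ByteBuffer.allocateDirect(PointProxy.SIZE).order(ByteOrder.nativeOrder());
        PointRef a = new PointProxy(mem, 0);
        PointRef b = new PointProxy(mem, 0); // a second view of the same memory
        a.x(3);
        System.out.println(b.x()); // 3: no copy was made
    }
}
```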

For this, and for alignment, I think things get more interesting when you start
nesting things. For example, suppose you have

struct Point {
  int x, y;
};

struct Triangle {
   struct Point vertices[3];
};

Considering alignment, there is the natural propagation from fields to containers;
a container can never have smaller alignment than its fields (except in the case of
some exotic addressing schemes that I don't think are normal for C programs).
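The propagation rule can be written down directly. This Java sketch uses hypothetical `Layout`, `Scalar` and `Struct` types. It computes a struct's alignment as the maximum of its members' alignments and pads the struct's size to that alignment. Scalar sizes and alignments are supplied by hand, since they are ABI-dependent; the 8-byte alignment for `double` is an assumption.

```java
import java.util.Collections;
import java.util.List;

public class AlignmentPropagation {
    interface Layout {
        long size();
        long alignment();
    }

    // A scalar with an explicit size and alignment (ABI-dependent; supplied by hand).
    record Scalar(long size, long alignment) implements Layout {}

    // A container: its alignment is the largest member alignment, so it is
    // never less aligned than any of its fields.
    record Struct(List<Layout> members) implements Layout {
        public long alignment() {
            long a = 1;
            for (Layout m : members) a = Math.max(a, m.alignment());
            return a;
        }

        // Members go at naturally aligned offsets; the total is rounded up to
        // the struct's alignment so every element of an array of it stays aligned.
        public long size() {
            long off = 0;
            for (Layout m : members) off = alignUp(off, m.alignment()) + m.size();
            return alignUp(off, alignment());
        }
    }

    static long alignUp(long v, long a) { return (v + a - 1) & -a; }

    // A C array T[n] has the same layout as n consecutive members of type T.
    static Layout array(Layout elem, int n) { return new Struct(Collections.nCopies(n, elem)); }

    public static void main(String[] args) {
        Layout i32 = new Scalar(4, 4);
        Layout point = new Struct(List.of(i32, i32));
        Layout triangle = new Struct(List.of(array(point, 3)));
        Layout mixed = new Struct(List.of(new Scalar(1, 1), new Scalar(8, 8))); // char, double
        System.out.println(triangle.size() + " " + triangle.alignment()); // 24 4
        System.out.println(mixed.size() + " " + mixed.alignment());       // 16 8
    }
}
```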

If you are references-only, then it's all aliased.  Suppose you wanted to rotate the
vertices of the triangle, say

  Triangle tri = someNativeThing.getTriangle();
  Array<Point> points = tri.vertices();
  Point t = /* I need to make a copy here */ points.get(0);
  points.get(0).set(points.get(1));
  points.get(1).set(points.get(2));
  points.get(2).set(t);
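To show what goes wrong if nothing is copied, here is a runnable Java rendering of the rotation, assuming a hypothetical `PointRef` over off-heap memory. `t` is just another reference to vertex 0. So the final `set` copies vertex 0's new contents, and the original vertex 0 is lost.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AliasingRotation {
    // Hypothetical reference to struct Point { int x, y; } at array index i
    // inside off-heap memory; it stores no coordinates itself.
    static final class PointRef {
        private final ByteBuffer mem;
        private final int base;

        PointRef(ByteBuffer mem, int i) { this.mem = mem; this.base = i * 8; }

        int x() { return mem.getInt(base); }
        int y() { return mem.getInt(base + 4); }

        // Copy the other point's fields into the memory this reference denotes.
        void set(PointRef other) {
            int ox = other.x(), oy = other.y();
            mem.putInt(base, ox);
            mem.putInt(base + 4, oy);
        }
    }

    static PointRef get(ByteBuffer mem, int i) { return new PointRef(mem, i); }

    // The rotation from the email, with no copy: t merely names vertex 0's memory.
    static void rotateAliased(ByteBuffer mem) {
        PointRef t = get(mem, 0);
        get(mem, 0).set(get(mem, 1));
        get(mem, 1).set(get(mem, 2));
        get(mem, 2).set(t); // reads vertex 0's *new* contents
    }

    // Three vertices with x = 0, 1, 2 (and y = 0, 10, 20).
    static ByteBuffer triangle() {
        ByteBuffer mem = ByteBuffer.allocateDirect(3 * 8).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 3; i++) {
            mem.putInt(i * 8, i);
            mem.putInt(i * 8 + 4, 10 * i);
        }
        return mem;
    }

    public static void main(String[] args) {
        ByteBuffer mem = triangle();
        rotateAliased(mem);
        for (int i = 0; i < 3; i++) System.out.print(get(mem, i).x() + " ");
        System.out.println(); // prints "1 2 1 ": vertex 0's original data is gone
    }
}
```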

So what happens to make that copy?   I think there are (at least) 3 choices
that are not even mutually exclusive:

1) Interface includes a clone() method, proxy allocates that clone
  (a) on the Java heap, using the standard Unsafe.get/set(null,...) hack
  (b) on the native heap

2) We have "values" that are different in that they are never aliased and not mutable.

3) It's an interface; there's nothing in particular stopping programmers from writing their own implementations.

1a) pro: will GC smoothly.
      con: cannot be passed to native code unless our wrappers are a little clever

1b) pro: can definitely pass to native code.
      con: there may be GC issues (will require finalizers to free temps?)

2) pro: will GC smoothly, will probably optimize better in Java, no aliasing issues.
    con: cannot pass to native code, multiplying entities.

3) pro: we get to be lazy and punt; we can be sure that we did not commit to a mistake.
    con: it's not as if the total amount of work is saved by our being lazy and punting;
            programmers can screw this up.
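For comparison, option 2 might look like the following Java sketch. It uses a record as the immutable value and a hypothetical `PointRef` whose `get()`/`set()` move whole values in and out of native memory, in the spirit of the refSEPvalue hierarchy discussed earlier in the thread. The temporary `t` is a Java-heap copy, so the rotation is correct. The cost, as noted, is that the value itself cannot be handed to native code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ValueRotation {
    // Option 2 sketch: an immutable value that can never be aliased or mutated.
    record PointValue(int x, int y) {}

    // Hypothetical reference into native memory for struct Point { int x, y; }
    // at a given array index, with get()/set() as the bridge to values.
    static final class PointRef {
        private final ByteBuffer mem;
        private final int base;

        PointRef(ByteBuffer mem, int index) { this.mem = mem; this.base = index * 8; }

        PointValue get() { return new PointValue(mem.getInt(base), mem.getInt(base + 4)); }
        void set(PointValue v) { mem.putInt(base, v.x()); mem.putInt(base + 4, v.y()); }
    }

    static void rotate(ByteBuffer mem) {
        PointRef p0 = new PointRef(mem, 0), p1 = new PointRef(mem, 1), p2 = new PointRef(mem, 2);
        PointValue t = p0.get(); // a Java-heap copy: later writes to p0 cannot change it
        p0.set(p1.get());
        p1.set(p2.get());
        p2.set(t);
    }

    public static void main(String[] args) {
        ByteBuffer mem = ByteBuffer.allocateDirect(3 * 8).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 3; i++) new PointRef(mem, i).set(new PointValue(i, 10 * i));
        rotate(mem);
        // prints the vertices in the order 1, 2, 0
        for (int i = 0; i < 3; i++) System.out.print(new PointRef(mem, i).get() + " ");
        System.out.println();
    }
}
```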

There's a slightly larger-picture question here, which is sort of "how far does this stuff
propagate into surrounding code before it gets a wrapper slapped on it?"

> > But I will look again.  (A bug with a deadline is currently grabbing most of my
> > attention, unfortunately).
> > 
> > Also, I think we might need alignment specification, especially
> > considering possible GPU applications.  It's not necessarily for
> > padding purposes, but it is required to ensure that memory is properly
> > allocated and placed; for example, can I allocate a 64-bit quantity
> > on an 8-bit boundary?  How am I told that I cannot do this?
> 
> For start alignment, I think you're right. Again, I would propose starting with an explicit descriptor attribute, i.e. "the layout must start at an alignment of x".

I think that works, but we might want to consider how alignment specs interact with everything else.
Have you guys thought much about bitfields?
There are some peculiar interactions there with alignment and endianness.

The other point I wanted to toss in is that once you put alignment into the mix,
and not just as an "oh yeah, we need to think about alignment", it might change
how we talk about some of the other stuff.  If you look at C bitfields, there is very
much a model of "load a box this big, then shift the data towards the low-order
bits some distance, then mask/sign-smear the part we don't care about".  If that
bleeds through into the groveler tool, maybe it bleeds through into the little language,
too.  Some of the memory model issues demand the "load/store/cas-a-box" model,
too: if you treat a bitfield as something that is split across a pair of byte loads, it
all gets nastier.
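The "load a box, shift, mask/sign-smear" model and the "cas-a-box" store can be sketched in Java as follows. The method names are hypothetical, and bit positions are counted from the least-significant bit of a 32-bit box. C leaves the actual allocation order of bitfields within a unit to the implementation, which is exactly where endianness interferes. `shift + width` is assumed to be at most 32.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BitfieldBox {
    // Load model: shift the field down to the low-order bits, then mask it...
    static int getUnsigned(int box, int shift, int width) {
        return (box >>> shift) & (int) ((1L << width) - 1);
    }

    // ...or sign-smear it: push the field to the top, then arithmetic-shift down.
    static int getSigned(int box, int shift, int width) {
        return (box << (32 - shift - width)) >> (32 - width);
    }

    // Store with the "load/store/cas-a-box" model: read the whole box, splice in
    // the new bits, and CAS the whole box back, so concurrent updates to
    // neighbouring bitfields in the same box are not lost.
    static void set(AtomicInteger box, int shift, int width, int value) {
        int mask = (int) (((1L << width) - 1) << shift);
        while (true) {
            int old = box.get();
            int updated = (old & ~mask) | ((value << shift) & mask);
            if (box.compareAndSet(old, updated)) return;
        }
    }

    public static void main(String[] args) {
        AtomicInteger box = new AtomicInteger(0);
        set(box, 4, 3, -2); // a 3-bit signed field occupying bits 4..6
        int b = box.get();
        System.out.println(Integer.toHexString(b) + " " + getSigned(b, 4, 3) + " " + getUnsigned(b, 4, 3));
        // prints "60 -2 6"
    }
}
```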

David