More detail than I had intended on Layout description language

Wed Feb 18 01:21:11 UTC 2015

On 2015-02-12, at 11:05 AM, Tobi Ajila <atobia at ca.ibm.com> wrote:
> Apologies for the slow response.

Same on this end, plus I am leaving Oracle, which will make my responses infinitely slow.
Nonetheless, while I am still here, I should comment.  Overall, I like the strawman, and only 
see a couple of places where there's likely to be disagreement, and I am not even sure it is
"disagree" as "want to be sure we understand why we are doing this".

> We want a memory model that is strong enough to satisfy the intuition of C programmers, but not stronger. I think we all agree that restricting the sizes of reads unnecessarily restricts the implementation.
> 
> Atomicity is useful, but should not be a default property. We've incorporated it into the following proposal.
> 
> The following is a new proposal. It contains elements from your previous proposal (some of it is copy-pasted).
> 
> Strawman proposal for layout little language:
> 
> 0. Goals: where layouts are actually invariant across platforms
>    (e.g., network protocols) we want to have just one layout specification.
> 
> 1. The LD specification must be well defined. This means that JDKs can not infer 
>    alignment or padding differently.
> 
> 2. The LD must specify the endianness of the layout. The bit and byte endian must be consistent.
>    Endian is specified at container granularity. A shorthand notation can be provided to specify endian for all containers in a layout.
> 
> 3. A field is a contiguous sequence of bits confined to a container. An update to a field may
>    overwrite contents of other fields within the same container. If the enclosing container is
>    marked as "atomic", another thread cannot observe the value of fields in the container before 
>    the update is complete. A field can not be greater than the size of the container. The sizes 
>    of fields within a container must add up to the size of the container. A field does not 
>    require a name. Accessors are only generated for named fields.

> 4. Unlike C bitfield numbering (which varies based on endianness of target
>    platform) we'll always number fields in little-endian order;
>    that is, "byte a:1, b:7" (at address x) would be extracted with the
>    expressions "loadByte(x) & 1" and "loadByte(x) >> 1", respectively.
>    This is LE-centric, but has the nice property that bit 0 of a byte, short,
>    int, or long is extracted with the same operation (x & 1), and so on.

This is the one place I am not sure.  I can argue either side of this, but I think that
the "ease of our code generation" is one of the lesser of several dubiously compelling
reasons.

Pro:
Pushing the conventions of C compilers this far is not necessary, because
we already have a defined interface for C programmers, and it is relatively attractive,
so why should we give them another one?  For Java programmers this will be someone
intuitive and non-surprising.  In addition, little-endian is mighty-common-case.

Con:
It's not what C compilers do, and the people doing this stuff by hand are still probably
coping with C compiler conventions.  It's also not Network byte order, which might
complicate things a little for those protocols when reading them from the spec (or not --
I haven't spent that much time reading network specs to know how they do it).

That's all I've got right now, I don't find it super-convincing either way.

> 5. A container is a sequence of one or more adjacent fields. Changing a field in a container will not 
>    change fields of other containers. Container sizes must be a multiple of 8bits. There is no upper 
>    limit to the size of a container. A container does not require a name. A container can not be 
>    larger than the enclosing layout. The sizes of all the containers in a layout must add up to the 
>    size of the enclosing layout. Accessors are only generated for named containers.

Container sizes are also powers-of-two.  I think there is a tension between the maximum container
size and memory-model issues.  I would like it to be the case that the maximum container sizes
corresponds to the largest number of bits that can be written atomically; you would also like it to be
the case that the minimum container size is the smallest thing that can be updated without the possibility
of reverting concurrent stores from other threads of execution.

Note that properly getting the multi-field-single-container load/store atomicity right will be something
that implementors of this thing will need to get right, and it is something that C programmers notice.

Looking at your grammar, don't you need to be able to tag a container with a non-default alignment?

Or is it your intention that the algorithm is "default is natural alignment if possible given supplied padding,
otherwise the largest alignment that works with the padding, and ERROR if any field within the container
does not obtain its desired alignment?  Think about structs containing structs, thinking about the generation
of interior pointers/references, think about the need to know the alignment of the pointed struct when a load
is performed.

I think the C assumption is that a uint64_t* is a pointer (on a 64-bit-ish machine) that is 64-bit aligned.
If you want a pointer to a misaligned uint64_t, I think that is a different animal (or do we take the
position that we can hide that detail behind an interface?)

> 6. A layout is a is sequence of one or more adjacent containers or unions. The default alignment
>    of a layout is the size of the layout.

The alignment of a layout is the maximum alignment of the containers and unions that it comprises.
E.g., in C

struct foo {
  unint64_t a, b, c;
}

the alignment of a struct foo is 64 bits, not 192 bits.  So it follows the container, not the layout.

> Note:
> There are some flaws with the default alignment rule, should a layout with an array of 35bytes 
> have an alignment of 35btyes? Perhaps we should follow the rule used by common C compilers. 
> Something like, "default alignment is the size of the largest container in the layout rounded up 
> to 2^n bits. In the case of arrays the container element size is considered".
> 
> 7. Grammars
> for layouts:
> layoutName','size','[endianness','][alignment] //endianness (<, >) LE, BE
> ['{'
> {(containers | unions) ','}
> '}']	
> 
> for unions:
> 'U:'unionSize [unionName]
> ['{'
> {containers','}
> '}']
> 
> for containers:
> ['C:'][endianess:] (containerSize [containerName] | layoutName) {'[' numOFElements ']'}
> ['{'
> {fields','}
> '}']
> 
> for fields:
> ['F:']fieldSize [fieldName]
> 
> This notation describes fields/containers by their position relative 
> to each other and their sizes. This is hopefully less error prone than the
> size + offset technique.
> 
> -----------------------------------------------------	  
> 1) Basic example
> 
> The following is a basic structure with two fields 'x' and 'y'. 
> 
> struct A {
> uint16_t x;
> uint16_t y;
> };
> 
> This structure produces the following layout. 
> 
> Layout1:
> 
> A, <, 32 {
> C:16 x,
> C:16 y,
> }
> 
> The following is also acceptable as "C:" is optional
> 
> A, <, 32 {
> 16 x,
> 16 y,
> }
> 
> Endian independent accessors would be:
> x = LoadShortLE(base + 0)
> y = LoadShortLE(base + 2)
> ------------------------------------------------------------------------------
> 2) IP Header Example
> 
> The next example shows the layout of an IPV4Header.
> 
> Layout3:
> IPv4, >, 160 {
> C:8 {
> 4 ihl,
> 4 version,
> }
> C:8 {
> 2 ECN,
> 6 DSCP,
> }
> C:16 totLen,
> C:16 iden,
> C:16 {
> 13 fragOff,
> 3 flags, 
> }
> C:8 TTL,
> C:8 Proto,
> C:16 Checksum,
> C:32 srcAddr,
> C:32 destAddr,
> C:32 options,
> }
> --------------------------------------------------------
> 3) TCP Example
> 
> The follwing is a picture of TCP packet (bytes 12 - 14)
> | dataOffset - 4 | rsv - 3 | NS - 1 | CWR - 1 | ECE - 1 | URG - 1 | ACK - 1 | PSH - 1 | RST - 1 | SYN - 1 | FIN - 1 |
> 
> The corresponding layout is the following:
> 
> Layout4:
> tcp, >, 16 {
> C:16 {
> 1 fin,
> 1 syn,
> 1 rst,
> 1 psh,
> 1 ack,
> 1 urg,
> 1 ece,
> 1 cwr,
> 1 ns,
> 3 rsv,
> 4 dataOffset,
> }
> }
> 
> 
> There are other ways to write 'Layout4', one could do it this way. 
> Layout5:
> tcp, >, 16 {
> c:8 {
> 1 ns ,
> 3 rsv,
> 4 dataOffset,
> }
> C:8 {
> 1 fin,
> 1 syn,
> 1 rst,
> 1 psh,
> 1 ack,
> 1 urg,
> 1 ece,
> 1 cwr,
> }
> }
> 
> The memory layout in 'Layout5' is the same as 'Layout4' except that the interference rules are 
> different. Writing to fields in the first byte can not overwrite fields in the second byte.
> 
> In Layout4 and 5 the runtime implementation may overwrite other fields in the container, 
> to avoid this one can do the following
> 
> Layout6:
> tcp, >, 16 {
> C:16:atom {
> 1 fin,
> 1 syn,
> 1 rst,
> 1 psh,
> 1 ack,
> 1 urg,
> 1 ece,
> 1 cwr,
> 1 ns,
> 3 rsv,
> 4 dataOffset,
> }
> }
> 
> The 'atom' attribute ensures that an update to a field does not overwrite other fields in the container. 
> A possible implementation could use CAS to do this. 

Is a compiler allowed to coalesce atomic operations within a container into a single CAS?

> -----------------------------------------------------------------------------
> 4) Implicit padding example
> 
> On a 64 bit machine the compiler would add 32 bit padding between the two fields 
> shown in the following structure.  
> 
> struct A {
> uint32_t x;
> uint64_t y;
> }

> This structure would produce the following Layout:
> 
> Layout7:
> A, 128 {
> C:32 x,
> C:32,
> C:64 y
> }
> 
> The specification of this layout does not allow the runtime implementation to interfere 
> with the 32 bit padding (bits 32 - 63), as the runtime only has access to named containers/fields. 
> In order to write to the padded area one would have to do the following. 
>  
> Layout8:
> A, 128 {
> C:64 {
> 32 x,
>    32,	
> }
> C:64 y
> }
> 
> ----------------------------------------------------------------------
> 5) Packed struct example
> 
> The following example displays a packed struct.
> 
> //using gcc compiler attributes
> struct __attribute__ ((__packed__)) PackedStruct{
> uint8_t a;
> uint32_t b; //unaligned 32 bit value 
> uint8_t c;
> };
> 
> This struct produces the following layout:
> 
> Layout9:
> PackedStruct, 48 {
> C:8 a,
> C:32 b,
> C:8 c,
> }
> 
> The runtime could choose to implement access to b using (not an exclusive list):
> - unaligned 32bit load/store
> - multiple 8bit loads/stores (tearing)
> - CaS of 64bits spanning a, b, and c
> 
> If b were also atomic, then CaS or a lock may be needed to access it.
> 
> ------------------------------------------------------------------------
> 6) Union example 
> 
> The following example shows how unions can be described in a Layout. 
> 
> struct A {
> uint16_t x;
> uint16_t y;
> };
> 
> union C {
> struct A a;
> uint32_t b;
> }	
> 
> The union above can be described in a Layout as:
> 
> Layout10:
> C, 32 {
> U:32 {
> C:32 {
> 16 x,
> 16 y,
> },
> 32 b,
> }
> }
> 
> There is a possibility that field names may conflict with one another. For example if the structures were 
> renamed in the following manner:
> 
> struct A {
> uint16_t a;
> uint16_t b;
> };
> 
> union C {
> struct A a;
> uint32_t b;
> }
> 
> It is valid to do this in C but it can't be described using the layout scheme above. To solve this we need 
> to use nested fields. A nested layout can be specified by simply replacing 
> the container size attribute with the name of the layout. This requires that we have two layouts, one for the union and one for the nested structure.
> 
> Layout11:
> A, 32 {
> C:16 x,
> C:16 y,
> }
> C, 32 {
> U:32 {
> C:A a,  //<--- nested Layout
> C:32 b,
> }
> }
> 
> -------------------------------------------------------
> 7) Array Example
> 
> The following example shows how a structure of arrays can be described in a layout.
> 
> struct SOA {
> uint8_t a[10];
> uint16_t b[10][10]; //2-d array
> };
> 
> Layout16:
> SOA, 210 {
> 8[10] a,
> 16[10][10] b,
> }
> 
> --------------------------------------------------
> 8) Named container named field example
> 
> In this example there is a named container which encloses named fields. 
> 
> A, 64 {
> 64 abcd {
> 16 a,
> 16 b,
> 16 c,
> 16 d,
> }
> }
> 
> An accessor is generated for each field as well as the container. The 
> container accessor returns all the fields as a single 64bit value.
>