From atobia at ca.ibm.com  Thu Feb 12 16:05:51 2015
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Thu, 12 Feb 2015 11:05:51 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To: <C17937D6-F503-481C-BEAC-B85C26CFA5CD@oracle.com>
References: <OFE44D0070.1B9A951D-ON85257DA2.0065FDC6-85257DA2.00662FC5@ca.ibm.com>
	<11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com>
	<OF9C544D53.94DCF279-ON85257DB0.007B99B5-85257DB1.0054D65D@ca.ibm.com>
	<438B90B1-2D1D-4D6E-9C65-47463C01AB04@oracle.com>
	<OF41019522.3741C4F7-ON85257DD5.007608CC-85257DD5.00761A5D@ca.ibm.com>
	<C17937D6-F503-481C-BEAC-B85C26CFA5CD@oracle.com>
Message-ID: <OF9B88E717.8CAA0C18-ON85257DEA.00575FA7-85257DEA.00586D41@ca.ibm.com>


Hi David

Apologies for the slow response.

>On float/double I am not sure -- on the one hand you are right that between
>possibly baroque bitswaps on the incoming integers and longs and existing
>floatTo and toFloat methods, we could do this, on the other hand there's going
>to be some hope of making things very efficient and that might be added by
>including a primitive (else we put ourselves into the pattern-matching intrinsic
>substitution business).  Probably I need to attempt to concoct a use case, and
>before I go very far into it, I can say that what I imagine involves shared memory
>and a pair of processors sharing that memory, with mixed endianness between
>them.  Do we care?
The LDL specification should be able to support mixed endianness, since it can
indicate the endianness of a particular field.  Having said that, we don't
object to float/double primitives on Unsafe.
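For illustration, the "integer load plus reinterpretation" route from the quoted paragraph can be sketched in C: assemble a big-endian 32-bit pattern byte by byte, then reinterpret the bits as a float (the C analogue of Java's Float.intBitsToFloat). The name load_float_be is hypothetical, and the sketch assumes the host float is IEEE754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a big-endian IEEE754 binary32 value from memory.
   Assumes the host represents float as IEEE754 binary32. */
static float load_float_be(const unsigned char *p)
{
    /* Assemble the bit pattern byte by byte, so host endianness does not matter. */
    uint32_t bits = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    float f;
    /* Reinterpret the bits without violating strict aliasing. */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Whether a JIT would recognize such a sequence and emit a single load is exactly the pattern-matching concern raised in the quote.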

>Do GPUs enter into the float issue at all, or do they use processor formats?

From my research, popular GPUs such as Nvidia's adhere to the IEEE754 standard:
https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf

>No, because (I am pretty sure that) C compilers do not do that either if they can get away
>with byte-at-a-time loads and stores.  Just checked, clang really goes to town on the optimized
>stores.  *but I have not checked what happens if the bitfields are volatile*  -- and I just did,
>and it changes the behavior, how about that?
>Well this is annoying.  How far do we want to go down the rathole of mimicking C behavior
>*and* performance?

We want a memory model that is strong enough to satisfy the intuition of C
programmers, but not stronger. I think we all agree that restricting the
sizes of reads unnecessarily restricts the implementation.

Atomicity is useful, but should not be a default property. We've
incorporated it into the following proposal.

The following is a new proposal. It contains elements from your previous
proposal (some of it is copy-pasted).

Strawman proposal for layout little language:

0. Goals: where layouts are actually invariant across platforms
   (e.g., network protocols) we want to have just one layout specification.

1. The LD specification must be well defined. This means that different JDKs cannot infer
   alignment or padding differently.

2. The LD must specify the endianness of the layout. The bit and byte endianness must be consistent.
   Endianness is specified at container granularity. A shorthand notation can be provided to
   specify the endianness of all containers in a layout.

3. A field is a contiguous sequence of bits confined to a container. An update to a field may
   overwrite the contents of other fields within the same container. If the enclosing container is
   marked as "atomic", another thread cannot observe the value of fields in the container before
   the update is complete. A field cannot be larger than its container. The sizes
   of the fields within a container must add up to the size of the container. A field does not
   require a name. Accessors are only generated for named fields.

4. Unlike C bitfield numbering (which varies based on the endianness of the target
   platform) we'll always number fields in little-endian order;
   that is, "byte a:1, b:7" (at address x) would be extracted with the
   expressions "loadByte(x) & 1" and "loadByte(x) >> 1", respectively.
   This is LE-centric, but has the nice property that bit 0 of a byte, short,
   int, or long is extracted with the same operation (x & 1), and so on.

5. A container is a sequence of one or more adjacent fields. Changing a field in a container will not
   change fields of other containers. Container sizes must be a multiple of 8 bits. There is no upper
   limit to the size of a container. A container does not require a name. A container cannot be
   larger than the enclosing layout. The sizes of all the containers in a layout must add up to the
   size of the enclosing layout. Accessors are only generated for named containers.

6. A layout is a sequence of one or more adjacent containers or unions. The default alignment
   of a layout is the size of the layout.

Note:
There are some flaws with the default alignment rule: should a layout with an array of 35 bytes
have an alignment of 35 bytes? Perhaps we should follow the rule used by common C compilers,
something like "default alignment is the size of the largest container in the layout, rounded up
to 2^n bits. In the case of arrays, the container element size is considered".

7. Grammars
	for layouts:
	layoutName','size','[endianness','][alignment] //endianness (<, >) LE, BE
	['{'
		{(containers | unions) ','}
	'}']

	for unions:
	'U:'unionSize [unionName]
	['{'
		{containers','}
	'}']

	for containers:
	['C:'][endianness:] (containerSize [containerName] | layoutName) {'[' numOFElements ']'}
	['{'
		{fields','}
	'}']

	for fields:
	['F:']fieldSize [fieldName]

This notation describes fields/containers by their position relative
to each other and their sizes. This is hopefully less error prone than the
size + offset technique.
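Rule 4 can be made concrete with a small C sketch: once a container has been loaded as an unsigned integer (in whatever byte order the layout specifies), a field is located only by its starting bit, counted from the least significant bit, and its width. The helper name extract_field is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: extract `width` bits starting at bit `lsb`, where bit 0
   is the least significant bit of the loaded container (rule 4 numbering).
   Valid for 1 <= width <= 64 and lsb + width <= 64. */
static uint64_t extract_field(uint64_t container, unsigned lsb, unsigned width)
{
    uint64_t mask = (width == 64) ? ~(uint64_t)0 : (((uint64_t)1 << width) - 1);
    return (container >> lsb) & mask;
}
```

For "byte a:1, b:7", a is extract_field(x, 0, 1) and b is extract_field(x, 1, 7); the same call with lsb 0 yields bit 0 of a byte, short, int, or long alike.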

-----------------------------------------------------
1) Basic example

The following is a basic structure with two fields 'x' and 'y'.

struct A {
	uint16_t x;
	uint16_t y;
};

This structure produces the following layout.

Layout1:

A, <, 32 {
	C:16 x,
	C:16 y,
}

The following is also acceptable, as "C:" is optional:

A, <, 32 {
	16 x,
	16 y,
}

Endian-independent accessors would be:
x = LoadShortLE(base + 0)
y = LoadShortLE(base + 2)
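The LoadShortLE pseudo-accessors above could be realized portably in C by assembling bytes explicitly, so the result does not depend on host byte order. A minimal sketch; load_u16_le, A_get_x and A_get_y are hypothetical names standing in for whatever accessors a runtime would generate.

```c
#include <stdint.h>

/* Hypothetical portable equivalent of LoadShortLE: byte 0 is least significant. */
static uint16_t load_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Accessors for Layout1 (A, <, 32): x at byte offset 0, y at byte offset 2. */
static uint16_t A_get_x(const unsigned char *base) { return load_u16_le(base + 0); }
static uint16_t A_get_y(const unsigned char *base) { return load_u16_le(base + 2); }
```

Compilers commonly recognize this byte-assembly idiom and emit a single load on little-endian hosts, though that is an optimization rather than a guarantee.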
------------------------------------------------------------------------------
2) IP Header Example

The next example shows the layout of an IPv4 header.

Layout3:
IPv4, >, 192 {
	C:8 {
		4 ihl,
		4 version,
	}
	C:8 {
		2 ECN,
		6 DSCP,
	}
	C:16 totLen,
	C:16 iden,
	C:16 {
		13 fragOff,
		3 flags,
	}
	C:8 TTL,
	C:8 Proto,
	C:16 Checksum,
	C:32 srcAddr,
	C:32 destAddr,
	C:32 options,
}
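As a sketch of how Layout3's big-endian containers combine with the little-endian field numbering: load each container in network byte order, then select fields counting from bit 0. The function names are hypothetical, and only a few fields are shown.

```c
#include <stdint.h>

/* Hypothetical container loads for Layout3 (big-endian). */
static uint8_t  load_u8(const unsigned char *p)     { return p[0]; }
static uint16_t load_u16_be(const unsigned char *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* First C:8 container: "4 ihl" is bits 0-3, "4 version" is bits 4-7. */
static unsigned ipv4_ihl(const unsigned char *h)     { return load_u8(h) & 0xF; }
static unsigned ipv4_version(const unsigned char *h) { return load_u8(h) >> 4; }

/* totLen is a whole C:16 container at byte offset 2. */
static unsigned ipv4_tot_len(const unsigned char *h) { return load_u16_be(h + 2); }

/* C:16 container at byte offset 6: "13 fragOff" is bits 0-12, "3 flags" bits 13-15. */
static unsigned ipv4_frag_off(const unsigned char *h) { return load_u16_be(h + 6) & 0x1FFF; }
static unsigned ipv4_flags(const unsigned char *h)    { return load_u16_be(h + 6) >> 13; }
```

A header beginning with the byte 0x45 yields version 4 and IHL 5 from these accessors.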
--------------------------------------------------------
3) TCP Example

The following is a picture of the TCP header (bytes 12 - 13):
| dataOffset - 4 | rsv - 3 | NS - 1 | CWR - 1 | ECE - 1 | URG - 1 | ACK - 1 | PSH - 1 | RST - 1 | SYN - 1 | FIN - 1 |

The corresponding layout is the following:

Layout4:
tcp, >, 16 {
	C:16 {
		1 fin,
		1 syn,
		1 rst,
		1 psh,
		1 ack,
		1 urg,
		1 ece,
		1 cwr,
		1 ns,
		3 rsv,
		4 dataOffset,
	}
}


There are other ways to write 'Layout4'; one could do it this way:
Layout5:
tcp, >, 16 {
	C:8 {
		1 ns,
		3 rsv,
		4 dataOffset,
	}
	C:8 {
		1 fin,
		1 syn,
		1 rst,
		1 psh,
		1 ack,
		1 urg,
		1 ece,
		1 cwr,
	}
}

The memory layout in 'Layout5' is the same as 'Layout4' except that the interference rules are
different. Writing to fields in the first byte cannot overwrite fields in the second byte.
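To see why the container boundary defines the interference rules, here is a minimal C sketch of a plain (non-atomic) field store into Layout4's 16-bit big-endian container: it reads the whole container, splices in the field bits, and writes both bytes back. If another thread updated a different field of the same container in between, that update would be lost; under Layout5, a store to fin would only rewrite the second byte. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical non-atomic store of `width` bits at bit `lsb` (LE numbering)
   into a 16-bit big-endian container at p. Both bytes are rewritten, so a
   concurrent write to another field in the same container may be lost. */
static void store_field_be16(unsigned char *p, unsigned lsb, unsigned width, unsigned value)
{
    uint16_t c = (uint16_t)((p[0] << 8) | p[1]);              /* load container */
    uint16_t mask = (uint16_t)(((1u << width) - 1) << lsb);
    c = (uint16_t)((c & ~mask) | ((value << lsb) & mask));    /* splice field */
    p[0] = (unsigned char)(c >> 8);                            /* store container */
    p[1] = (unsigned char)(c & 0xFF);
}
```

With Layout4's numbering, fin is (lsb 0, width 1) and dataOffset is (lsb 12, width 4).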

In Layout4 and Layout5 the runtime implementation may overwrite other fields in the container;
to avoid this, one can do the following:

Layout6:
tcp, >, 16 {
	C:16:atom {
		1 fin,
		1 syn,
		1 rst,
		1 psh,
		1 ack,
		1 urg,
		1 ece,
		1 cwr,
		1 ns,
		3 rsv,
		4 dataOffset,
	}
}

The 'atom' attribute ensures that an update to a field does not overwrite other fields in the container.
A possible implementation could use CAS to do this.
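One way a CAS-based 'atom' store might look, sketched with C11 <stdatomic.h>: the 16-bit container is held as an _Atomic uint16_t whose in-memory bytes are big-endian, and the store loops on atomic_compare_exchange_weak until no other thread has changed the container in between. This is an illustration under assumptions (the container is suitably aligned for a 16-bit atomic, and _Atomic uint16_t has the same representation as uint16_t); the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Convert the raw in-memory representation to the big-endian value. */
static uint16_t raw_to_be_value(uint16_t raw)
{
    unsigned char b[2];
    memcpy(b, &raw, 2);                        /* bytes in memory order */
    return (uint16_t)((b[0] << 8) | b[1]);
}

/* Convert a big-endian value back to its raw in-memory representation. */
static uint16_t be_value_to_raw(uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v >> 8), (unsigned char)(v & 0xFF) };
    uint16_t raw;
    memcpy(&raw, b, 2);
    return raw;
}

/* Hypothetical atomic field store for a C:16:atom container (LE bit numbering). */
static void atomic_store_field_be16(_Atomic uint16_t *cont, unsigned lsb,
                                    unsigned width, unsigned value)
{
    uint16_t mask = (uint16_t)(((1u << width) - 1) << lsb);
    uint16_t expected = atomic_load(cont);
    uint16_t desired;
    do {
        uint16_t v = raw_to_be_value(expected);
        v = (uint16_t)((v & ~mask) | ((value << lsb) & mask));
        desired = be_value_to_raw(v);
        /* On failure, `expected` is refreshed with the current contents. */
    } while (!atomic_compare_exchange_weak(cont, &expected, desired));
}
```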

-----------------------------------------------------------------------------
4) Implicit padding example

On a 64-bit machine the compiler would add 32 bits of padding between the two fields
shown in the following structure.

struct A {
	uint32_t x;
	uint64_t y;
};

This structure would produce the following Layout:

Layout7:
A, 128 {
	C:32 x,
	C:32,
	C:64 y
}
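The padding in Layout7 follows from the usual C rule: each member goes at the next offset that is a multiple of its alignment, and the total size is rounded up to the largest member alignment. A minimal sketch of that computation, assuming natural alignment (alignment equals size, as on common 64-bit ABIs); struct member and layout_c_struct are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical description of one member: size and alignment in bits. */
struct member { unsigned size_bits; unsigned align_bits; };

/* Round `off` up to the next multiple of `align` (align > 0). */
static unsigned round_up(unsigned off, unsigned align)
{
    return (off + align - 1) / align * align;
}

/* Store each member's bit offset in `offsets` and return the total size in
   bits, including trailing padding, as a C compiler typically would. */
static unsigned layout_c_struct(const struct member *m, size_t n, unsigned *offsets)
{
    unsigned off = 0, max_align = 8;
    for (size_t i = 0; i < n; i++) {
        off = round_up(off, m[i].align_bits);   /* implicit padding before member */
        offsets[i] = off;
        off += m[i].size_bits;
        if (m[i].align_bits > max_align)
            max_align = m[i].align_bits;
    }
    return round_up(off, max_align);            /* trailing padding */
}
```

For struct A ({32, 32} then {64, 64}) this gives offsets 0 and 64 and a size of 128; the gap from bit 32 to bit 63 is the unnamed C:32 container. Passing an alignment of 8 for every member models a packed struct instead.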

The specification of this layout does not allow the runtime implementation to interfere
with the 32-bit padding (bits 32 - 63), as the runtime only has access to named containers/fields.
To allow the runtime to write to the padded area, one would have to do the following:

Layout8:
A, 128 {
	C:64 {
		32 x,
		32,
	}
	C:64 y
}

----------------------------------------------------------------------
5) Packed struct example

The following example displays a packed struct.

//using gcc compiler attributes
struct __attribute__ ((__packed__)) PackedStruct{
	uint8_t a;
	uint32_t b; //unaligned 32 bit value
	uint8_t c;
};

This struct produces the following layout:

Layout9:
PackedStruct, 48 {
	C:8 a,
	C:32 b,
	C:8 c,
}

The runtime could choose to implement access to b using any of the following (not an exhaustive list):
- unaligned 32-bit load/store
- multiple 8-bit loads/stores (tearing)
- CAS of 64 bits spanning a, b, and c

If b were also atomic, then a CAS or a lock may be needed to access it.
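The first two strategies for b can be sketched in C over a raw 6-byte buffer: an unaligned 32-bit load expressed with memcpy (which compilers usually lower to a single load where the hardware permits unaligned access), and byte-at-a-time loads, which can tear under concurrent writes. Layout9 does not state an endianness, so the byte-wise version assumes little-endian; all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* b starts at byte offset 1 in Layout9 (after the 8-bit field a). */
enum { PACKED_B_OFFSET = 1 };

/* Unaligned load in host byte order; memcpy avoids the undefined behavior
   of dereferencing a misaligned uint32_t pointer. */
static uint32_t packed_get_b_native(const unsigned char *base)
{
    uint32_t v;
    memcpy(&v, base + PACKED_B_OFFSET, sizeof v);
    return v;
}

/* Byte-at-a-time load, assuming little-endian. Four separate loads mean a
   concurrent writer could be observed half-way through (tearing). */
static uint32_t packed_get_b_le_bytes(const unsigned char *base)
{
    const unsigned char *p = base + PACKED_B_OFFSET;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```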

------------------------------------------------------------------------
6) Union example

The following example shows how unions can be described in a Layout.

struct A {
	uint16_t x;
	uint16_t y;
};

union C {
	struct A a;
	uint32_t b;
};

The union above can be described in a Layout as:

Layout10:
C, 32 {
	U:32 {
		C:32 {
			16 x,
			16 y,
		},
		32 b,
	}
}

Field names may conflict with one another. For example, the structures could be
renamed in the following manner:

struct A {
	uint16_t a;
	uint16_t b;
};

union C {
	struct A a;
	uint32_t b;
};

This is valid C, but it can't be described using the layout scheme above. To solve this we need
to use nested layouts. A nested layout can be specified by simply replacing the container size
attribute with the name of the layout. This requires two layouts, one for the union and one for
the nested structure.

Layout11:
A, 32 {
	C:16 a,
	C:16 b,
}
C, 32 {
	U:32 {
		C:A a,  //<--- nested Layout
		C:32 b,
	}
}

-------------------------------------------------------
7) Array Example

The following example shows how a structure of arrays can be described in a layout.

struct SOA {
	uint8_t a[10];
	uint16_t b[10][10]; //2-d array
};

Layout16:
SOA, 1680 {
	8[10] a,
	16[10][10] b,
}
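Because the layout describes elements purely by position and size, an accessor for b[i][j] only needs the offset of b (bit 80, right after the ten 8-bit elements of a) plus a row-major index. A C sketch, assuming little-endian elements since the layout does not say; soa_get_b is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor for b[i][j], 0 <= i, j < 10. a occupies bits 0..79,
   so b starts at byte offset 10; each element is 2 bytes. */
static uint16_t soa_get_b(const unsigned char *base, size_t i, size_t j)
{
    size_t byte_off = 10 + 2 * (i * 10 + j);     /* row-major order */
    const unsigned char *p = base + byte_off;
    return (uint16_t)(p[0] | (p[1] << 8));       /* little-endian (assumed) */
}
```

For example, b[3][7] is element 37 and lives at byte 10 + 74 = 84.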

--------------------------------------------------
8) Named container named field example

In this example there is a named container which encloses named fields.

A, 64 {
	64 abcd {
		16 a,
		16 b,
		16 c,
		16 d,
	}
}

An accessor is generated for each field as well as the container. The
container accessor returns all the fields as a single 64-bit value.
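A sketch of the accessors this example implies, assuming the container is loaded little-endian (the example does not say) and fields are numbered from bit 0 per rule 4. All accessor names are hypothetical.

```c
#include <stdint.h>

/* Container accessor: all four fields as one 64-bit value (little-endian load). */
static uint64_t A_get_abcd(const unsigned char *base)
{
    uint64_t v = 0;
    for (int k = 7; k >= 0; k--)        /* byte 0 ends up least significant */
        v = (v << 8) | base[k];
    return v;
}

/* Field accessors: a is bits 0-15, b bits 16-31, c bits 32-47, d bits 48-63. */
static uint16_t A_get_a(const unsigned char *base) { return (uint16_t)(A_get_abcd(base) >> 0); }
static uint16_t A_get_b(const unsigned char *base) { return (uint16_t)(A_get_abcd(base) >> 16); }
static uint16_t A_get_c(const unsigned char *base) { return (uint16_t)(A_get_abcd(base) >> 32); }
static uint16_t A_get_d(const unsigned char *base) { return (uint16_t)(A_get_abcd(base) >> 48); }
```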

From angela_lin at ca.ibm.com  Fri Feb 13 14:56:34 2015
From: angela_lin at ca.ibm.com (Angela Lin)
Date: Fri, 13 Feb 2015 09:56:34 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To: <OF9B88E717.8CAA0C18-ON85257DEA.00575FA7-85257DEA.00586D41@ca.ibm.com>
References: <OFE44D0070.1B9A951D-ON85257DA2.0065FDC6-85257DA2.00662FC5@ca.ibm.com>	<11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com>
	<OF9C544D53.94DCF279-ON85257DB0.007B99B5-85257DB1.0054D65D@ca.ibm.com>	<438B90B1-2D1D-4D6E-9C65-47463C01AB04@oracle.com>
	<OF41019522.3741C4F7-ON85257DD5.007608CC-85257DD5.00761A5D@ca.ibm.com>	<C17937D6-F503-481C-BEAC-B85C26CFA5CD@oracle.com>
	<OF9B88E717.8CAA0C18-ON85257DEA.00575FA7-85257DEA.00586D41@ca.ibm.com>
Message-ID: <OFCDC21EBA.CE7BFCE4-ON85257DEB.00443B41-85257DEB.005215A2@ca.ibm.com>


"panama-spec-experts" <panama-spec-experts-bounces at openjdk.java.net> wrote
on 02/12/2015 11:05:51 AM:

> From: Tobi Ajila/Ottawa/IBM at IBMCA
> To: David Chase <david.r.chase at oracle.com>
> Cc: Doug Lea <dl at cs.oswego.edu>, panama-spec-experts at openjdk.java.net
> Date: 02/12/2015 11:07 AM
> Subject: Re: More detail than I had intended on Layout description language
> Sent by: "panama-spec-experts" <panama-spec-experts-bounces at openjdk.java.net>
>
>
> >Do GPUs enter into the float issue at all, or do they use processor formats?
> From my research popular GPUs such as Nvidia adhere to the IEEE754 standard
> https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf
>

The IEEE754 standard allows leeway for implementations to omit certain
features. For example, the extended and extendable precision formats may or
may not be implemented. I think IEEE754 also allows the results of a
sequence of operations to differ depending on how a compiler optimizes
them. So the result of a floating point computation in the VM might differ
from the result from a GPU native library. This suggests that we should
discourage performing fp operations interchangeably across Java (on host
CPU) and GPU.

OpenCL seems to have a non-IEEE754-compliant mode of operation.

From david.r.chase at oracle.com  Wed Feb 18 01:21:11 2015
From: david.r.chase at oracle.com (David Chase)
Date: Tue, 17 Feb 2015 20:21:11 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To: <OF9B88E717.8CAA0C18-ON85257DEA.00575FA7-85257DEA.00586D41@ca.ibm.com>
References: <OFE44D0070.1B9A951D-ON85257DA2.0065FDC6-85257DA2.00662FC5@ca.ibm.com>
	<11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com>
	<OF9C544D53.94DCF279-ON85257DB0.007B99B5-85257DB1.0054D65D@ca.ibm.com>
	<438B90B1-2D1D-4D6E-9C65-47463C01AB04@oracle.com>
	<OF41019522.3741C4F7-ON85257DD5.007608CC-85257DD5.00761A5D@ca.ibm.com>
	<C17937D6-F503-481C-BEAC-B85C26CFA5CD@oracle.com>
	<OF9B88E717.8CAA0C18-ON85257DEA.00575FA7-85257DEA.00586D41@ca.ibm.com>
Message-ID: <865DD970-F994-4277-91AA-25D5E6C81BCE@oracle.com>


On 2015-02-12, at 11:05 AM, Tobi Ajila <atobia at ca.ibm.com> wrote:
> Apologies for the slow response.

Same on this end, plus I am leaving Oracle, which will make my responses infinitely slow.
Nonetheless, while I am still here, I should comment.  Overall, I like the strawman, and I only
see a couple of places where there's likely to be disagreement, and I am not even sure it is
"disagree" so much as "want to be sure we understand why we are doing this".

...


> 4. Unlike C bitfield numbering (which varies based on endianness of target
>    platform) we'll always number fields in little-endian order;
>    that is, "byte a:1, b:7" (at address x) would be extracted with the
>    expressions "loadByte(x) & 1" and "loadByte(x) >> 1", respectively.
>    This is LE-centric, but has the nice property that bit 0 of a byte, short,
>    int, or long is extracted with the same operation (x & 1), and so on.

This is the one place I am not sure.  I can argue either side of this, but I think that
the "ease of our code generation" is one of the lesser of several dubiously compelling
reasons.

Pro:
Pushing the conventions of C compilers this far is not necessary, because
we already have a defined interface for C programmers, and it is relatively attractive,
so why should we give them another one?  For Java programmers this will be somewhat
intuitive and non-surprising.  In addition, little-endian is the mighty common case.

Con:
It's not what C compilers do, and the people doing this stuff by hand are still probably
coping with C compiler conventions.  It's also not Network byte order, which might
complicate things a little for those protocols when reading them from the spec (or not --
I haven't spent that much time reading network specs to know how they do it).

That's all I've got right now, I don't find it super-convincing either way.

> 5. A container is a sequence of one or more adjacent fields. Changing a field in a container will not
>    change fields of other containers. Container sizes must be a multiple of 8 bits. There is no upper
>    limit to the size of a container. A container does not require a name. A container can not be
>    larger than the enclosing layout. The sizes of all the containers in a layout must add up to the
>    size of the enclosing layout. Accessors are only generated for named containers.

Container sizes are also powers-of-two.  I think there is a tension between the maximum container
size and memory-model issues.  I would like it to be the case that the maximum container size
corresponds to the largest number of bits that can be written atomically; you would also like it to be
the case that the minimum container size is the smallest thing that can be updated without the possibility
of reverting concurrent stores from other threads of execution.

Note that multi-field-single-container load/store atomicity is something that implementors of this
thing will need to get right, and it is something that C programmers notice.

Looking at your grammar, don't you need to be able to tag a container with a non-default alignment?

Or is it your intention that the algorithm is "default is natural alignment if possible given supplied padding,
otherwise the largest alignment that works with the padding, and ERROR if any field within the container
does not obtain its desired alignment"?  Think about structs containing structs, think about the generation
of interior pointers/references, think about the need to know the alignment of the pointed-to struct when a load
is performed.

I think the C assumption is that a uint64_t* is a pointer (on a 64-bit-ish machine) that is 64-bit aligned.
If you want a pointer to a misaligned uint64_t, I think that is a different animal (or do we take the
position that we can hide that detail behind an interface?)


> 6. A layout is a sequence of one or more adjacent containers or unions. The default alignment
>    of a layout is the size of the layout.

The alignment of a layout is the maximum alignment of the containers and unions that it comprises.
E.g., in C

struct foo {
  uint64_t a, b, c;
};

the alignment of a struct foo is 64 bits, not 192 bits.  So it follows the container, not the layout.

...
> In Layout4 and 5 the runtime implementation may overwrite other fields in the container, 
> to avoid this one can do the following
> 
> Layout6:
> tcp, >, 16 {
> C:16:atom {
> 1 fin,
> 1 syn,
> 1 rst,
> 1 psh,
> 1 ack,
> 1 urg,
> 1 ece,
> 1 cwr,
> 1 ns,
> 3 rsv,
> 4 dataOffset,
> }
> }
> 
> The 'atom' attribute ensures that an update to a field does not overwrite other fields in the container. 
> A possible implementation could use CAS to do this. 

Is a compiler allowed to coalesce atomic operations within a container into a single CAS?



From atobia at ca.ibm.com  Fri Feb 27 21:22:36 2015
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Fri, 27 Feb 2015 16:22:36 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To: <865DD970-F994-4277-91AA-25D5E6C81BCE@oracle.com>
References: <OFE44D0070.1B9A951D-ON85257DA2.0065FDC6-85257DA2.00662FC5@ca.ibm.com>
	<11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com>
	<OF9C544D53.94DCF279-ON85257DB0.007B99B5-85257DB1.0054D65D@ca.ibm.com>
	<438B90B1-2D1D-4D6E-9C65-47463C01AB04@oracle.com>
	<OF41019522.3741C4F7-ON85257DD5.007608CC-85257DD5.00761A5D@ca.ibm.com>
	<C17937D6-F503-481C-BEAC-B85C26CFA5CD@oracle.com>
	<OF9B88E717.8CAA0C18-ON85257DEA.00575FA7-85257DEA.00586D41@ca.ibm.com>
	<865DD970-F994-4277-91AA-25D5E6C81BCE@oracle.com>
Message-ID: <OFD77047D9.0B3E6579-ON85257DF9.007527E7-85257DF9.00756D08@ca.ibm.com>

>> 4. Unlike C bitfield numbering (which varies based on endianness of target
>>    platform) we'll always number fields in little-endian order;
>>    that is, "byte a:1, b:7" (at address x) would be extracted with the
>>    expressions "loadByte(x) & 1" and "loadByte(x) >> 1", respectively.
>>    This is LE-centric, but has the nice property that bit 0 of a byte, short,
>>    int, or long is extracted with the same operation (x & 1), and so on.

>This is the one place I am not sure.  I can argue either side of this, but I think that
>the "ease of our code generation" is one of the lesser of several dubiously compelling
>reasons.
...
>That's all I've got right now, I don't find it super-convincing either way.

For now, let's stick with LE numbering since it is sufficient for common cases.

> Container sizes are also powers-of-two.  I think there is a tension between the maximum container
> size and memory-model issues.  I would like it to be the case that the maximum container sizes
> corresponds to the largest number of bits that can be written atomically; you would also like it to be
> the case that the minimum container size is the smallest thing that can be updated without the possibility
> of reverting concurrent stores from other threads of execution.

We are not in favour of a max container size because we want container sizes to be forward compatible with
future hardware. We can always fall back to doing something slow (e.g. using a lock) in order to achieve
atomicity in cases where containers are larger than the largest register size.

We must be able to describe misaligned containers because we must be able to describe existing data
structures that have misaligned fields.  If we can describe misaligned containers, we can also describe
odd-sized containers (as long as they are a multiple of 8 bits). The implementation will be able to
deal with both of these cases.

> Looking at your grammar, don't you need to be able to tag a container with a non-default alignment?
>
> Or is it your intention that the algorithm is "default is natural alignment if possible given supplied padding,
> otherwise the largest alignment that works with the padding, and ERROR if any field within the container
> does not obtain its desired alignment?

My intention here is that the LDL spec only describes the size and location of fields. Any alignment is specified
with explicit padding. We think the LDL spec should not enforce a container alignment, but only represent it.

> I think the C assumption is that a uint64_t* is a pointer (on a 64-bit-ish machine) that is 64-bit aligned.
> If you want a pointer to a misaligned uint64_t, I think that is a different animal (or do we take the
> position that we can hide that detail behind an interface?)

We want to be able to specify both of these:

On 64-bit:
struct Node {
     int8_t data;
     struct Node* next;
};

LDL:
Node, 128 {
    8 data,
    56,
    64 next,
}

struct __attribute__ ((__packed__)) Node {
    int8_t data;
    struct Node* next;
};
LDL:
Node, 72 {
    8 data,
    64 next,
}

With the use of locks, we can support structures with misaligned containers.

> Think about structs containing structs, thinking about the generation
> of interior pointers/references, think about the need to know the alignment of the pointed struct when a load
> is performed.
That's a good point.

struct A {
    int32_t x;
    int32_t y;
};

LDL:
A, 64 { //default alignment of 32
    32 x,
    32 y,
}

struct __attribute__ ((__packed__)) B {
    int8_t z;
    struct A a;
};
LDL:

B, 72 {
    8 z,
    A a, //this is aligned to 8 not 32
}

Can I get 'a' and treat it like it's aligned? With the exception of atomics, this is not a big problem. Most
modern hardware allows you to access misaligned memory with some performance penalty. The nested struct is
not the only case where this problem could occur. What if we point a perfectly aligned Layout at an unaligned
memory location (like casting unaligned memory to a struct)?

We think there are 3 answers to this:
1) Ignore it - Use C behaviour as the standard
2) Don't allow it - Perform runtime checks for misaligned access and throw an exception
3) Handle it safely - Provide two accessors, the regular one, and one that can safely handle misaligned access
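Answers 2 and 3 could look roughly like this in C for a 32-bit field: a checked accessor that reports misalignment instead of loading (standing in for throwing an exception), and a second accessor that tolerates any alignment. The names, the bool error convention, and the 4-byte alignment requirement are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if p satisfies the natural 4-byte alignment of a 32-bit field. */
static bool is_aligned4(const void *p)
{
    return ((uintptr_t)p % 4) == 0;
}

/* Answer 2: refuse misaligned access; returns false where Java would throw. */
static bool load_u32_checked(const unsigned char *p, uint32_t *out)
{
    if (!is_aligned4(p))
        return false;
    memcpy(out, p, sizeof *out);   /* typically compiles to one aligned load */
    return true;
}

/* Answer 3: a second accessor that is safe for any alignment. */
static uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);       /* compiler chooses a safe instruction sequence */
    return v;
}
```

Answer 1 corresponds to simply dereferencing the pointer as C would, with whatever the hardware does on misalignment.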

>> The 'atom' attribute ensures that an update to a field does not overwrite other fields in the container.
>> A possible implementation could use CAS to do this.
>
> Is a compiler allowed to coalesce atomic operations within a container into a single CAS?
Yes, I don't see a reason not to allow this.
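As an illustration of such coalescing, here is a C11 sketch in which updates to two fields of the same atomic container are merged into one compare-and-swap loop, so other threads observe either both changes or neither. For brevity it treats the container as a native-order _Atomic uint16_t, ignoring the byte-order handling a big-endian layout would need; the function name is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: set two fields (each given as a mask plus pre-shifted bits)
   of one atomic container with a single CAS loop. */
static void atomic_store_two_fields(_Atomic uint16_t *cont,
                                    uint16_t mask1, uint16_t bits1,
                                    uint16_t mask2, uint16_t bits2)
{
    uint16_t mask = (uint16_t)(mask1 | mask2);
    uint16_t bits = (uint16_t)((bits1 & mask1) | (bits2 & mask2));
    uint16_t expected = atomic_load(cont);
    uint16_t desired;
    do {
        desired = (uint16_t)((expected & ~mask) | bits);
        /* On failure, `expected` is refreshed and the merge is recomputed. */
    } while (!atomic_compare_exchange_weak(cont, &expected, desired));
}
```

With Layout6's numbering, setting syn (mask 0x0002) and ack (mask 0x0010) together is one call instead of two CAS loops.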