More detail than I had intended on Layout description language

Thu Feb 12 16:05:51 UTC 2015

Hi David

Apologies for the slow response.

>On float/double I am not sure -- on the one hand you are right that between
>possibly baroque bitswaps on the incoming integers and longs and existing
>floatTo and toFloat methods, we could do this, on the other hand there's going
>to be some hope of making things very efficient and that might be added by
>including a primitive (else we put ourselves into the pattern-matching intrinsic
>substitution business).  Probably I need to attempt to concoct a use case, and
>before I go very far into it, I can say that what I imagine involves shared memory
>and a pair of processors sharing that memory, with mixed endianness between
>them.  Do we care?
The LDL specification should be able to support mixed endianness, as it can
indicate the endianness of a particular field.  Having said that, we don't
object to float/double primitives on Unsafe.

>Do GPUs enter into the float issue at all, or do they use processor formats?

From my research, popular GPUs such as Nvidia's adhere to the IEEE 754 standard:
https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf

>No, because (I am pretty sure that) C compilers do not do that either if they can get away
>with byte-at-a-time loads and stores.  Just checked, clang really goes to town on the optimized
>stores.  *but I have not checked what happens if the bitfields are volatile*  -- and I just did,
>and it changes the behavior, how about that?
>Well this is annoying.  How far do we want to go down the rathole of mimicking C behavior
>*and* performance?

We want a memory model that is strong enough to satisfy the intuition of C
programmers, but not stronger. I think we all agree that restricting the
sizes of reads unnecessarily restricts the implementation.

Atomicity is useful, but should not be a default property. We've
incorporated it into the following proposal.

The following is a new proposal. It contains elements from your previous
proposal (some of it is copy-pasted).

Strawman proposal for layout little language:

0. Goals: where layouts are actually invariant across platforms
   (e.g., network protocols) we want to have just one layout specification.

1. The LD specification must be well defined. This means that JDKs cannot infer
   alignment or padding differently.

2. The LD must specify the endianness of the layout. Bit and byte endianness must be consistent.
   Endianness is specified at container granularity. A shorthand notation can be provided to
   specify the endianness of all containers in a layout.

3. A field is a contiguous sequence of bits confined to a container. An update to a field may
   overwrite the contents of other fields within the same container. If the enclosing container is
   marked as "atomic", another thread cannot observe the value of fields in the container before
   the update is complete. A field cannot be larger than its container. The sizes
   of the fields within a container must add up to the size of the container. A field does not
   require a name. Accessors are only generated for named fields.

4. Unlike C bitfield numbering (which varies based on endianness of target
   platform) we'll always number fields in little-endian order;
   that is, "byte a:1, b:7" (at address x) would be extracted with the
   expressions "loadByte(x) & 1" and "loadByte(x) >> 1", respectively.
   This is LE-centric, but has the nice property that bit 0 of a byte, short,
   int, or long is extracted with the same operation (x & 1), and so on.

5. A container is a sequence of one or more adjacent fields. Changing a field in a container will not
   change fields of other containers. Container sizes must be a multiple of 8 bits. There is no upper
   limit to the size of a container. A container does not require a name. A container cannot be
   larger than the enclosing layout. The sizes of all the containers in a layout must add up to the
   size of the enclosing layout. Accessors are only generated for named containers.

6. A layout is a sequence of one or more adjacent containers or unions. The default alignment
   of a layout is the size of the layout.

Note:
There are some flaws with the default alignment rule: should a layout with an array of 35 bytes
have an alignment of 35 bytes? Perhaps we should follow the rule used by common C compilers,
something like: "default alignment is the size of the largest container in the layout, rounded up
to 2^n bits. In the case of arrays, the container element size is considered".

7. Grammars
	for layouts:
	layoutName','size','[endianness','][alignment] //endianness (<, >) LE, BE
	['{'
		{(containers | unions) ','}
	'}']

	for unions:
	'U:'unionSize [unionName]
	['{'
		{containers','}
	'}']

	for containers:
	['C:'][endianness:] (containerSize [containerName] | layoutName) {'[' numOfElements ']'}
	['{'
		{fields','}
	'}']

	for fields:
	['F:']fieldSize [fieldName]

This notation describes fields/containers by their position relative
to each other and their sizes. This is hopefully less error prone than the
size + offset technique.
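
To make the little-endian field numbering of rule 4 concrete, here is a minimal Java sketch of the two extraction expressions for "byte a:1, b:7". The class and method names are hypothetical and not part of the proposal, and the container byte is passed in directly rather than loaded from an address. The only addition to the expressions in rule 4 is the 0xFF mask, which is needed because Java bytes are signed.

```java
// Hypothetical helper names; only the bit arithmetic comes from rule 4.
final class FieldNumberingSketch {
    // Field a: 1 bit, numbered first, so it occupies bit 0 of the container.
    static int fieldA(byte container) {
        return container & 1;
    }

    // Field b: 7 bits, numbered after a, so it occupies bits 1..7.
    // Mask to 8 bits first so a negative Java byte does not sign-extend.
    static int fieldB(byte container) {
        return (container & 0xFF) >>> 1;
    }

    public static void main(String[] args) {
        byte container = (byte) 0b1000_0011; // b = 0b1000001, a = 1
        System.out.println("a = " + fieldA(container));
        System.out.println("b = " + fieldB(container));
    }
}
```

The same "x & 1" expression would extract bit 0 from a short, int, or long container, which is the property rule 4 is after.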

-----------------------------------------------------
1) Basic example

The following is a basic structure with two fields 'x' and 'y'.

struct A {
	uint16_t x;
	uint16_t y;
};

This structure produces the following layout.

Layout1:

A, <, 32 {
	C:16 x,
	C:16 y,
}

The following is also acceptable, as "C:" is optional:

A, <, 32 {
	16 x,
	16 y,
}

Endian independent accessors would be:
x = LoadShortLE(base + 0)
y = LoadShortLE(base + 2)
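
LoadShortLE is not defined in this message, so the following is only a sketch of what such an accessor might look like, assuming the memory is modelled as a Java byte array and the 16 bit values are treated as unsigned. All names here are hypothetical. Assembling the value byte by byte gives the same result on any host, which is the sense in which the accessors are endian independent.

```java
// Sketch only: models the memory behind layout A as a byte[].
final class LayoutAAccessors {
    // Assemble an unsigned 16 bit little-endian value from two bytes.
    static int loadShortLE(byte[] mem, int offset) {
        return (mem[offset] & 0xFF) | ((mem[offset + 1] & 0xFF) << 8);
    }

    // x occupies the first 16 bit container, y the second.
    static int x(byte[] mem, int base) {
        return loadShortLE(mem, base + 0);
    }

    static int y(byte[] mem, int base) {
        return loadShortLE(mem, base + 2);
    }

    public static void main(String[] args) {
        byte[] mem = {0x34, 0x12, 0x78, 0x56};
        System.out.println(Integer.toHexString(x(mem, 0)));
        System.out.println(Integer.toHexString(y(mem, 0)));
    }
}
```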
------------------------------------------------------------------------------
2) IP Header Example

The next example shows the layout of an IPv4 header.

Layout3:
IPv4, >, 192 {
	C:8 {
		4 ihl,
		4 version,
	}
	C:8 {
		2 ECN,
		6 DSCP,
	}
	C:16 totLen,
	C:16 iden,
	C:16 {
		13 fragOff,
		3 flags,
	}
	C:8 TTL,
	C:8 Proto,
	C:16 Checksum,
	C:32 srcAddr,
	C:32 destAddr,
	C:32 options,
}
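
As an illustration of how accessors for this layout might look, here is a Java sketch that reads the bitfields of the first container and of the flags/fragOff container from a header held in a byte array. The names are hypothetical. It assumes that a big-endian container is loaded most significant byte first and that fields are then numbered from bit 0 upwards in declaration order, as rule 4 describes.

```java
// Sketch only: header bytes are assumed to be in network order in a byte[].
final class Ipv4FieldSketch {
    static int loadByte(byte[] h, int off) {
        return h[off] & 0xFF;
    }

    // Big-endian 16 bit load: first byte is the most significant.
    static int loadShortBE(byte[] h, int off) {
        return ((h[off] & 0xFF) << 8) | (h[off + 1] & 0xFF);
    }

    // First C:8 container: ihl is bits 0..3, version is bits 4..7.
    static int ihl(byte[] h)     { return loadByte(h, 0) & 0xF; }
    static int version(byte[] h) { return loadByte(h, 0) >>> 4; }

    // C:16 container at byte offset 6: fragOff is bits 0..12, flags is bits 13..15.
    static int fragOff(byte[] h) { return loadShortBE(h, 6) & 0x1FFF; }
    static int flags(byte[] h)   { return loadShortBE(h, 6) >>> 13; }

    public static void main(String[] args) {
        byte[] h = new byte[24];
        h[0] = 0x45;        // version 4, ihl 5
        h[6] = 0x20;        // "more fragments" flag set
        h[7] = (byte) 0xB9; // fragment offset 185
        System.out.println(version(h) + " " + ihl(h) + " " + flags(h) + " " + fragOff(h));
    }
}
```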
--------------------------------------------------------
3) TCP Example

The following is a picture of a TCP packet (bytes 12 - 13)
| dataOffset - 4 | rsv - 3 | NS - 1 | CWR - 1 | ECE - 1 | URG - 1 | ACK - 1 | PSH - 1 | RST - 1 | SYN - 1 | FIN - 1 |

The corresponding layout is the following:

Layout4:
tcp, >, 16 {
	C:16 {
		1 fin,
		1 syn,
		1 rst,
		1 psh,
		1 ack,
		1 urg,
		1 ece,
		1 cwr,
		1 ns,
		3 rsv,
		4 dataOffset,
	}
}

There are other ways to write 'Layout4'; for example:
Layout5:
tcp, >, 16 {
	C:8 {
		1 ns,
		3 rsv,
		4 dataOffset,
	}
	C:8 {
		1 fin,
		1 syn,
		1 rst,
		1 psh,
		1 ack,
		1 urg,
		1 ece,
		1 cwr,
	}
}

The memory layout in 'Layout5' is the same as in 'Layout4', except that the interference rules are
different. Writing to fields in the first byte cannot overwrite fields in the second byte.

In Layout4 and Layout5 the runtime implementation may overwrite other fields in the container.
To avoid this, one can do the following:

Layout6:
tcp, >, 16 {
	C:16:atom {
		1 fin,
		1 syn,
		1 rst,
		1 psh,
		1 ack,
		1 urg,
		1 ece,
		1 cwr,
		1 ns,
		3 rsv,
		4 dataOffset,
	}
}

The 'atom' attribute ensures that an update to a field does not overwrite
other fields in the container.
A possible implementation could use CAS to do this.
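
A rough sketch of the CAS idea follows. It models the 16 bit container as the low bits of a java.util.concurrent.atomic.AtomicInteger on the Java heap, which is an assumption made purely for illustration; a real implementation would operate on the memory the layout describes. The class and method names are hypothetical. The update is a read-modify-write loop that retries if another thread changed the container in between, so a write to one field never clobbers a concurrent write to a neighbouring field.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: the container lives in an AtomicInteger, not in off-heap memory.
final class AtomicContainerSketch {
    private final AtomicInteger container = new AtomicInteger();

    // Replace the `width` bits starting at bit `shift`, leaving other bits intact.
    void setField(int shift, int width, int value) {
        int mask = ((1 << width) - 1) << shift;
        int oldBits;
        int newBits;
        do {
            oldBits = container.get();
            newBits = (oldBits & ~mask) | ((value << shift) & mask);
        } while (!container.compareAndSet(oldBits, newBits)); // retry on interference
    }

    int getField(int shift, int width) {
        return (container.get() >>> shift) & ((1 << width) - 1);
    }

    public static void main(String[] args) {
        AtomicContainerSketch tcp = new AtomicContainerSketch();
        tcp.setField(12, 4, 5); // dataOffset: bits 12..15 in Layout6
        tcp.setField(1, 1, 1);  // syn: bit 1
        System.out.println(tcp.getField(12, 4) + " " + tcp.getField(1, 1));
    }
}
```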

-----------------------------------------------------------------------------
4) Implicit padding example

On a 64 bit machine the compiler would add 32 bit padding between the two fields
shown in the following structure.

struct A {
	uint32_t x;
	uint64_t y;
};

This structure would produce the following Layout:

Layout7:
A, 128 {
	C:32 x,
	C:32,
	C:64 y
}

The specification of this layout does not allow the runtime implementation to interfere
with the 32 bit padding (bits 32 - 63), as the runtime only has access to named containers/fields.
In order to write to the padded area one would have to do the following.

Layout8:
A, 128 {
	C:64 {
		32 x,
		32,
	}
	C:64 y
}

----------------------------------------------------------------------
5) Packed struct example

The following example displays a packed struct.

//using gcc compiler attributes
struct __attribute__ ((__packed__)) PackedStruct{
	uint8_t a;
	uint32_t b; //unaligned 32 bit value
	uint8_t c;
};

This struct produces the following layout:

Layout9:
PackedStruct, 48 {
	C:8 a,
	C:32 b,
	C:8 c,
}

The runtime could choose to implement access to b using (not an exhaustive list):
- unaligned 32 bit load/store
- multiple 8 bit loads/stores (tearing)
- CaS of 64 bits spanning a, b, and c

If b were also atomic, then CaS or a lock may be needed to access it.
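
Two of the options above can be sketched in Java as follows. Layout9 does not state an endianness, so little-endian is assumed here, and the struct is modelled as a byte array; all names are hypothetical. The first method uses four separate 8 bit loads (which can tear if another thread writes b concurrently), and the second asks java.nio.ByteBuffer for a 32 bit read at the unaligned offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only: little-endian is an assumption, Layout9 does not specify it.
final class PackedStructSketch {
    static final int B_OFFSET = 1; // b follows the 8 bit container a

    // Four 8 bit loads, combined into an unsigned 32 bit value.
    static long loadBBytewise(byte[] mem, int base) {
        int off = base + B_OFFSET;
        return (mem[off] & 0xFFL)
             | ((mem[off + 1] & 0xFFL) << 8)
             | ((mem[off + 2] & 0xFFL) << 16)
             | ((mem[off + 3] & 0xFFL) << 24);
    }

    // A single 32 bit read at an unaligned index.
    static long loadBUnaligned(byte[] mem, int base) {
        ByteBuffer buf = ByteBuffer.wrap(mem).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getInt(base + B_OFFSET) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] mem = {(byte) 0xAA, 0x78, 0x56, 0x34, 0x12, (byte) 0xBB};
        System.out.println(Long.toHexString(loadBBytewise(mem, 0)));
        System.out.println(Long.toHexString(loadBUnaligned(mem, 0)));
    }
}
```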

------------------------------------------------------------------------
6) Union example

The following example shows how unions can be described in a Layout.

struct A {
	uint16_t x;
	uint16_t y;
};

union C {
	struct A a;
	uint32_t b;
};

The union above can be described in a Layout as:

Layout10:
C, 32 {
	U:32 {
		C:32 {
			16 x,
			16 y,
		},
		32 b,
	}
}

There is a possibility that field names may conflict with one another. For example, if the
structures were renamed in the following manner:

struct A {
	uint16_t a;
	uint16_t b;
};

union C {
	struct A a;
	uint32_t b;
};

It is valid to do this in C, but it can't be described using the layout scheme above. To solve
this we need to use nested fields. A nested layout can be specified by simply replacing
the container size attribute with the name of the layout. This requires
that we have two layouts, one for the union and one for the nested structure.

Layout11:
A, 32 {
	C:16 a,
	C:16 b,
}
C, 32 {
	U:32 {
		C:A a,  //<--- nested Layout
		C:32 b,
	}
}
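
The overlapping views that a union provides might be modelled as below. This is a sketch under two assumptions that the layout does not state: the 32 bits are little-endian, and they are backed by a 4 byte java.nio.ByteBuffer. The method names are hypothetical. Both views read the same bytes, so a write through the 32 bit member is visible through the two 16 bit members of the nested layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only: little-endian and ByteBuffer backing are assumptions.
final class UnionCSketch {
    private final ByteBuffer mem = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);

    // View 1: the 32 bit member of the union.
    void setWhole(int value) { mem.putInt(0, value); }
    int getWhole()           { return mem.getInt(0); }

    // View 2: the two 16 bit members of the nested layout, as unsigned values.
    int nestedFirst()  { return mem.getShort(0) & 0xFFFF; }
    int nestedSecond() { return mem.getShort(2) & 0xFFFF; }

    public static void main(String[] args) {
        UnionCSketch u = new UnionCSketch();
        u.setWhole(0x56781234);
        System.out.println(Integer.toHexString(u.nestedFirst()));
        System.out.println(Integer.toHexString(u.nestedSecond()));
    }
}
```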

-------------------------------------------------------
7) Array Example

The following example shows how a structure of arrays can be described in a layout.

struct SOA {
	uint8_t a[10];
	uint16_t b[10][10]; //2-d array
};

Layout16:
SOA, 1680 {
	8[10] a,
	16[10][10] b,
}
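
For reference, the element offsets implied by this layout can be worked out as in the following sketch. It assumes C-style row-major ordering for the 2-d array, which the message does not state explicitly, and the names are hypothetical. The two containers occupy 10 + 200 = 210 bytes, that is 1680 bits.

```java
// Sketch only: row-major ordering is assumed, as in C.
final class SoaOffsetSketch {
    static final int A_ELEM_BYTES = 1;  // 8 bit elements
    static final int B_ELEM_BYTES = 2;  // 16 bit elements
    static final int DIM = 10;
    static final int B_BASE = DIM * A_ELEM_BYTES; // b starts right after a

    static int offsetOfA(int i) {
        return i * A_ELEM_BYTES;
    }

    static int offsetOfB(int i, int j) {
        return B_BASE + (i * DIM + j) * B_ELEM_BYTES;
    }

    static int sizeInBits() {
        return (B_BASE + DIM * DIM * B_ELEM_BYTES) * 8;
    }

    public static void main(String[] args) {
        System.out.println(offsetOfA(9) + " " + offsetOfB(0, 0) + " " + offsetOfB(9, 9));
        System.out.println(sizeInBits());
    }
}
```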

--------------------------------------------------
8) Named container named field example

In this example there is a named container which encloses named fields.

A, 64 {
	64 abcd {
		16 a,
		16 b,
		16 c,
		16 d,
	}
}

An accessor is generated for each field as well as for the container. The
container accessor returns all the fields as a single 64 bit value.
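
A minimal sketch of how the container accessor and the field accessors could relate, assuming the 64 bit container value has already been loaded into a Java long and that fields are numbered from bit 0 in declaration order as in rule 4. The names are hypothetical.

```java
// Sketch only: operates on an already-loaded container value.
final class NamedContainerSketch {
    // Extract the 16 bit field at the given index (0 = a, 1 = b, 2 = c, 3 = d).
    static int field(long abcd, int index) {
        return (int) ((abcd >>> (16 * index)) & 0xFFFF);
    }

    static int a(long abcd) { return field(abcd, 0); }
    static int b(long abcd) { return field(abcd, 1); }
    static int c(long abcd) { return field(abcd, 2); }
    static int d(long abcd) { return field(abcd, 3); }

    public static void main(String[] args) {
        long abcd = 0x4444333322221111L; // value a container accessor might return
        System.out.println(Integer.toHexString(a(abcd)) + " " + Integer.toHexString(d(abcd)));
    }
}
```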