From atobia at ca.ibm.com Thu Feb 12 16:05:51 2015
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Thu, 12 Feb 2015 11:05:51 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To:
References: <11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com> <438B90B1-2D1D-4D6E-9C65-47463C01AB04@oracle.com>
Message-ID:

Hi David,

Apologies for the slow response.

>On float/double I am not sure -- on the one hand you are right that between
>possibly baroque bitswaps on the incoming integers and longs and existing
>floatTo and toFloat methods, we could do this, on the other hand there's going
>to be some hope of making things very efficient and that might be added by
>including a primitive (else we put ourselves into the pattern-matching intrinsic
>substitution business). Probably I need to attempt to concoct a use case, and
>before I go very far into it, I can say that what I imagine involves shared memory
>and a pair of processors sharing that memory, with mixed endianness between
>them. Do we care?

The LDL specification should be able to support mixed endianness, as it can indicate the endianness of a particular field. Having said that, we don't object to float/double primitives on Unsafe.

>Do GPUs enter into the float issue at all, or do they use processor formats?

From my research, popular GPUs such as Nvidia's adhere to the IEEE754 standard:
https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf

>No, because (I am pretty sure that) C compilers do not do that either if they can get away
>with byte-at-a-time loads and stores. Just checked, clang really goes to town on the optimized
>stores. *but I have not checked what happens if the bitfields are volatile* -- and I just did,
>and it changes the behavior, how about that?

>Well this is annoying. How far do we want to go down the rathole of mimicking C behavior
>*and* performance?

We want a memory model that is strong enough to satisfy the intuition of C programmers, but not stronger. I think we all agree that restricting the sizes of reads unnecessarily restricts the implementation.

Atomicity is useful, but should not be a default property. We've incorporated it into the following proposal.

The following is a new proposal. It contains elements from your previous proposal (some of it is copy-pasted).

Strawman proposal for layout little language:

0. Goals: where layouts are actually invariant across platforms
(e.g., network protocols) we want to have just one layout specification.

1. The LD specification must be well defined. This means that JDKs can not infer
alignment or padding differently.

2. The LD must specify the endianness of the layout. The bit and byte endianness must be consistent.
Endianness is specified at container granularity. A shorthand notation can be provided to specify the endianness of all containers in a layout.

3. A field is a contiguous sequence of bits confined to a container. An update to a field may
overwrite contents of other fields within the same container. If the enclosing container is
marked as "atomic", another thread cannot observe the value of fields in the container before
the update is complete. A field can not be greater than the size of the container. The sizes
of fields within a container must add up to the size of the container. A field does not
require a name. Accessors are only generated for named fields.

4. Unlike C bitfield numbering (which varies based on endianness of target
platform) we'll always number fields in little-endian order;
that is, "byte a:1, b:7" (at address x) would be extracted with the
expressions "loadByte(x) & 1" and "loadByte(x) >> 1", respectively.
This is LE-centric, but has the nice property that bit 0 of a byte, short,
int, or long is extracted with the same operation (x & 1), and so on.

5. A container is a sequence of one or more adjacent fields.
Changing a field in a container will not
change fields of other containers. Container sizes must be a multiple of 8 bits. There is no upper
limit to the size of a container. A container does not require a name. A container can not be
larger than the enclosing layout. The sizes of all the containers in a layout must add up to the
size of the enclosing layout. Accessors are only generated for named containers.

6. A layout is a sequence of one or more adjacent containers or unions. The default alignment
of a layout is the size of the layout.

Note:
There are some flaws with the default alignment rule: should a layout with an array of 35 bytes
have an alignment of 35 bytes? Perhaps we should follow the rule used by common C compilers.
Something like, "default alignment is the size of the largest container in the layout rounded up
to 2^n bits. In the case of arrays the container element size is considered".

7. Grammars
for layouts:
    layoutName','size','[endianness','][alignment]   //endianness (<, >) LE, BE
    ['{'
        {(containers | unions) ','}
    '}']

for unions:
    'U:'unionSize [unionName]
    ['{'
        {containers','}
    '}']

for containers:
    ['C:'][endianness:] (containerSize [containerName] | layoutName) {'[' numOFElements ']'}
    ['{'
        {fields','}
    '}']

for fields:
    ['F:']fieldSize [fieldName]

This notation describes fields/containers by their position relative
to each other and their sizes. This is hopefully less error prone than the
size + offset technique.

-----------------------------------------------------
1) Basic example

The following is a basic structure with two fields 'x' and 'y'.

struct A {
    uint16_t x;
    uint16_t y;
};

This structure produces the following layout.

Layout1:

A, <, 32 {
    C:16 x,
    C:16 y,
}

The following is also acceptable as "C:" is optional

A, <, 32 {
    16 x,
    16 y,
}

Endian independent accessors would be:
x = LoadShortLE(base + 0)
y = LoadShortLE(base + 2)
------------------------------------------------------------------------------
2) IP Header Example

The next example shows the layout of an IPV4Header.

Layout3:
IPv4, >, 192 {
    C:8 {
        4 ihl,
        4 version,
    }
    C:8 {
        2 ECN,
        6 DSCP,
    }
    C:16 totLen,
    C:16 iden,
    C:16 {
        13 fragOff,
        3 flags,
    }
    C:8 TTL,
    C:8 Proto,
    C:16 Checksum,
    C:32 srcAddr,
    C:32 destAddr,
    C:32 options,
}
--------------------------------------------------------
3) TCP Example

The following is a picture of TCP packet (bytes 12 - 14)
| dataOffset - 4 | rsv - 3 | NS - 1 | CWR - 1 | ECE - 1 | URG - 1 | ACK - 1 | PSH - 1 | RST - 1 | SYN - 1 | FIN - 1 |

The corresponding layout is the following:

Layout4:
tcp, >, 16 {
    C:16 {
        1 fin,
        1 syn,
        1 rst,
        1 psh,
        1 ack,
        1 urg,
        1 ece,
        1 cwr,
        1 ns,
        3 rsv,
        4 dataOffset,
    }
}

There are other ways to write 'Layout4'; one could do it this way:

Layout5:
tcp, >, 16 {
    C:8 {
        1 ns,
        3 rsv,
        4 dataOffset,
    }
    C:8 {
        1 fin,
        1 syn,
        1 rst,
        1 psh,
        1 ack,
        1 urg,
        1 ece,
        1 cwr,
    }
}

The memory layout in 'Layout5' is the same as 'Layout4' except that the interference rules are
different. Writing to fields in the first byte can not overwrite fields in the second byte.

In Layout4 and Layout5 the runtime implementation may overwrite other fields in the container;
to avoid this one can do the following:

Layout6:
tcp, >, 16 {
    C:16:atom {
        1 fin,
        1 syn,
        1 rst,
        1 psh,
        1 ack,
        1 urg,
        1 ece,
        1 cwr,
        1 ns,
        3 rsv,
        4 dataOffset,
    }
}

The 'atom' attribute ensures that an update to a field does not overwrite other fields in the container.
A possible implementation could use CAS to do this.

-----------------------------------------------------------------------------
4) Implicit padding example

On a 64 bit machine the compiler would add 32 bit padding between the two fields
shown in the following structure.

struct A {
    uint32_t x;
    uint64_t y;
}

This structure would produce the following Layout:

Layout7:
A, 128 {
    C:32 x,
    C:32,
    C:64 y
}

The specification of this layout does not allow the runtime implementation to interfere
with the 32 bit padding (bits 32 - 63), as the runtime only has access to named containers/fields.
In order to write to the padded area one would have to do the following.

Layout8:
A, 128 {
    C:64 {
        32 x,
        32,
    }
    C:64 y
}

----------------------------------------------------------------------
5) Packed struct example

The following example displays a packed struct.

//using gcc compiler attributes
struct __attribute__ ((__packed__)) PackedStruct {
    uint8_t a;
    uint32_t b; //unaligned 32 bit value
    uint8_t c;
};

This struct produces the following layout:

Layout9:
PackedStruct, 48 {
    C:8 a,
    C:32 b,
    C:8 c,
}

The runtime could choose to implement access to b using (not an exhaustive list):
- unaligned 32-bit load/store
- multiple 8-bit loads/stores (tearing)
- CAS of 64 bits spanning a, b, and c

If b were also atomic, then CAS or a lock may be needed to access it.

------------------------------------------------------------------------
6) Union example

The following example shows how unions can be described in a Layout.

struct A {
    uint16_t x;
    uint16_t y;
};

union C {
    struct A a;
    uint32_t b;
}

The union above can be described in a Layout as:

Layout10:
C, 32 {
    U:32 {
        C:32 {
            16 x,
            16 y,
        },
        32 b,
    }
}

There is a possibility that field names may conflict with one another. For example if the structures were
renamed in the following manner:

struct A {
    uint16_t a;
    uint16_t b;
};

union C {
    struct A a;
    uint32_t b;
}

It is valid to do this in C but it can't be described using the layout scheme above. To solve this we need
to use nested fields. A nested layout can be specified by simply replacing
the container size attribute with the name of the layout. This requires that we have two layouts, one for the union and one for the nested structure.

Layout11:
A, 32 {
    C:16 x,
    C:16 y,
}
C, 32 {
    U:32 {
        C:A a, //<--- nested Layout
        C:32 b,
    }
}

-------------------------------------------------------
7) Array Example

The following example shows how a structure of arrays can be described in a layout.

struct SOA {
    uint8_t a[10];
    uint16_t b[10][10]; //2-d array
};

Layout16:
SOA, 1680 {
    8[10] a,
    16[10][10] b,
}

--------------------------------------------------
8) Named container named field example

In this example there is a named container which encloses named fields.

A, 64 {
    64 abcd {
        16 a,
        16 b,
        16 c,
        16 d,
    }
}

An accessor is generated for each field as well as the container. The
container accessor returns all the fields as a single 64-bit value.

From angela_lin at ca.ibm.com Fri Feb 13 14:56:34 2015
From: angela_lin at ca.ibm.com (Angela Lin)
Date: Fri, 13 Feb 2015 09:56:34 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To:
References: <11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com> <438B90B1-2D1D-4D6E-9C65-47463C01AB04@oracle.com>
Message-ID:

"panama-spec-experts" wrote on 02/12/2015 11:05:51 AM:

> From: Tobi Ajila/Ottawa/IBM at IBMCA
> To: David Chase
> Cc: Doug Lea
, panama-spec-experts at openjdk.java.net
> Date: 02/12/2015 11:07 AM
> Subject: Re: More detail than I had intended on Layout description language
> Sent by: "panama-spec-experts"
>
> >Do GPUs enter into the float issue at all, or do they use processor formats?
> From my research popular GPUs such as Nvidia adhere to the IEEE754 standard
> https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf

The IEEE754 standard allows leeway for implementations to omit certain features. For example, the extended and extendable precision formats may or may not be implemented. I think IEEE754 also allows the results of a sequence of operations to differ depending on how a compiler optimizes them. So the result of a floating point computation in the VM might differ from the result from a GPU native library. This suggests that we should discourage performing fp operations interchangeably across Java (on host CPU) and GPU.

OpenCL seems to have a non-IEEE754-compliant mode of operation.

From david.r.chase at oracle.com Wed Feb 18 01:21:11 2015
From: david.r.chase at oracle.com (David Chase)
Date: Tue, 17 Feb 2015 20:21:11 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To:
References: <11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com> <438B90B1-2D1D-4D6E-9C65-47463C01AB04@oracle.com>
Message-ID: <865DD970-F994-4277-91AA-25D5E6C81BCE@oracle.com>

On 2015-02-12, at 11:05 AM, Tobi Ajila wrote:
> Apologies for the slow response.

Same on this end, plus I am leaving Oracle, which will make my responses infinitely slow.
Nonetheless, while I am still here, I should comment.

Overall, I like the strawman, and only see a couple of places where there's likely to be disagreement, and I am not even sure it is "disagree" so much as "want to be sure we understand why we are doing this".

> We want a memory model that is strong enough to satisfy the intuition of C programmers, but not stronger.
> I think we all agree that restricting the sizes of reads unnecessarily restricts the implementation.
>
> Atomicity is useful, but should not be a default property. We've incorporated it into the following proposal.
>
> The following is a new proposal. It contains elements from your previous proposal (some of it is copy-pasted).
>
> Strawman proposal for layout little language:
>
> 0. Goals: where layouts are actually invariant across platforms
> (e.g., network protocols) we want to have just one layout specification.
>
> 1. The LD specification must be well defined. This means that JDKs can not infer
> alignment or padding differently.
>
> 2. The LD must specify the endianness of the layout. The bit and byte endianness must be consistent.
> Endianness is specified at container granularity. A shorthand notation can be provided to specify the endianness of all containers in a layout.
>
> 3. A field is a contiguous sequence of bits confined to a container. An update to a field may
> overwrite contents of other fields within the same container. If the enclosing container is
> marked as "atomic", another thread cannot observe the value of fields in the container before
> the update is complete. A field can not be greater than the size of the container. The sizes
> of fields within a container must add up to the size of the container. A field does not
> require a name. Accessors are only generated for named fields.

> 4. Unlike C bitfield numbering (which varies based on endianness of target
> platform) we'll always number fields in little-endian order;
> that is, "byte a:1, b:7" (at address x) would be extracted with the
> expressions "loadByte(x) & 1" and "loadByte(x) >> 1", respectively.
> This is LE-centric, but has the nice property that bit 0 of a byte, short,
> int, or long is extracted with the same operation (x & 1), and so on.

This is the one place I am not sure.
I can argue either side of this, but I think that the "ease of our code generation" is one of the lesser of several dubiously compelling reasons.

Pro: Pushing the conventions of C compilers this far is not necessary, because we already have a defined interface for C programmers, and it is relatively attractive, so why should we give them another one? For Java programmers this will be somewhat intuitive and non-surprising. In addition, little-endian is mighty-common-case.

Con: It's not what C compilers do, and the people doing this stuff by hand are still probably coping with C compiler conventions. It's also not Network byte order, which might complicate things a little for those protocols when reading them from the spec (or not -- I haven't spent that much time reading network specs to know how they do it).

That's all I've got right now, I don't find it super-convincing either way.

> 5. A container is a sequence of one or more adjacent fields. Changing a field in a container will not
> change fields of other containers. Container sizes must be a multiple of 8 bits. There is no upper
> limit to the size of a container. A container does not require a name. A container can not be
> larger than the enclosing layout. The sizes of all the containers in a layout must add up to the
> size of the enclosing layout. Accessors are only generated for named containers.

Container sizes are also powers-of-two. I think there is a tension between the maximum container size and memory-model issues. I would like it to be the case that the maximum container size corresponds to the largest number of bits that can be written atomically; you would also like it to be the case that the minimum container size is the smallest thing that can be updated without the possibility of reverting concurrent stores from other threads of execution.
Note that getting the multi-field-single-container load/store atomicity right is something that implementors of this thing will need to do, and it is something that C programmers notice.

Looking at your grammar, don't you need to be able to tag a container with a non-default alignment?

Or is it your intention that the algorithm is "default is natural alignment if possible given supplied padding, otherwise the largest alignment that works with the padding, and ERROR if any field within the container does not obtain its desired alignment"?

Think about structs containing structs, think about the generation of interior pointers/references, think about the need to know the alignment of the pointed-to struct when a load is performed.

I think the C assumption is that a uint64_t* is a pointer (on a 64-bit-ish machine) that is 64-bit aligned. If you want a pointer to a misaligned uint64_t, I think that is a different animal (or do we take the position that we can hide that detail behind an interface?)

> 6. A layout is a sequence of one or more adjacent containers or unions. The default alignment
> of a layout is the size of the layout.

The alignment of a layout is the maximum alignment of the containers and unions that it comprises. E.g., in C

struct foo { uint64_t a, b, c; }

the alignment of a struct foo is 64 bits, not 192 bits. So it follows the container, not the layout.

> Note:
> There are some flaws with the default alignment rule: should a layout with an array of 35 bytes
> have an alignment of 35 bytes? Perhaps we should follow the rule used by common C compilers.
> Something like, "default alignment is the size of the largest container in the layout rounded up
> to 2^n bits. In the case of arrays the container element size is considered".
>
> 7. Grammars
> for layouts:
>     layoutName','size','[endianness','][alignment]   //endianness (<, >) LE, BE
>     ['{'
>         {(containers | unions) ','}
>     '}']
>
> for unions:
>     'U:'unionSize [unionName]
>     ['{'
>         {containers','}
>     '}']
>
> for containers:
>     ['C:'][endianness:] (containerSize [containerName] | layoutName) {'[' numOFElements ']'}
>     ['{'
>         {fields','}
>     '}']
>
> for fields:
>     ['F:']fieldSize [fieldName]
>
> This notation describes fields/containers by their position relative
> to each other and their sizes. This is hopefully less error prone than the
> size + offset technique.
>
> -----------------------------------------------------
> 1) Basic example
>
> The following is a basic structure with two fields 'x' and 'y'.
>
> struct A {
>     uint16_t x;
>     uint16_t y;
> };
>
> This structure produces the following layout.
>
> Layout1:
>
> A, <, 32 {
>     C:16 x,
>     C:16 y,
> }
>
> The following is also acceptable as "C:" is optional
>
> A, <, 32 {
>     16 x,
>     16 y,
> }
>
> Endian independent accessors would be:
> x = LoadShortLE(base + 0)
> y = LoadShortLE(base + 2)
> ------------------------------------------------------------------------------
> 2) IP Header Example
>
> The next example shows the layout of an IPV4Header.
>
> Layout3:
> IPv4, >, 192 {
>     C:8 {
>         4 ihl,
>         4 version,
>     }
>     C:8 {
>         2 ECN,
>         6 DSCP,
>     }
>     C:16 totLen,
>     C:16 iden,
>     C:16 {
>         13 fragOff,
>         3 flags,
>     }
>     C:8 TTL,
>     C:8 Proto,
>     C:16 Checksum,
>     C:32 srcAddr,
>     C:32 destAddr,
>     C:32 options,
> }
> --------------------------------------------------------
> 3) TCP Example
>
> The following is a picture of TCP packet (bytes 12 - 14)
> | dataOffset - 4 | rsv - 3 | NS - 1 | CWR - 1 | ECE - 1 | URG - 1 | ACK - 1 | PSH - 1 | RST - 1 | SYN - 1 | FIN - 1 |
>
> The corresponding layout is the following:
>
> Layout4:
> tcp, >, 16 {
>     C:16 {
>         1 fin,
>         1 syn,
>         1 rst,
>         1 psh,
>         1 ack,
>         1 urg,
>         1 ece,
>         1 cwr,
>         1 ns,
>         3 rsv,
>         4 dataOffset,
>     }
> }
>
>
> There are other ways to write 'Layout4'; one could do it this way:
> Layout5:
> tcp, >, 16 {
>     C:8 {
>         1 ns,
>         3 rsv,
>         4 dataOffset,
>     }
>     C:8 {
>         1 fin,
>         1 syn,
>         1 rst,
>         1 psh,
>         1 ack,
>         1 urg,
>         1 ece,
>         1 cwr,
>     }
> }
>
> The memory layout in 'Layout5' is the same as 'Layout4' except that the interference rules are
> different. Writing to fields in the first byte can not overwrite fields in the second byte.
>
> In Layout4 and Layout5 the runtime implementation may overwrite other fields in the container;
> to avoid this one can do the following:
>
> Layout6:
> tcp, >, 16 {
>     C:16:atom {
>         1 fin,
>         1 syn,
>         1 rst,
>         1 psh,
>         1 ack,
>         1 urg,
>         1 ece,
>         1 cwr,
>         1 ns,
>         3 rsv,
>         4 dataOffset,
>     }
> }
>
> The 'atom' attribute ensures that an update to a field does not overwrite other fields in the container.
> A possible implementation could use CAS to do this.

Is a compiler allowed to coalesce atomic operations within a container into a single CAS?

> -----------------------------------------------------------------------------
> 4) Implicit padding example
>
> On a 64 bit machine the compiler would add 32 bit padding between the two fields
> shown in the following structure.
>
> struct A {
>     uint32_t x;
>     uint64_t y;
> }
> This structure would produce the following Layout:
>
> Layout7:
> A, 128 {
>     C:32 x,
>     C:32,
>     C:64 y
> }
>
> The specification of this layout does not allow the runtime implementation to interfere
> with the 32 bit padding (bits 32 - 63), as the runtime only has access to named containers/fields.
> In order to write to the padded area one would have to do the following.
>
> Layout8:
> A, 128 {
>     C:64 {
>         32 x,
>         32,
>     }
>     C:64 y
> }
>
> ----------------------------------------------------------------------
> 5) Packed struct example
>
> The following example displays a packed struct.
>
> //using gcc compiler attributes
> struct __attribute__ ((__packed__)) PackedStruct {
>     uint8_t a;
>     uint32_t b; //unaligned 32 bit value
>     uint8_t c;
> };
>
> This struct produces the following layout:
>
> Layout9:
> PackedStruct, 48 {
>     C:8 a,
>     C:32 b,
>     C:8 c,
> }
>
> The runtime could choose to implement access to b using (not an exclusive list):
> - unaligned 32bit load/store
> - multiple 8bit loads/stores (tearing)
> - CaS of 64bits spanning a, b, and c
>
> If b were also atomic, then CaS or a lock may be needed to access it.
>
> ------------------------------------------------------------------------
> 6) Union example
>
> The following example shows how unions can be described in a Layout.
>
> struct A {
>     uint16_t x;
>     uint16_t y;
> };
>
> union C {
>     struct A a;
>     uint32_t b;
> }
>
> The union above can be described in a Layout as:
>
> Layout10:
> C, 32 {
>     U:32 {
>         C:32 {
>             16 x,
>             16 y,
>         },
>         32 b,
>     }
> }
>
> There is a possibility that field names may conflict with one another. For example if the structures were
> renamed in the following manner:
>
> struct A {
>     uint16_t a;
>     uint16_t b;
> };
>
> union C {
>     struct A a;
>     uint32_t b;
> }
>
> It is valid to do this in C but it can't be described using the layout scheme above. To solve this we need
> to use nested fields.
> A nested layout can be specified by simply replacing
> the container size attribute with the name of the layout. This requires that we have two layouts, one for the union and one for the nested structure.
>
> Layout11:
> A, 32 {
>     C:16 x,
>     C:16 y,
> }
> C, 32 {
>     U:32 {
>         C:A a, //<--- nested Layout
>         C:32 b,
>     }
> }
>
> -------------------------------------------------------
> 7) Array Example
>
> The following example shows how a structure of arrays can be described in a layout.
>
> struct SOA {
>     uint8_t a[10];
>     uint16_t b[10][10]; //2-d array
> };
>
> Layout16:
> SOA, 1680 {
>     8[10] a,
>     16[10][10] b,
> }
>
> --------------------------------------------------
> 8) Named container named field example
>
> In this example there is a named container which encloses named fields.
>
> A, 64 {
>     64 abcd {
>         16 a,
>         16 b,
>         16 c,
>         16 d,
>     }
> }
>
> An accessor is generated for each field as well as the container. The
> container accessor returns all the fields as a single 64bit value.
>

From atobia at ca.ibm.com Fri Feb 27 21:22:36 2015
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Fri, 27 Feb 2015 16:22:36 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To: <865DD970-F994-4277-91AA-25D5E6C81BCE@oracle.com>
References: <11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com> <438B90B1-2D1D-4D6E-9C65-47463C01AB04@oracle.com> <865DD970-F994-4277-91AA-25D5E6C81BCE@oracle.com>
Message-ID:

>> 4. Unlike C bitfield numbering (which varies based on endianness of target
>> platform) we'll always number fields in little-endian order;
>> that is, "byte a:1, b:7" (at address x) would be extracted with the
>> expressions "loadByte(x) & 1" and "loadByte(x) >> 1", respectively.
>> This is LE-centric, but has the nice property that bit 0 of a byte, short,
>> int, or long is extracted with the same operation (x & 1), and so on.

>This is the one place I am not sure.
>I can argue either side of this, but I think that
>the "ease of our code generation" is one of the lesser of several dubiously compelling
>reasons.
...
>That's all I've got right now, I don't find it super-convincing either way.

For now, let's stick with LE numbering since it is sufficient for common cases.

> Container sizes are also powers-of-two. I think there is a tension between the maximum container
> size and memory-model issues. I would like it to be the case that the maximum container sizes
> corresponds to the largest number of bits that can be written atomically; you would also like it to be
> the case that the minimum container size is the smallest thing that can be updated without the possibility
> of reverting concurrent stores from other threads of execution.

We are not in favour of a max container size because we want container sizes to be forward compatible with future hardware. We can always revert to doing something slow (e.g. using a lock) in order to achieve atomicity in cases where containers are larger than the largest register size.

We must be able to describe misaligned containers because we must be able to describe existing data structures that have misaligned fields. If we can describe misaligned containers, we can also describe odd-sized containers (as long as they are a multiple of 8 bits). The implementation will be able to deal with both of these cases.

> Looking at your grammar, don't you need to be able to tag a container with a non-default alignment?
>
> Or is it your intention that the algorithm is "default is natural alignment if possible given supplied padding,
> otherwise the largest alignment that works with the padding, and ERROR if any field within the container
> does not obtain its desired alignment?

My intention here is that the LDL spec only describes the size and location of fields. Any alignment is specified with explicit padding. We think the LDL spec should not enforce a container alignment, but only represent it.
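
To make the "size and location only" reading concrete, here is a minimal sketch in C of how offsets could fall out of the sizes alone, together with LE-numbered field extraction. The helper names `ldl_offsets` and `ldl_extract` are hypothetical and appear nowhere in the proposal; the sketch assumes entries are laid out back to back in declaration order and that unnamed entries are simply padding that still consumes bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given entry sizes in bits, in declaration order,
 * store the bit offset of each entry and return the total size in bits.
 * Padding is just another entry, so no alignment rule is applied here. */
static unsigned ldl_offsets(const unsigned *sizes, size_t n, unsigned *offsets)
{
    unsigned running = 0;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = running;   /* entry i starts where entry i-1 ended */
        running += sizes[i];
    }
    return running;
}

/* Hypothetical helper: extract `width` bits starting at bit `offset` of an
 * already loaded container value, using LE numbering (bit 0 is the least
 * significant bit), as in "loadByte(x) & 1" and "loadByte(x) >> 1". */
static uint64_t ldl_extract(uint64_t container, unsigned offset, unsigned width)
{
    assert(width >= 1 && offset + width <= 64);
    uint64_t mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    return (container >> offset) & mask;
}
```

For Layout7 (sizes 32, 32, 64) this yields offsets 0, 32 and 64 and a total of 128 bits, so the padding that a C compiler would insert implicitly is visible as an ordinary unnamed entry.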

> I think the C assumption is that a uint64_t* is a pointer (on a 64-bit-ish machine) that is 64-bit aligned.
> If you want a pointer to a misaligned uint64_t, I think that is a different animal (or do we take the
> position that we can hide that detail behind an interface?)

We want to be able to specify both of these:

on 64 bit:

struct Node {
    int8_t data;
    struct Node* next;
}

LDL:
Node, 128 {
    8 data,
    56 ,
    64 next,
}

struct __attribute__ ((__packed__)) Node {
    int8_t data;
    struct Node* next;
}

LDL:
Node, 72 {
    8 data,
    64 next,
}

With the use of locks, we can support structures with misaligned containers.

> Think about structs containing structs, thinking about the generation
> of interior pointers/references, think about the need to know the alignment of the pointed struct when a load
> is performed.

That's a good point.

struct A {
    int32_t x;
    int32_t y;
}

LDL:
A, 64 { //default alignment of 32
    32 x,
    32 y,
}

struct __attribute__ ((__packed__)) B {
    int8_t z;
    struct A a;
}

LDL:
B, 72 {
    8 z,
    A a, //this is aligned to 8 not 32
}

Can I get 'a' and treat it like it's aligned?

With the exception of atomics this is not a big problem. Most modern hardware allows you to access misaligned memory with some performance penalty. The nested struct is not the only case where this problem could occur. What if we point a perfectly aligned Layout to an unaligned memory location (like casting unaligned memory to a struct)?

We think there are 3 answers to this:
1) Ignore it - Use C behaviour as the standard
2) Don't allow it - Perform runtime checks for misaligned access and throw an exception
3) Handle it safely - Provide two accessors, the regular one, and one that can safely handle misaligned access

>> The 'atom' attribute ensures that an update to a field does not overwrite other fields in the container.
>> A possible implementation could use CAS to do this.
>
> Is a compiler allowed to coalesce atomic operations within a container into a single CAS?
Yes, I don't see a reason not to allow this.
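
As a rough illustration of what such coalescing might look like, here is a minimal sketch using C11 atomics. It is not a proposed implementation: `container_update` and the mask macros are hypothetical names, and the bit positions assume the LE numbering of rule 4 applied to Layout6 (fin is bit 0, syn is bit 1, ack is bit 4).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed bit positions for Layout6 under LE field numbering. */
#define FIN_MASK ((uint16_t)(1u << 0))
#define SYN_MASK ((uint16_t)(1u << 1))
#define ACK_MASK ((uint16_t)(1u << 4))

/* Hypothetical helper: atomically replace the bits selected by `mask` with
 * `bits`, leaving every other field in the 16-bit container untouched.
 * Several field updates are coalesced by OR-ing their masks together, so
 * they are published with a single successful compare-and-swap. */
static void container_update(_Atomic uint16_t *container,
                             uint16_t mask, uint16_t bits)
{
    assert((bits & (uint16_t)~mask) == 0);  /* new bits must lie inside mask */
    uint16_t old = atomic_load(container);
    uint16_t desired;
    do {
        desired = (uint16_t)((old & ~mask) | bits);
        /* On failure, `old` is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(container, &old, desired));
}
```

Setting syn and ack together would then be one call, `container_update(&flags, SYN_MASK | ACK_MASK, SYN_MASK | ACK_MASK)`, and a concurrent writer of fin in the same container would not have its update reverted.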