From atobia at ca.ibm.com Tue Dec 2 18:36:08 2014
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Tue, 2 Dec 2014 13:36:08 -0500
Subject: More detail than I had intended on Layout description language
Message-ID: 

>> > Was I correct in seeing that it is proposed to attach LDL to
>> > interfaces with annotations?
>>
>> Yes, we proposed this. However, we're certainly open to alternatives.
>>
>> Since LDL is not platform-independent, if the LDLs are attached to the generated Java interfaces, then the Java interfaces are also not platform-independent. Is this good or bad?

> Normally I'd say bad.

At this point we feel that the LD should not be attached to the generated Java interfaces, as it would make things more difficult for developers who support multiple platforms. Keeping the interface and the LD separate would allow developers to have one copy of the generated interface checked into their source control. Users will need to manage the LD files for all supported platforms.

> It might be interesting to try to see what happens with network protocols,
> which are ultimately (at the byte-by-byte level) unambiguously specified.
> That gets in the way of the platform-independent Java-behavior story,
> because I can easily imagine that what you'd like to have happen in the generated
> code is something like a loadInt, and on wrong-byte-order platforms, a byte-swap.
> I was thinking 'network protocols' because those are something that we really
> do expect to see on more than one platform.

We view the LD as two streams of data: a stream that contains layout data such as sizes and offsets, and a stream that carries type data such as name, signed/unsigned, integral/float. Endianness would be part of the type stream, and it would be up to the runtime-generated class to do the endian swapping. I would also add that the endian attribute would be an absolute value (big or little).
We could have a shorthand to specify endianness for an entire Layout, but field specification should always take precedence.

> would at least give us a prayer of saying that the same generated proxies would work across platforms

We've been viewing the generated proxies as something that is generated at runtime when an LD is read in. This would be platform dependent.

> If you are references-only, then it's all aliased. Suppose you wanted to rotate the
> vertices of the triangle, say
>     Triangle tri = someNativeThing.getTriangle()
>     Array points = tri();
>     Point t = "I need to make a copy here" (points.get(0));
>     points.get(0).set(points.get(1));
>     points.get(1).set(points.get(2));
>     points.get(2).set(t);
> So what happens to make that copy? I think there are (at least) 3 choices
> that are not even mutually exclusive:

I think we prefer 1)a). "Native heap" could mean many things such as RDMA, GPU memory, etc. This would introduce many challenges. Unless we make on-heap copies in this case, we would need a way to allow the user to provide a custom allocator (and memcpy) to manage their "native heap".

> The other point I wanted to toss in is that once you put alignment into the mix
> and not just as an "oh yeah, we need to think about alignment" it might change
> how we talk about some of the other stuff. If you look at C bitfields, there is very
> much a model of "load a box this big, then shift the data towards the low order
> bits some distance, then mask/sign-smear the part we don't care about".

We think that specifying alignment is important in the context of allocating a layout type. Start alignment should be externalized for the benefit of implementing allocators for standalone instances of the layout data. But in terms of field alignment within a layout (this includes bit fields and nested layouts), we feel that alignment must be specifically declared in the form of padding and not the start alignment attribute.
It is the job of the groveller to determine what the required field alignment/padding is on a given platform, and the resulting LDL should be very explicit about that. For example:

struct c {
    unsigned short x:1;
    unsigned short y:7;
};

on Big Endian

LD: "c",":8","y:7","x:1"

on Little Endian

LD: "c","x:1","y:7",":8"

From david.r.chase at oracle.com Tue Dec 2 22:32:04 2014
From: david.r.chase at oracle.com (David Chase)
Date: Tue, 2 Dec 2014 17:32:04 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To: 
References: 
Message-ID: <11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com>

Trust, but verify:

On 2014-12-02, at 1:36 PM, Tobi Ajila wrote:
> For example:
> struct c {
> unsigned short x:1;
> unsigned short y:7;
> };
>
> on Big Endian
>
> LD: "c",":8","y:7","x:1"
>
> on Little Endian
>
> LD: "c","x:1","y:7",":8"

On Big Endian, they number the bits from the high order end.

struct c { unsigned short x:1; unsigned short y:7; };
struct d { unsigned short x:7; unsigned short y:1; };
union { struct c A; struct d B; unsigned short C; } shortu;
...
shortu.A.x=1; shortu.A.y=1;
printf("shortu.CA11 = 0x%02x\n", shortu.C);
shortu.B.x=1; shortu.B.y=1;
printf("shortu.CB11 = 0x%02x\n", shortu.C);

shortu.CA11 = 0x8100
shortu.CB11 = 0x300

(Little endian, it prints
shortu.CA11 = 0x03
shortu.CB11 = 0x81)

Your layout descriptor makes sense if (1) I know that it is within a short and (2) you are numbering the bits from the LSB of that short. But I don't see how we know that.

I have a half-baked counterproposal. The missing baking includes the binding to interfaces with types for interpreting what is in the bits.

----------------------------------------
Strawman proposal for layout little language:

0. Goals: where layouts are actually invariant across platforms (e.g., network protocols) we want to have just one layout specification, if this is possible.

1. Endianness specification is optionally allowed (could be required).
This is a consequence of 0, since NBO is one particular endianness specification.

2. We will add
Unsafe.{be,le,native},{Load,Store,Cas}{Short,Int,Long,Float,Double}
to enable efficient implementation of 1. These should compile to intrinsics. The reason to do it this way is to ensure consistent translation of little languages into bytecodes across platforms, whenever possible, and also to minimize the offense given to the keepers of the Java/JDK faith by confining the ugly to the official pile of ugly. No need for endian variants of byte load/store.

3. Unlike C bitfield numbering (which varies based on endianness of target platform) we'll always number bitfields in little-endian order; that is, "byte a:1, b:7" (at address x) would be extracted with the expressions "loadByte(x) & 1" and "loadByte(x) >> 1", respectively. This is LE-centric, but has the nice property that bit 0 of a byte, short, int, or long is extracted with the same operation (x & 1), and so on.

4. I don't know the best way to express offsets. Uniformity suggests that we express all offsets in terms of bits, and we would then do container-math to extract the byte offset of the container (8, 16, 32, or 64-bit) and the shift/mask of the field to extract. That is, an endianness, offset, field size, and alignment/container size (ALERT: what about platforms that allow unaligned containers?) are required to identify the bits for a field. Endianness tells us how to load the container, alignment/container size tells us both how large a thing to load and how to convert the bit offset into a byte offset + shift, and the size tells us what mask to use.
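As a quick illustration of point 3's always-little-endian bit numbering, the following sketch extracts "byte a:1, b:7" from a byte container exactly as described -- the names here are illustrative, not proposed API:

```java
// Point 3 sketch: bitfields numbered from the LSB regardless of platform,
// so "byte a:1, b:7" extracts the same way everywhere.
class BitNumbering {
    // a occupies bit 0 of the byte
    static int a(byte container) { return container & 1; }
    // b occupies bits 1..7 of the same byte
    static int b(byte container) { return (container & 0xFF) >> 1; }
}
```

The `& 0xFF` guards against Java's sign extension of `byte` before the shift.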
Perhaps, per field:
    offset (in bits)
    container size (in bits)
    [field size (in bits) = container size]
    [container alignment (in bits) = container size]
    [endianness = structure default]

Structure information would look like
    endianness (<,> -- see Python link for other options)
    container size (in bits)
    [container alignment (in bits) = container size]

struct c {
    unsigned short x:1;
    unsigned short y:7;
};

might be (big endian)

c = Layouts.make(
    ">,16",      // container size = 16, implicit align = 16
    "x:15,16,1", // x = (LoadShortBE>>15) & ((1 << 1) - 1), implicit container align=16, endianness=>
    "y:8,16,7"   // y = (LoadShortBE>>8) & ((1 << 7) - 1), implicit container align=16, endianness=>
)

or (little endian)

c = Layouts.make(
    "<,16",     // container size = 16, implicit align = 16
    "x:0,16,1", // x = (LoadShortLE>>0) & ((1 << 1) - 1), implicit container align=16, endianness=<
    "y:1,16,7"  // y = (LoadShortLE>>1) & ((1 << 7) - 1), implicit container align=16, endianness=<
)

Note that I have helpfully provided an utterly formulaic interpretation of the contents of the field specifications; there's no need to accumulate bit offsets across fields, just follow the local recipe and you are done.

Optimization is possible -- if I do a big-endian load of a short and then right shift by k+8 bits, I know that I can also do a load of the byte at the same address and right shift by k bits, and this is true on big or little endian machines (a BE short load on a LE box loads a little-endian short, then byteswaps the short).
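The "utterly formulaic interpretation" above can be sketched directly in Java -- this is a hedged illustration of the recipe for a big-endian 16-bit container, not any proposed runtime API; `extract` and its parameters are made-up names:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Local recipe from the strawman: field = (loadContainer >> bitOffset) & mask.
// Assumes a big-endian 16-bit container, as in the "c" example.
class FieldRecipe {
    static int extract(ByteBuffer buf, int byteOffset, int bitOffset, int fieldBits) {
        // load the whole container big-endian, zero-extended
        int container = buf.order(ByteOrder.BIG_ENDIAN).getShort(byteOffset) & 0xFFFF;
        // shift the field to the low-order bits, then mask
        return (container >> bitOffset) & ((1 << fieldBits) - 1);
    }
}
```

With the container bytes 0x81 0x00 (the `shortu.CA11` value from the earlier union experiment), "x:15,16,1" and "y:8,16,7" both extract 1, matching x=1, y=1.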
An NBO (Big Endian, I hope I got it right) IP header:

struct ip {
    u_int   ip_v:4,   /* version */
            ip_hl:4;  /* header length */
    u_char  ip_tos;   /* type of service */
    u_short ip_len;   /* total length */
    u_short ip_id;    /* identification */
    u_short ip_off;   /* fragment offset field */
    u_char  ip_ttl;   /* time to live */
    u_char  ip_p;     /* protocol */
    u_short ip_sum;   /* checksum */
    struct in_addr ip_src, ip_dst; /* source and dest address */
};

iph = Layouts.make(
    ">,160,32", // big endian, container size = 160, align = 32
    "ip_v:4,8,4",
    "ip_hl:0,8,4", // Note that I didn't even need to put them in ascending order
    "ip_tos:8,8",
    "ip_len:16,16",
    "ip_id:32,16",
    "ip_off:48,16",
    "ip_ttl:64,8",
    "ip_p:72,8",
    "ip_sum:80,16",
    "ip_src:96,32",
    "ip_dst:128,32"
);

The advantage here is that this is actually unambiguous (up to super-alignment optimizations; perhaps we have 32-byte cache lines and want the entire thing aligned on a 256-bit boundary) and can serve on any box -- the loadBigEndianShort and loadBigEndianInt calls would of course require byteswapping on a little-endian machine, but there's no escaping that.

One thing lacking here is nested structures, and I was going to propose something, but maybe that is not what we are doing here -- perhaps interpretation of the bits more properly belongs elsewhere (since we got this far without saying whether we're specifying integers, boolean vectors, or floating point). There's not-quite-right-for-us prior art in Python Land:
https://docs.python.org/3/library/struct.html

5. Note that this trivially allows treatment of unions -- a union of two fields is just two fields that happen to be stored at the same offset.
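Point 5's overlapping-fields-as-union amounts to bit-punning the same stored container. A minimal Java model of that idea (assumed class and method names, purely illustrative) keeps one 32-bit container and exposes it through two "fields":

```java
// Sketch of point 5: a union is two fields stored at the same offset, i.e.
// the same 32 bits read either as an int or as a float.
class UnionU {
    private int bits; // the single 32-bit container both fields share

    int getI() { return bits; }
    void setI(int v) { bits = v; }

    float getF() { return Float.intBitsToFloat(bits); }
    void setF(float v) { bits = Float.floatToRawIntBits(v); }
}
```

This also demonstrates the "sane world" behaviour discussed next: storing (int) 1 and reading the float field yields the smallest positive denormal.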
union u {
    int32_t i;
    float f;
} x;

translates to (little endian)

u = Layouts.make(
    "<,32",
    "i,0,32",
    "f,0,32"
);

By the way, unions of integers and floats can sometimes be wonky:
http://en.wikipedia.org/wiki/Endianness#Floating-point_and_endianness
"There are old ARM processors that have half little-endian, half big-endian floating point representation for double-precision numbers: both 32-bit words are stored in little-endian like integer registers, but the most significant one first."

In a sane world, the LSB of each lines up so that (int) 1 bit-puns to the smallest float larger than zero (the tiniest positive denorm). I don't think that is a problem for us to solve, though I had visions of declaring that repeated <> specifications describe endianness from the outside in by repeated halving; thus describing the weird-ARM FP field as "><,0,64" would say that the most significant 32 bits come first, but that the 32-bit halves are each stored least significant byte first.

Would an explicit endianness specification have any use when doing shared memory in a world of mixed-endianness multiprocessing? Or have I stepped over the line from "interesting" to "insane"?

David

From atobia at ca.ibm.com Wed Dec 10 16:56:35 2014
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Wed, 10 Dec 2014 11:56:35 -0500
Subject: Reference SUB/SEP Value question
In-Reply-To: 
References: <81818BB2-8C2D-4910-A83F-D03BB7B0AB49@oracle.com> <4BF7F006-2846-4AC1-BA5E-DAB3DABECBF2@oracle.com>
Message-ID: 

> I think there are two choices, refSUBvalue and refSEPvalue.
>
> For SUB, you might have a hierarchy that looks like this:
>
>     interface ComplexValue
>     class Complex implements ComplexValue
>     interface ComplexRef extends ComplexValue, Ref
>     class ComplexGeneratedProxy implements ComplexRef
>
> You can do this because a Reference can support all the
> methods of Value by fetching the relevant fields etc, plus
> it has some more methods for setting those fields.
It seems like the reason to separate ComplexRef and ComplexValue is to be able to treat the Value as an immutable value holder. Correct?

If Ref subclasses Value, it will be easy for "immutable" things to change, either by casting to the Ref or by having someone else modify the Ref while you read the Value, and we start to tread on C++ const casts. e.g.

ComplexRef refType = Ref.instantiate(ComplexRef.class);
ComplexValue valueType = (ComplexValue) refType; // now we have a mutable and immutable version of the same data

> For SEP, you might have a hierarchy that looks like this:
>
>     interface ComplexValue
>     class Complex implements ComplexValue
>
>     interface ComplexRef extends Ref {
>         ComplexValue get(); // canonical connection between Ref and Value
>     }
>     class ComplexGeneratedProxy implements ComplexRef

With Ref and Value being unrelated, we run into issues because we can't cast between them. This feels a lot like C++'s reference vs pointer and will lead to similar user frustrations when they can't convert between the Value and Ref. This seems contrary to our objections to the Value and Ref being related, but I think both situations are going to cause similar issues.

An alternative proposal is something like the following:

interface ComplexRef extends Ref
class Complex implements ComplexRef {
    ComplexRef freeze() { ... } // this returns an immutable Ref type
}
Ref.instantiate(Ref refType, LD description);

In this hierarchy there are only Ref types. Immutability is a state and not a type; this state can be achieved by calling the 'freeze' method.

> ComplexRef (in particular) is very likely to be a single-implementation
> interface (the proxy class) and thus will be easy to optimize.

Our users also want to be able to add their own behaviour to the generated classes. For example, they want to modify the generated 'ComplexRef' interface and add new default methods like 'getAbsValue()'.
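A minimal sketch of what such user-added behaviour could look like -- all names here ('ComplexRef', 'MyComplexRef', 'ComplexProxy') stand in for the groveller-generated and user-written types and are not real API:

```java
// Hypothetical generated interface (would come from the groveller)
interface ComplexRef {
    double re();
    double im();
}

// User-written extension: behaviour lives in default methods,
// so regenerating ComplexRef does not destroy it
interface MyComplexRef extends ComplexRef {
    default double getAbsValue() { return Math.hypot(re(), im()); }
}

// Stand-in for the runtime-generated proxy bound to native memory
class ComplexProxy implements MyComplexRef {
    private final double re, im;
    ComplexProxy(double re, double im) { this.re = re; this.im = im; }
    public double re() { return re; }
    public double im() { return im; }
}
```

The point of the pattern is that only `ComplexProxy` needs to be generated per platform, while `MyComplexRef` survives regeneration untouched.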
Users modifying generated artifacts will run into maintenance headaches every time they need to re-run the groveller, which could be quite frequent during development. We would like them to be able to create their own interfaces that extend the ones generated by the groveller. Something like 'MyComplexRef extends ComplexRef', where 'MyComplexRef' would implement the user functionality.

This will affect the API used to instantiate a ComplexRef, as it would be necessary to pass in the required interface. e.g.

<T> T instantiate(Class<T> clazz, LD description)

and the user would call this like:

MyComplexRef ref = Ref.instantiate(MyComplexRef.class, LD);

This design undermines the assumption that ComplexRef will typically have one implementor.

Do you agree with how we bind user-defined behaviour to layouts? What is the best way to do this?

From atobia at ca.ibm.com Wed Dec 17 15:26:40 2014
From: atobia at ca.ibm.com (Tobi Ajila)
Date: Wed, 17 Dec 2014 10:26:40 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To: <11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com>
References: <11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com>
Message-ID: 

> 2. We will add
> Unsafe.{be,le,native},{Load,Store,Cas}{Short,Int,Long,Float,Double}
> to enable efficient implementation of 1. These should compile to intrinsics.
> The reason to do it this way is to ensure consistent translation of little
> languages into bytecodes across platforms, whenever possible, and also to
> minimize the offense given to the keepers of the Java/JDK faith by confining
> the ugly to the official pile of ugly. No need for endian variants of byte
> load/store.

This seems reasonable, and portable bytecode is probably the right design goal. For the new Unsafe methods, let's ensure that the behaviour is well specified.

Are the Float/Double versions necessary? So far the LD has been about specifying bits, not the type information.
How likely is it that Java's float/double map correctly onto the native side's representation of float/double? Wouldn't using "Float.floatToRawIntBits(float) / intBitsToFloat(int)" and the Double equivalents make more sense?

> 4. I don't know the best way to express offsets. Uniformity suggests that
> we express all offsets in terms of bits, and we would then do container-math
> to extract the byte offset of the container (8, 16, 32, or 64-bit) and the
> shift/mask of the field to extract.
> That is, an endianness, offset, field size, and alignment/container size
> (ALERT: what about platforms that allow unaligned containers?) are required
> to identify the bits for a field. Endianness tells us how to load the
> container, alignment/container size tells us both how large a thing to
> load and how to convert the bit offset into a byte offset + shift, and the
> size tells us what mask to use.

Our first thought on containers was that it unnecessarily exposes implementation details. We now see it as an attempt to externalize the underlying memory model.

> If you look at C bitfields, there is very
> much a model of "load a box this big, then shift the data towards the low order
> bits some distance, then mask/sign-smear the part we don't care about".

The container model increases the amount of memory reads when accessing bit-fields. Reading a single bit field has the side effect of reading all other fields in the container. This would affect the behaviour of systems with memory-mapped I/O that take certain actions when an address is read, e.g. incrementing a counter every time an address is read.

We should also consider the performance impact of masking + shifting every time we interact with a bit field. In the example below, do we need to mask + shift + OR + CaS every time we write to a field in that structure?

struct A {
    int16_t val1 : 8;
    int16_t val2 : 8;
}

">,16",
"val1, 0, 16, 8",
"val2, 8, 16, 8"

Will we have restrictions on container sizes?
If someone specifies a 128-bit container in the LDL, how will it be dealt with on platforms that don't have those container sizes? Do we want to go so far as to specify the number, ordering, and atomicity of memory operations used to read a layout field?

> Perhaps, per field:
> offset (in bits)
> container size (in bits)
> [field size (in bits) = container size]
> [container alignment (in bits) = container size]
> [endianness = structure default]
>
> Structure information would look like
> endianness (<,> -- see Python link for other options)
> container size (in bits)
> [container alignment (in bits) = container size]

Then there are some ambiguous cases. e.g.

">, 16",
"x, 0, 8, 8",
"y, 4, 8, 8", // where does this one start? it has offset of 4 but alignment of 8
"z, 8, 8, 8"

What is the expected behaviour of this? Or is it even legal? To be more general, how do we handle fields that don't fit into their containers?

To summarize, we think:
1) The LD implementation should be able to derive container sizes from field sizes and offsets
2) Container attributes create more opportunity for inconsistent layout descriptors.

If we need to specify a memory model, then we propose the following strawman rules:
1) Where a field is completely contained inside a container, and where the container size is no larger than a platform-dependent limit, reads and writes of the field will be atomic.
2) Reads and writes of a field may cause reads and writes of adjacent fields in the same container.
3) Modifying the value of a field preserves the value of adjacent fields in the same container. This rule does not apply to overlapping fields.

Doug Lea's input may be valuable for refining this. Is this level of specification needed?
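Strawman rule 3 implies a read-modify-write of the whole container when storing a field. As a hedged illustration (not proposed API; `storeField` and its parameters are made-up names), a field store into a 16-bit container that preserves adjacent fields might look like:

```java
// Rule 3 sketch: write 'value' into bits [offset, offset+size) of a 16-bit
// container without disturbing the adjacent bits.
class ContainerStore {
    static short storeField(short container, int offset, int size, int value) {
        int mask = ((1 << size) - 1) << offset;           // bits owned by this field
        return (short) ((container & ~mask)               // keep adjacent fields
                        | ((value << offset) & mask));    // insert the new value
    }
}
```

Making this atomic (rule 1) would additionally require the read-modify-write to happen under a CAS loop on the container, which is exactly the cost being questioned above.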
From david.r.chase at oracle.com Wed Dec 17 23:15:54 2014
From: david.r.chase at oracle.com (David Chase)
Date: Wed, 17 Dec 2014 18:15:54 -0500
Subject: More detail than I had intended on Layout description language
In-Reply-To: 
References: <11A07C50-05D6-4B52-8D28-D166605024A0@oracle.com>
Message-ID: <438B90B1-2D1D-4D6E-9C65-47463C01AB04@oracle.com>

Sorry to be slow replying to previous email, but I've gotten snowed at this end. However, I think there are some quick answers to some questions:

On 2014-12-17, at 10:26 AM, Tobi Ajila wrote:
> > 2. We will add
> > Unsafe.{be,le,native},{Load,Store,Cas}{Short,Int,Long,Float,Double}
> > to enable efficient implementation of 1. These should compile to intrinsics.
> > The reason to do it this way is to ensure consistent translation of little
> > languages into bytecodes across platforms, whenever possible, and also to
> > minimize the offense given to the keepers of the Java/JDK faith by confining
> > the ugly to the official pile of ugly. No need for endian variants of byte
> > load/store.
> This seems reasonable and portable bytecodes is probably the right design goal. For the new Unsafe methods, let's ensure that the behaviour is well specified.
>
> Are the Float/Double versions necessary? So far the LD has been about specifying bits, not the type information. How likely is it that Java's float/double map correctly onto the native side's representation of float/double? Wouldn't using "Float.floatToRawIntBits(float) / intBitsToFloat(int)" and the Double equivalents make more sense?

On float/double I am not sure -- on the one hand you are right that between possibly baroque bitswaps on the incoming integers and longs and the existing floatTo and toFloat methods, we could do this; on the other hand there's some hope of making things very efficient, and that might be aided by including a primitive (else we put ourselves into the pattern-matching intrinsic substitution business).
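For reference, the composition being weighed here -- an opposite-endian float load built from an integer load plus a byte swap, with no dedicated float primitive -- can be sketched entirely from existing java.lang methods (the method name is illustrative):

```java
// Sketch of the floatToRawIntBits/intBitsToFloat alternative: a big-endian
// float read on a little-endian box is an integer byte swap followed by a
// reinterpretation of the bits, no float-specific load primitive needed.
class FloatViaIntBits {
    static float beFloatFromLeBits(int littleEndianBits) {
        return Float.intBitsToFloat(Integer.reverseBytes(littleEndianBits));
    }
}
```

Whether the JIT recognizes this pair as cheaply as a dedicated intrinsic is exactly the "pattern-matching intrinsic substitution business" concern above.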
Probably I need to attempt to concoct a use case, and before I go very far into it, I can say that what I imagine involves shared memory and a pair of processors sharing that memory, with mixed endianness between them. Do we care? Do GPUs enter into the float issue at all, or do they use processor formats? > > 4. I don't know the best way to express offsets. Uniformity suggests that > > we express all offsets in terms of bits, and we would then do container-math > > to extract the byte offset of the container (8, 16, 32, or 64-bit) and the > > shift/mask of the field to extract. > > That is, an endianness, offset, field size, and alignment/container size > > (ALERT: what about platforms that allow unaligned containers?) are required > > to identify the bits for a field. Endianness tells us how to load the > > container, alignment/container size tells us both how large a thing to > > load and how to convert the bit offset into a byte offset + shift, and the > > size tells us what mask to use. > > Our first thought on containers was that it unnecessarily exposes implementation details. We now see it as attempt to externalize the underlying memory model. I think you need the containers to allow you to make sense of endianness on any platform of different endianness. It's part of the "how to support single descriptions of network protocols" goal. There may be a better way -- this is the one that worked for me (and I got to it by reverse-engineering the C layout rules, thus it is no surprise it is a good match for C layouts). Maybe this is not entirely necessary -- I'm trying to think about the problem of specifying and decoding a big-endian network protocol on a little-endian box, given that I am told that I was presented with a big-endian spec. Anything that lands in a memory byte is fine -- there's an offset, a load, and a shift+mask to apply. Suppose larger -- suppose the field is 8 bits at offset 13. 
That puts it in big-endian bytes 1 and 2, meaning that it lands within a 32-bit quantity (the included bits are 13-20; the endpoints differ when divided by 8 or 16, but not by 32). Therefore I can convert this to an appropriate offset within a 32-bit word.

So I think a container-free model is also possible, if it is expressed using offset and size, and if endianness is made explicit (so we can do network protocols across platforms with a single definition) -- however, also note that we probably need to know the expected alignment of the structure containing the fields. There are surely optimizations that could take advantage of that.

An alternate approach is John Rose's proposal to do it with concatenated octet bitfields -- this is nicely unambiguous too, but has the difficulty that it allows the expression of many things that we'll need to implement (that we could get wrong) yet there is no compelling use case for.

> > If you look at C bitfields, there is very
> > much a model of "load a box this big, then shift the data towards the low order
> > bits some distance, then mask/sign-smear the part we don't care about".
>
> The container model increases the amount of memory reads when accessing bit-fields.

Not quite -- you are taking the container model too literally (see above for its actual purpose). (My earlier email did describe it literally -- part of this requires you to temporarily dial your mental model of a compiler back to an age when register allocation and similar things were done so poorly that register windows made sense.) I can tell you that C compilers will cheerfully substitute the smallest load that will grab the referenced field, and they've done this for decades.

> Reading a single bit field has the side effect of reading all other fields in the container. This would affect the behaviour of systems with memory mapped I/O that take certain actions when an address is read. e.g. incrementing a counter every time an address is read.
It's my understanding (I'd need to check the C spec, I am working from memory) that there are no guarantees made in C about how large a read is done to access the bitfields of a structure, and certainly none smaller than the size specifier on the field (i.e., uint32_t x:1 might be accessed with a load using "1", 8, 16, or 32 bits, and perhaps even 64 on some processors). That said, C programmers can be surprisingly good at internalizing (worshipping?) the quirks of the compiler on their own particular platform, and this will tend to be a problem. It's not exactly a good spec for us to say "mimic the most popular C compiler on the particular platform", so I don't much like this (that said, if I had to bet my own money, I'd place a decent-sized bet on the C compiler always picking the smallest load size that gets all the bits).

> We should also consider the performance impact of masking + shifting every time we interact with a bit field.
> In the example below, do we need to mask + shift + OR + CaS every time we write to a field in that structure?
>
> struct A {
>     int16_t val1 : 8;
>     int16_t val2 : 8;
> }
> ">,16",
> "val1, 0, 16, 8",
> "val2, 8, 16, 8"

No, because (I am pretty sure that) C compilers do not do that either if they can get away with byte-at-a-time loads and stores. Just checked, clang really goes to town on the optimized stores. *but I have not checked what happens if the bitfields are volatile* -- and I just did, and it changes the behavior, how about that? Well, this is annoying. How far do we want to go down the rathole of mimicking C behavior *and* performance? I'll try to put together some carefully written tests that will allow us to know more about the behavior of C compilers, assuming we care about "volatile" fields.

> Will we have restrictions on container sizes?

Yes.

> If someone specifies a 128 bit container in the LDL, how will it be dealt with on platforms that don't have those container sizes?
If we ignore memory model issues, we have no problem with that -- it merely tells us how to deal with the address arithmetic (which is all that I was initially thinking about here) and we can make it work. There is the minor issue of specifying containers larger than actual containers, in that we can no longer do the shortcut of "load container, shift, mask, done" -- we'll need to load, load, shift, shift, mask, mask, or. But if it is just a way to express address ranges, only a SMOP.

> Do we want to go so far as to specify the number, ordering, and atomicity of memory operations used to read a layout field?
>
> > Perhaps, per field:
> > offset (in bits)
> > container size (in bits)
> > [field size (in bits) = container size]
> > [container alignment (in bits) = container size]
> > [endianness = structure default]
> >
> > Structure information would look like
> > endianness (<,> -- see Python link for other options)
> > container size (in bits)
> > [container alignment (in bits) = container size]
>
> Then there are some ambiguous cases. e.g.
> ">, 16",
> "x, 0, 8, 8",
> "y, 4, 8, 8", // where does this one start? it has offset of 4 but alignment of 8
> "z, 8, 8, 8"
>
> What is the expected behaviour of this? or is it even legal? To be more general, how do we handle fields that don't fit into their containers?

If we are mimicking the behavior of C compilers, there will be no fields that do not fit in their containers (to the best of my knowledge). We could, however, elect to support it, but asking for volatile behavior on these fields has to be an error because in the face of native code activity we cannot possibly make that guarantee. In the case of offset of 4, alignment of 8, that's just rejected. However, John has suggested that we might want to support something that was offset 4, container 8 (?), size 8, alignment 4, e.g.

"y, 4, 8, 8, 4", // offset 4, container 8, size 8, alignment 4

I agree that this is screwy.
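The "load, load, shift, shift, mask, mask, or" fallback for a field straddling two machine loads of an oversized container can be sketched as follows -- a hedged illustration assuming the container is held as two little-endian-ordered 64-bit words, with made-up names:

```java
// Extract 'size' bits at 'bitOffset' from a 128-bit container (hi:lo),
// using only 64-bit operations: load, load, shift, shift, mask, or.
class WideField {
    static long extract(long lo, long hi, int bitOffset, int size) {
        long mask = (size == 64) ? -1L : (1L << size) - 1;
        if (bitOffset + size <= 64) return (lo >>> bitOffset) & mask; // fits in lo
        if (bitOffset >= 64) return (hi >>> (bitOffset - 64)) & mask; // fits in hi
        long low  = lo >>> bitOffset;        // fragment from the low word
        long high = hi << (64 - bitOffset);  // fragment from the high word
        return (low | high) & mask;
    }
}
```

The straddling branch is the one that can never be made atomic, which is why volatile straddling fields have to be rejected as argued above.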
Under these rules, what the heck does the container mean? Do we revisit John's method instead, perhaps with the restriction that all the parts of a field must be contiguous under the specified endianness, plus an optional "volatile" indicator? One potential gotcha is that the volatile specification might interact with the container size in C code -- I do not know this yet, but obviously it needs to be checked.

But again -- if the containers are a problem, I think a model of:
- explicit endianness;
- endian-dependent bit offset + size spec for fields;
- potentially explicit alignment for structures (note that there is an implicit desired alignment obtained from the needs of the field loads, but there might be cases where we wish to override this -- it would disable the larger-than-a-byte load optimizations for fields that straddle byte boundaries);
- ability to tag fields as "volatile" or "atomic" (Doug, if you have some ideas how this could be better or worse....);
- and various errors signaled for impossible-to-implement, like volatile fields that straddle a load-atomic boundary.

Sorry not to have a reply to the sub/sep email yet -- that's harder to answer I think, plus I have been busy.

David

> To summarize, we think:
> 1) The LD implementation should be able to derive container sizes from field sizes and offsets
> 2) Container attributes create more opportunity for inconsistent layout descriptors.
>
> If we need to specify a memory model, then we propose the following strawman rules:
> 1) Where a field is completely contained inside a container, and where the container size is no larger than a platform-dependent limit, reads and writes of the field will be atomic.
> 2) Reads and writes of a field may cause reads and writes of adjacent fields in the same container.
> 3) Modifying the value of a field preserves the value of adjacent fields in the same container. This rule does not apply to overlapping fields.
>
> Doug Lea's input may be valuable for refining this.
> Is this level of specification needed?