More detail than I had intended on Layout description language

Tue Dec 2 22:32:04 UTC 2014

Trust, but verify:
On 2014-12-02, at 1:36 PM, Tobi Ajila <atobia at ca.ibm.com> wrote:
> For example:
> struct c {
>     unsigned short x:1;
>     unsigned short y:7;
> };
> 
> on Big endian
> 
> LD: "c",":8","y:7","x:1"
> 
> on Little Endian
> 
> LD: "c","x:1","y:7",":8"

On Big Endian, they number the bits from the high order end.

struct c {
	unsigned short x:1;
	unsigned short y:7;
};
struct d {
	unsigned short x:7;
	unsigned short y:1;
};
union { struct c A; struct d B; unsigned short C;} shortu;
...
	shortu.A.x=1;
	shortu.A.y=1;
	printf("shortu.CA11 = 0x%02x\n", shortu.C);
	shortu.B.x=1;
	shortu.B.y=1;
	printf("shortu.CB11 = 0x%02x\n", shortu.C);

shortu.CA11 = 0x8100
shortu.CB11 = 0x300

(Little endian, it prints
shortu.CA11 = 0x03
shortu.CB11 = 0x81)
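
The four values above can be reproduced without a C compiler. The following is a minimal sketch, assuming only that the big-endian compiler allocates bitfields from the high-order end of the 16-bit container and the little-endian one from the low-order end; the class and method names are made up for illustration.

```java
final class BitfieldPacking {
    // Packs fields, given as {value, width} pairs in declaration order, into a
    // 16-bit container. msbFirst = true allocates from the high-order end
    // (the big-endian behaviour shown above); false allocates from the
    // low-order end (the little-endian behaviour).
    static int pack(boolean msbFirst, int[][] fields) {
        int container = 0;
        int used = 0; // number of bits already allocated
        for (int[] f : fields) {
            int value = f[0];
            int width = f[1];
            int shift = msbFirst ? 16 - used - width : used;
            container |= (value & ((1 << width) - 1)) << shift;
            used += width;
        }
        return container;
    }

    public static void main(String[] args) {
        int[][] c = {{1, 1}, {1, 7}}; // struct c with x = 1, y = 1
        int[][] d = {{1, 7}, {1, 1}}; // struct d with x = 1, y = 1
        System.out.printf("BE: c=0x%x d=0x%x%n", pack(true, c), pack(true, d));
        System.out.printf("LE: c=0x%x d=0x%x%n", pack(false, c), pack(false, d));
    }
}
```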

Your layout descriptor makes sense if
(1) I know that it is within a short and
(2) if you are numbering the bits from the LSB of that short.
But I don't see how we know that.

I have a half-baked counterproposal.
The missing baking includes the binding to interfaces with types
for interpreting what is in the bits.
----------------------------------------

Strawman proposal for layout little language:

0. Goals: where layouts are actually invariant across platforms
   (e.g., network protocols) we want to have just one layout specification,
   if this is possible.

1. Endianness specification is optionally allowed (could be required).
   This is a consequence of 0, since NBO (network byte order) is one
   particular endianness specification.

2. We will add
   Unsafe.{be,le,native}{Load,Store,Cas}{Short,Int,Long,Float,Double}
   to enable efficient implementation of 1.  These should compile to intrinsics.
   The reason to do it this way is to ensure consistent translation of little
   languages into bytecodes across platforms, whenever possible, and also to
   minimize the offense given to the keepers of the Java/JDK faith by confining
   the ugly to the official pile of ugly.  No need for endian variants of byte
   load/store.
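
To make the intended semantics of item 2 concrete, here is a rough sketch that emulates the proposed loads with java.nio.ByteBuffer. The method names beLoadShort, leLoadShort and nativeLoadShort are hypothetical stand-ins; nothing here is an existing Unsafe API, and a real implementation would be an intrinsic rather than a ByteBuffer wrapper.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndianLoads {
    // Hypothetical stand-in for a big-endian 16-bit load; returns 0..0xffff.
    static int beLoadShort(byte[] mem, int offset) {
        return ByteBuffer.wrap(mem).order(ByteOrder.BIG_ENDIAN).getShort(offset) & 0xffff;
    }

    // Hypothetical stand-in for a little-endian 16-bit load; returns 0..0xffff.
    static int leLoadShort(byte[] mem, int offset) {
        return ByteBuffer.wrap(mem).order(ByteOrder.LITTLE_ENDIAN).getShort(offset) & 0xffff;
    }

    // Same load in whatever order the current platform uses.
    static int nativeLoadShort(byte[] mem, int offset) {
        return ByteBuffer.wrap(mem).order(ByteOrder.nativeOrder()).getShort(offset) & 0xffff;
    }
}
```

The two explicit variants differ only by a byte swap, which is why no endian variant is needed for byte-sized loads.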

3. Unlike C bitfield numbering (which varies based on endianness of target
   platform) we'll always number bitfields in little-endian order;
   that is, "byte a:1, b:7" (at address x) would be extracted with the
   expressions "loadByte(x) & 1" and "(loadByte(x) >> 1) & 0x7f", respectively.
   This is LE-centric, but has the nice property that bit 0 of a byte, short,
   int, or long is extracted with the same operation (x & 1), and so on.
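
A small sketch of the little-endian bit numbering in item 3, assuming the container has already been loaded and zero-extended into a long. The helper name is invented for illustration.

```java
final class LittleEndianBits {
    // Extracts `width` bits starting at bit `offset`, where bit 0 is the
    // least significant bit of the container, whatever the container size.
    static long field(long container, int offset, int width) {
        long mask = (width == 64) ? -1L : (1L << width) - 1;
        return (container >>> offset) & mask;
    }

    public static void main(String[] args) {
        long b = 0x83; // the byte 1000_0011, zero-extended
        System.out.println("a = " + field(b, 0, 1)); // low bit
        System.out.println("b = " + field(b, 1, 7)); // remaining seven bits
    }
}
```

The explicit mask matters in Java because a byte load is sign-extended, so a bare right shift of a byte with the top bit set would drag ones in from the left.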

4. I don't know the best way to express offsets.  Uniformity suggests that
   we express all offsets in terms of bits, and we would then do container-math
   to extract the byte offset of the container (8, 16, 32, or 64-bit) and the
   shift/mask of the field to extract.
   That is, an endianness, offset, field size, and alignment/container size
   (ALERT: what about platforms that allow unaligned containers?) are required
   to identify the bits for a field.  Endianness tells us how to load the
   container, alignment/container size tells us both how large a thing to
   load and how to convert the bit offset into a byte offset + shift, and the
   size tells us what mask to use.

   Perhaps, per field:
      offset (in bits)
      container size (in bits)
      [field size (in bits) = container size]
      [container alignment (in bits) = container size]
      [endianness = structure default]

   Structure information would look like
      endianness (<,> -- see Python link for other options)
      container size (in bits)
      [container alignment (in bits) = container size]
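
The container math in item 4 might look like the following sketch. It assumes that container alignment equals container size and that the structure itself starts on a suitably aligned address; the unaligned-container case flagged above is not handled. All names are illustrative.

```java
final class ContainerMath {
    // Byte offset (from the start of the structure) of the container that
    // holds the bit at bitOffset.
    static int byteOffset(int bitOffset, int containerBits) {
        return (bitOffset / containerBits) * (containerBits / 8);
    }

    // Right shift to apply to the loaded container to bring the field down
    // to bit 0.
    static int shift(int bitOffset, int containerBits) {
        return bitOffset % containerBits;
    }

    // Mask that keeps the low fieldBits bits.
    static long mask(int fieldBits) {
        return fieldBits == 64 ? -1L : (1L << fieldBits) - 1;
    }
}
```

For example, a 16-bit field at bit offset 48 lives in the container at byte 6 with shift 0, and a 1-bit field at bit offset 15 of a 16-bit container lives at byte 0 with shift 15.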

struct c {
    unsigned short x:1;
    unsigned short y:7;
};

might be (big endian)
  c = Layouts.make(
      ">,16",      // container size = 16, implicit align = 16
      "x:15,16,1", // x = (LoadShortBE>>15) & ((1 << 1) - 1), implicit container align=16, endianness=>
      "y:8,16,7"   // y = (LoadShortBE>>8) & ((1 << 7) - 1), implicit container align=16, endianness=>
  )
or (little endian)
  c = Layouts.make(
      "<,16",      // container size = 16, implicit align = 16
      "x:0,16,1",  // x = (LoadShortLE>>0) & ((1 << 1) - 1), implicit container align=16, endianness=<
      "y:1,16,7"   // y = (LoadShortLE>>1) & ((1 << 7) - 1), implicit container align=16, endianness=<
  )

Note that I have helpfully provided an utterly formulaic interpretation
of the contents of the field specifications; there's no need to accumulate
bit offsets across fields, just follow the local recipe and you are done.
Optimization is possible -- if I do a big-endian load of a short and then
right shift by k+8 bits, I know that I can also do a load of the byte at the
same address and right shift by k bits, and this is true on big or little
endian machines (a BE short load on a LE box loads a little-endian short,
then byteswaps the short).
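
The narrowing claim is easy to check exhaustively. This sketch uses ByteBuffer as a stand-in for the big-endian short load and compares it against a plain byte load for every 16-bit pattern and every k from 0 to 7; the class name is made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class NarrowLoadCheck {
    // Returns true if (beLoadShort(addr) >> (k + 8)) == (loadByte(addr) >> k)
    // for all two-byte memory contents and all k in 0..7.
    static boolean holds() {
        byte[] mem = new byte[2];
        // wrap() shares the array, so writes to mem are visible through buf
        ByteBuffer buf = ByteBuffer.wrap(mem).order(ByteOrder.BIG_ENDIAN);
        for (int pattern = 0; pattern < 0x10000; pattern++) {
            mem[0] = (byte) (pattern >>> 8);
            mem[1] = (byte) pattern;
            int wide = buf.getShort(0) & 0xffff; // big-endian short, zero-extended
            int narrow = mem[0] & 0xff;          // byte at the same address
            for (int k = 0; k < 8; k++) {
                if ((wide >>> (k + 8)) != (narrow >>> k)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("narrowing holds: " + holds());
    }
}
```

Because the loads are specified by byte order rather than by host, the same check passes regardless of the endianness of the machine running it.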

An NBO (Big Endian, I hope I got it right) IP header:
struct ip {
    u_int   ip_v:4,                 /* version */
            ip_hl:4;                  /* header length */
    u_char  ip_tos;               /* type of service */
    u_short ip_len;               /* total length */
    u_short ip_id;                 /* identification */
    u_short ip_off;                /* fragment offset field */
    u_char  ip_ttl;                 /* time to live */
    u_char  ip_p;                   /* protocol */
    u_short ip_sum;              /* checksum */
    struct  in_addr ip_src,ip_dst;  /* source and dest address */
};

iph = Layouts.make(
  ">,160,32",   // big endian, container size = 160, align = 32
  "ip_v:4,8,4",
  "ip_hl:0,8,4", // Note that I didn't even need to put them in ascending order
  "ip_tos:8,8",
  "ip_len:16,16",
  "ip_id:32,16",
  "ip_off:48,16",
  "ip_ttl:64,8",
  "ip_p:72,8",
  "ip_sum:80,16",
  "ip_src:96,32",
  "ip_dst:128,32"
);

The advantage here is that this is actually unambiguous
(up to super-alignment optimizations; perhaps we have 32-byte
cache lines and want the entire thing aligned on a 256-bit boundary)
and can serve on any box -- the loadBigEndianShort and
loadBigEndianInt calls would of course require byteswapping
on a little-endian machine, but there's no escaping that.
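
As a sanity check on the iph field strings, here is a sketch that interprets them directly against a sample 20-byte header. Layouts.make does not exist, so the read method below is a hypothetical interpreter for a single "name:offset,container[,size]" string with big-endian loads. The sample bytes are illustrative data and the checksum field is not validated.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class IpHeaderDemo {
    // Sample header: version 4, header length 5, TTL 64, protocol 6,
    // 172.16.10.99 -> 172.16.10.12.
    static final byte[] SAMPLE = {
        0x45, 0x00, 0x00, 0x3c, 0x1c, 0x46, 0x40, 0x00,
        0x40, 0x06, (byte) 0xb1, (byte) 0xe6,
        (byte) 0xac, 0x10, 0x0a, 0x63, (byte) 0xac, 0x10, 0x0a, 0x0c
    };

    // Hypothetical interpreter for one field string, big-endian only.
    static long read(byte[] hdr, String spec) {
        String[] n = spec.substring(spec.indexOf(':') + 1).split(",");
        int offset = Integer.parseInt(n[0]);
        int container = Integer.parseInt(n[1]);
        int size = n.length > 2 ? Integer.parseInt(n[2]) : container;
        int byteOff = (offset / container) * (container / 8);
        ByteBuffer buf = ByteBuffer.wrap(hdr).order(ByteOrder.BIG_ENDIAN);
        long loaded;
        switch (container) {
            case 8:  loaded = buf.get(byteOff) & 0xffL; break;
            case 16: loaded = buf.getShort(byteOff) & 0xffffL; break;
            case 32: loaded = buf.getInt(byteOff) & 0xffffffffL; break;
            default: throw new IllegalArgumentException("unsupported container: " + container);
        }
        return (loaded >>> (offset % container)) & ((1L << size) - 1);
    }

    public static void main(String[] args) {
        System.out.println("ip_v   = " + read(SAMPLE, "ip_v:4,8,4"));
        System.out.println("ip_hl  = " + read(SAMPLE, "ip_hl:0,8,4"));
        System.out.println("ip_len = " + read(SAMPLE, "ip_len:16,16"));
        System.out.println("ip_ttl = " + read(SAMPLE, "ip_ttl:64,8"));
    }
}
```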

One thing lacking here is nested structures, and I was going
to propose something, but maybe that is not what we are doing
here -- perhaps interpretation of the bits more properly
belongs elsewhere (since we got this far without saying whether
we're specifying integers, boolean vectors, or floating point).

There's not-quite-right-for-us prior art in Python Land:
https://docs.python.org/3/library/struct.html

5. Note that this trivially allows treatment of unions -- a union of two fields
   is just two fields that happen to be stored at the same offset.  

 union u {
    int32_t i;
    float f;
 } x;

translates to (little endian)

u = Layouts.make(
  "<,32",
  "i:0,32",
  "f:0,32"
);
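
What "two fields at the same offset" means in practice can be sketched with a ByteBuffer standing in for the union's storage: write through the int view at offset 0, then read the float view at the same offset. The class name is illustrative, and this is an emulation of the idea rather than anything a layout library would necessarily do.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnionDemo {
    // Stores i at offset 0, then reads the same four bytes back as a float.
    static float intAsFloat(int i) {
        ByteBuffer storage = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        storage.putInt(0, i);       // write through member "i"
        return storage.getFloat(0); // read through member "f"
    }

    public static void main(String[] args) {
        System.out.println(intAsFloat(0x3f800000)); // bit pattern of 1.0f
        System.out.println(intAsFloat(1));          // smallest positive subnormal
    }
}
```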

By the way, unions of integers and floats can sometimes be wonky:
http://en.wikipedia.org/wiki/Endianness#Floating-point_and_endianness
"There are old ARM processors that have half little-endian, half big-endian
floating point representation for double-precision numbers: both 32-bit words
are stored in little-endian like integer registers, but the most significant one first."
In a sane world, the LSB of each lines up so that (int) 1 bit-puns to the smallest
float larger than zero (the tiniest positive denorm).

I don't think that is a problem for us to solve, though I had visions of
declaring that repeated <><<> specifications describe endianness from
the outside in by repeated halving, thus describing the weird ARM FP field as
"><,0,64"
would say that the most significant 32 bits come first, but that the 32-bit
halves are each stored least significant byte first.
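
For what it is worth, decoding such a field is short. This is a sketch of one reading of the "><" idea only (high 32-bit word first, each word least significant byte first); the notation is not settled and the class name is invented.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class MixedEndianDouble {
    // Decodes 8 bytes stored as: most significant 32-bit word first,
    // each 32-bit word stored least significant byte first.
    static double decode(byte[] mem, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(mem).order(ByteOrder.LITTLE_ENDIAN);
        long hi = buf.getInt(offset) & 0xffffffffL;     // first word = high half
        long lo = buf.getInt(offset + 4) & 0xffffffffL; // second word = low half
        return Double.longBitsToDouble((hi << 32) | lo);
    }

    public static void main(String[] args) {
        // 1.0 is 0x3FF0000000000000; the high word 0x3FF00000 is stored LSB first.
        byte[] one = {0x00, 0x00, (byte) 0xf0, 0x3f, 0x00, 0x00, 0x00, 0x00};
        System.out.println(decode(one, 0));
    }
}
```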

Would an explicit endianness specification have any use when doing shared
memory in a world of mixed-endianness multiprocessing?  Or have I stepped
over the line from "interesting" to "insane"?

David