More detail than I had intended on Layout description language.

Tue Nov 11 01:35:09 UTC 2014

On 11/10/2014 01:57 PM, David Chase wrote:
>
> On 2014-10-31, at 1:21 PM, Angela Lin <angela_lin at ca.ibm.com> wrote:
>> Then there are these divergent rabbit holes to follow:
>> i) the tiny language specification  <-- Let's call this the "layout
>> descriptor language"?
>> ii) the runtime system
>>
>> There might be a 3rd part that glues between the first two:
>> iii) generated interfaces that are provided to the Java programmer
>
> The test case I’ve been playing with to try to figure out the right way to talk about
> this is
>
> struct c {
>       unsigned short x:1;
>       unsigned short y:7;
> };
>
> I think that the little-language generated for that has to depend on the endianness of
> the underlying platform, because it generates different results at the Java side.  On a
> little-endian machine, storing 1 into x and y would be something along the lines of
>
> storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003
>
> but on a bigendian machine
>
> storeShort(address, ((1&1)<<15) + ((127&1)<<8)) // store 0x8100
>
> So, since the Java-side behavior should be different, I think that the little language inputs
> (the output of the libclang-based tool) should be different depending on platform — the
> offsets will need to be described in ways that conform to Java’s expectations.
>
> One thing that might not be obvious is that the little language will need to specify the
> size of the bit container into which bitfields are stored, because the translation between
> pairs of byte offsets and short offsets differs on LE and BE machines.
>
> For example, the little endian storeByte equivalent of
>
> storeShort(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x0003
>
> is
>
> storeByte(address, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x__03
> storeByte(address+1, 0) // store 0x00__
>
> but big endian is
>
> storeByte(address, 0) // store 0x00__
> storeByte(address+1, ((1&1)<<0) + ((127 & 1)<<1)) // store 0x__03
>
> So a simple offset and size formulation is not adequate; we need to know the
> underlying container size for a given bit address — unless we blow everything
> down to store bytes and then hack our compiler to reconsolidate adjacent stores.
>
> Again, input from Henry would be helpful here — this container-size
> information may already be present because of how C deals with bitfields
> in general (a char-typed bitfield may not span a byte boundary, but a short-typed
> bitfield can, provided that byte boundary is not also a short boundary).  Another way to think of container
> size is “alignment” — a struct of short-typed bitfields will have size and alignment 2
> even if the bitfields all fit in a single byte.  (I don’t think we want to support full
> generality here because we could end up in places where we need to understand
> the underlying endianness; if we use byte loads to obtain an int, to where do
> we shuffle the respective bytes?)
>

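The quoted example can be sketched in Java roughly as follows. This is only
an illustration, not the proposed runtime API: the names BitfieldSketch, pack,
storeShort and lsbFirst are hypothetical, and it assumes that bitfields are
allocated from bit 0 upward on little-endian targets and from bit 15 downward
on big-endian targets, which is a common ABI convention but is
implementation-defined in C.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch of: struct c { unsigned short x:1; unsigned short y:7; };
final class BitfieldSketch {

    // Pack x (1 bit) and y (7 bits) into a 16-bit container.
    // ASSUMPTION: lsbFirst == true models a little-endian ABI (fields grow
    // up from bit 0); false models a big-endian ABI (fields grow down from
    // bit 15). Real compilers may differ.
    static int pack(int x, int y, boolean lsbFirst) {
        int xBits = x & 0x1;   // x is 1 bit wide
        int yBits = y & 0x7F;  // y is 7 bits wide
        return lsbFirst
                ? (xBits << 0) | (yBits << 1)
                : (xBits << 15) | (yBits << 8);
    }

    // Bytes left in memory, lowest address first, by a 16-bit store of
    // 'value' under the given byte order.
    static byte[] storeShort(int value, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putShort((short) value).array();
    }

    public static void main(String[] args) {
        int le = pack(1, 1, true);   // 0x0003, as in the quoted LE example
        int be = pack(1, 1, false);  // 0x8100, as in the quoted BE example
        System.out.printf("LSB-first container: 0x%04X%n", le);
        System.out.printf("MSB-first container: 0x%04X%n", be);

        // The same 16-bit value decomposes into different byte stores
        // depending on byte order, which is why the container size matters.
        System.out.println(Arrays.toString(storeShort(le, ByteOrder.LITTLE_ENDIAN)));
        System.out.println(Arrays.toString(storeShort(le, ByteOrder.BIG_ENDIAN)));
    }
}
```

Under these assumptions, a little-endian store of 0x0003 and a big-endian
store of 0x8100 both put the two fields in the byte at the lowest address,
but at different bit positions within that byte. A plain bit offset and size
cannot express that without also naming the 16-bit container.
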
We need to consider when/where the libclang-based tool is used, since that 
dictates whether we can rely on libclang for such information. The 
information is available for a target platform given a header file.

It is possible to build some sort of abstraction layer to describe 
layout that can cover general cases; we will see how the experiment goes.

Cheers,
Henry