bitfields support in jextract

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Fri Jun 29 17:36:08 UTC 2018


Hi,
I'm sending this internally as a heads-up...

I think that, after much banging my head against walls, I might have found
a relatively simple approach to handling bitfields in jextract. Let's
recap some of the issues:

* the clang API exposes 'real' bitfield offsets
* the clang API exposes bitfield width (as per source code)
* the clang API does NOT expose physical mapping of bitfields

The last point is crucial - a bitfield access can always be thought of as
accessing a container, and then bitmasking to get the bitfield value.
The C specification talks about a storage unit:

```
An implementation may allocate any addressable storage unit large enough
to hold a bit-field. If enough space remains, a bit-field that immediately
follows another bit-field in a structure shall be packed into adjacent bits
of the same unit. If insufficient space remains, whether a bit-field that
does not fit is put into the next unit or overlaps adjacent units is
implementation-defined. The order of allocation of bit-fields within a
unit (high-order to low-order or low-order to high-order) is
implementation-defined. The alignment of the addressable storage unit is
unspecified.
```

In other words, the implementation needs to pick a suitable storage
unit and should keep using it for as long as it can; when a bitfield no
longer fits, it should start a new unit and go on from there.

Note that this is all implementation-specific; the C spec says nothing
about how to derive a storage unit from the bitfield declarations.
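
As a rough illustration of the "access a container, then bitmask" idea
above, here is a minimal sketch in Java. The class and method names are
hypothetical (they are not part of jextract or of the clang API), and the
sketch assumes that bit offsets are counted from the least significant bit
of a 64-bit container, as is typical on little-endian ABIs; other
platforms may allocate in the opposite order.

```java
// Hypothetical helper for illustration only; not jextract code.
final class BitfieldAccess {

    // Reads 'width' bits starting at bit 'offset' of 'container'.
    // ASSUMPTION: offset 0 is the least significant bit of the container.
    static long get(long container, int offset, int width, boolean signed) {
        checkRange(offset, width);
        // Push the field up to the top of the word, dropping higher bits...
        long top = container << (64 - offset - width);
        // ...then shift back down, sign-extending only for signed fields.
        return signed ? top >> (64 - width) : top >>> (64 - width);
    }

    // Returns a copy of 'container' with the field replaced by 'value'
    // (truncated to 'width' bits); all other bits are left untouched.
    static long set(long container, int offset, int width, long value) {
        checkRange(offset, width);
        long ones = (width == 64) ? -1L : (1L << width) - 1;
        long mask = ones << offset;
        return (container & ~mask) | ((value << offset) & mask);
    }

    private static void checkRange(int offset, int width) {
        if (width <= 0 || offset < 0 || offset + width > 64) {
            throw new IllegalArgumentException("field does not fit in a 64-bit container");
        }
    }
}
```

For example, under these assumptions a signed 20-bit field at bit 32 of the
container can be written with `BitfieldAccess.set(c, 32, 20, -5)` and
read back with `BitfieldAccess.get(c, 32, 20, true)`.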

Now, clang *does* expose some low level physical info in its dumps - 
e.g. for a struct like this:

```
struct bitfields {
    long x : 2;
    char  : 0;
    long y : 15;
    int z : 20;
};
```

we get:

```
*** Dumping AST Record Layout
          0 | struct bitfields
      0:0-1 |   long x
        1:- |   char
     1:0-14 |   long y
     4:0-19 |   int z
            | [sizeof=8, align=8]

Layout: <CGRecordLayout
   LLVMType:%struct.bitfields = type <{ i8, i16, i8, i24 }>
   IsZeroInitializable:1
   BitFields:[
     <CGBitFieldInfo Offset:0 Size:2 IsSigned:1 StorageSize:8 StorageOffset:0>
     <CGBitFieldInfo Offset:0 Size:15 IsSigned:1 StorageSize:16 StorageOffset:1>
     <CGBitFieldInfo Offset:0 Size:20 IsSigned:1 StorageSize:32 StorageOffset:4>
]>
```

I think this threw me off track for quite a while; the various StorageSize
values here, as well as the fancy layout string:

`{ i8, i16, i8, i24 }`

all seem to suggest that clang allocates bitfields at a very fine
granularity. I came to the conclusion that this is mostly a fiction: it
is IR-level information, which is why we see odd things such as i24,
which are not ABI types.

In fact, if we add another bitfield into the mix:

```
struct bitfields {
    long x : 2;
    char  : 0;
    long y : 15;
    int z : 20;
    int w : 13; //new!
};
```

one could imagine, following the above info, that only an extra i16 is
added to the struct. But that's not the case; the layout changes to this:

```
*** Dumping AST Record Layout
          0 | struct bitfields
      0:0-1 |   long x
        1:- |   char
     1:0-14 |   long y
     4:0-19 |   int z
     8:0-12 |   int w
            | [sizeof=16, align=8]

Layout: <CGRecordLayout
   LLVMType:%struct.bitfields = type <{ i8, i16, i8, i24, i16, [6 x i8] }>
   IsZeroInitializable:1
   BitFields:[
     <CGBitFieldInfo Offset:0 Size:2 IsSigned:1 StorageSize:8 StorageOffset:0>
     <CGBitFieldInfo Offset:0 Size:15 IsSigned:1 StorageSize:16 StorageOffset:1>
     <CGBitFieldInfo Offset:0 Size:20 IsSigned:1 StorageSize:32 StorageOffset:4>
     <CGBitFieldInfo Offset:0 Size:13 IsSigned:1 StorageSize:16 StorageOffset:8>
]>
```

That is, the new field has been allocated in a new 64-bit word, and the
size of the struct has doubled, even though clang says that the additional
storage size is 16 bits.

In other words, all this info is probably at a higher level than what
we need; under the hood clang is, at the end of the day, still mapping
bitfields to good old ABI-sized containers.

So, what could be a good way to infer the storage unit size mentioned in
the spec, in a way that is not too fragile? Well, it turns out that
clang gives us a lot of offset information. For instance, in the second
example above, by following the bitfield offsets, we can put together
the following layout:

`i2(get=x$get)(set=x$set)  x6  i15(get=y$get)(set=y$set)  x9 
i20(get=z$get)(set=z$set)  x12  i13(get=w$get)(set=w$set) x51`

That is, we know exactly where clang wants the padding, etc. The only
thing we are missing is the ability to break this up into storage units,
so that we can describe it using our layout API.
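
To make the offset-following step concrete, here is a minimal sketch in
Java of how such a padded layout could be derived from the bit offsets and
widths that clang reports. Everything in it is an assumption made for
illustration: the `PaddedLayout` and `Field` names are hypothetical, the
elements are simplified to `i<width>(<name>)` and `x<padding>` (without
the getter/setter annotations), and the zero-width unnamed bitfield is
simply left out of the input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for illustration only; not the jextract implementation.
final class PaddedLayout {

    // A bitfield as reported by clang: absolute bit offset in the struct
    // (byte offset * 8 + bit offset) and declared width in bits.
    static final class Field {
        final String name;
        final long bitOffset;
        final long width;

        Field(String name, long bitOffset, long width) {
            this.name = name;
            this.bitOffset = bitOffset;
            this.width = width;
        }
    }

    // Walks the fields in offset order and emits explicit padding ("xN")
    // for every gap, including the tail up to the struct size.
    static List<String> describe(List<Field> fields, long structSizeBits) {
        List<String> out = new ArrayList<>();
        long pos = 0;
        for (Field f : fields) {
            if (f.bitOffset < pos) {
                throw new IllegalArgumentException("unsorted or overlapping field: " + f.name);
            }
            if (f.bitOffset > pos) {
                out.add("x" + (f.bitOffset - pos));
            }
            out.add("i" + f.width + "(" + f.name + ")");
            pos = f.bitOffset + f.width;
        }
        if (pos > structSizeBits) {
            throw new IllegalArgumentException("fields exceed the struct size");
        }
        if (pos < structSizeBits) {
            out.add("x" + (structSizeBits - pos));
        }
        return out;
    }
}
```

Feeding it the four fields of the second example (x at bit 0, y at bit 8,
z at bit 32, w at bit 64, struct size 128 bits) yields the same sequence of
widths and paddings as the layout string above.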

Here's the idea: let's sum up all the quantities in the above (padded) 
layout:

2 + 6 + 15 + 9 + 20 + 12 + 13 + 51 = 128

No surprises there, the total size is a multiple of 64 - so let's assume 
our storage unit is 64 (we need two of those), and break the above 
layout as follows:

`u64=[i2(get=x$get)(set=x$set) x6 i15(get=y$get)(set=y$set) x9 
i20(get=z$get)(set=z$set)x12] u64=[i13(get=w$get)(set=w$set) x51]`

This is essentially what jextract does: first we compute the cumulative
size of the bitfields (including the padding) and use that to infer a
storage unit size; then we group all the bitfields using that storage size.
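
The two steps can be sketched roughly as follows in Java. This is not the
code in the webrev: the `StorageUnits` class and its methods are
hypothetical, and the inference rule used here (pick the largest of 64,
32, 16 or 8 bits that evenly divides the cumulative size) is an assumption
that merely generalizes the "multiple of 64" observation above. Elements
are written as `i<width>` for a bitfield and `x<width>` for padding.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for illustration only; not the jextract implementation.
final class StorageUnits {

    // ASSUMED rule: the largest candidate unit that divides the total size.
    static int inferUnit(long totalBits) {
        for (int unit : new int[] {64, 32, 16, 8}) {
            if (totalBits > 0 && totalBits % unit == 0) {
                return unit;
            }
        }
        throw new IllegalArgumentException("not a positive whole number of bytes: " + totalBits);
    }

    // Splits the padded element list into groups of exactly 'unit' bits.
    // Padding may be split across units; a bitfield may not.
    static List<String> group(List<String> elements, int unit) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        long used = 0;
        for (String e : elements) {
            boolean padding = e.charAt(0) == 'x';
            long size = Long.parseLong(e.substring(1));
            while (size > 0) {
                long room = unit - used;
                if (!padding && size > room) {
                    throw new IllegalStateException(e + " straddles a " + unit + "-bit unit");
                }
                long take = Math.min(size, room);
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(e.charAt(0)).append(take);
                used += take;
                size -= take;
                if (used == unit) {
                    // The unit is full: close it and start a new one.
                    units.add("u" + unit + "=[" + current + "]");
                    current.setLength(0);
                    used = 0;
                }
            }
        }
        if (used != 0) {
            throw new IllegalStateException("layout does not fill a whole number of units");
        }
        return units;
    }
}
```

With the padded layout of the second example, `inferUnit(128)` picks a
64-bit unit and `group` splits the elements into two units, the first
ending after the 12 bits of padding that follow z. A bitfield that
straddles a unit boundary is reported as an error here, since this sketch
makes no attempt to retry with a different unit size.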

This approach seems to handle everything I could throw at it - that 
said, I might have missed something, so please, give this a try and let 
me know if this seems workable.

Webrev:

http://cr.openjdk.java.net/~mcimadamore/panama/jextract_bitfields/

Cheers
Maurizio




