bitfields support in jextract
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Jun 29 17:36:08 UTC 2018
Hi,
I'm sending this internally as a heads-up...
I think that, after much banging my head against walls, I might have found
a relatively simple approach to handle bitfields in jextract. Let's
recap some of the issues:
* the clang API exposes 'real' bitfield offsets
* the clang API exposes bitfield width (as per source code)
* the clang API does NOT expose physical mapping of bitfields
The last point is crucial - bitfield access can always be thought of as
accessing a container, and then bitmasking to get the bitfield value.
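As a rough illustration of the container-plus-mask view, here is a minimal sketch in C. This is not code from jextract: the names `low_mask`, `bitfield_get` and `bitfield_set` are made up for this example, and it assumes a 64-bit container with bit offsets counted from the low-order end (as on little-endian x86-64).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `width` bits set (1 <= width <= 64). */
uint64_t low_mask(unsigned width)
{
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* Read `width` bits starting `offset` bits into the container.
 * Hypothetical helper; assumes low-order-first bit numbering. */
int64_t bitfield_get(uint64_t container, unsigned offset,
                     unsigned width, bool is_signed)
{
    assert(width >= 1 && offset + width <= 64);
    uint64_t raw = (container >> offset) & low_mask(width);
    /* Sign-extend when the top bit of a signed field is set. */
    if (is_signed && ((raw >> (width - 1)) & 1))
        raw |= ~low_mask(width);
    /* Relies on two's complement conversion, as on all mainstream ABIs. */
    return (int64_t)raw;
}

/* Return a copy of the container with the field replaced by `value`
 * (truncated to `width` bits); bits outside the field are untouched. */
uint64_t bitfield_set(uint64_t container, unsigned offset,
                      unsigned width, int64_t value)
{
    assert(width >= 1 && offset + width <= 64);
    uint64_t mask = low_mask(width) << offset;
    return (container & ~mask) | (((uint64_t)value << offset) & mask);
}
```

The point is only that both accessors need two things: the container, and the position of the field inside it. The second one is exactly what the clang API does not hand us directly.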
The C specification talks about storage units:
```
An implementation may allocate any addressable storage unit large enough
to hold a bit-field. If enough space remains, a bit-field that immediately
follows another bit-field in a structure shall be packed into adjacent bits
of the same unit. If insufficient space remains, whether a bit-field that
does not fit is put into the next unit or overlaps adjacent units is
implementation-defined. The order of allocation of bit-fields within a unit
(high-order to low-order or low-order to high-order) is
implementation-defined. The alignment of the addressable storage unit is
unspecified.
```
In other words, the implementation needs to pick a suitable storage
unit, and should keep using it for as long as it can - when it can't, it
should just start a new one and go on from there.
Note that this is all implementation-specific: the C spec says nothing
about how to derive a storage unit from the bitfield declarations.
Now, clang *does* expose some low level physical info in its dumps -
e.g. for a struct like this:
```
struct bitfields {
    long x : 2;
    char : 0;
    long y : 15;
    int z : 20;
};
```
we get:
```
*** Dumping AST Record Layout
0 | struct bitfields
0:0-1 | long x
1:- | char
1:0-14 | long y
4:0-19 | int z
| [sizeof=8, align=8]
Layout: <CGRecordLayout
LLVMType:%struct.bitfields = type <{ i8, i16, i8, i24 }>
IsZeroInitializable:1
BitFields:[
<CGBitFieldInfo Offset:0 Size:2 IsSigned:1 StorageSize:8 StorageOffset:0>
<CGBitFieldInfo Offset:0 Size:15 IsSigned:1 StorageSize:16 StorageOffset:1>
<CGBitFieldInfo Offset:0 Size:20 IsSigned:1 StorageSize:32 StorageOffset:4>
]>
```
I think I was put off track a lot by this; the various StorageSize values
here, as well as the fancy layout string:
`{ i8, i16, i8, i24 }`
all seem to suggest that clang has a lot of granularity when it comes to
bitfield allocation. I came to the conclusion that this is mostly a
fiction - i.e. this is information at the IR level, which is why we have
odd things such as i24, which are not ABI types.
In fact, if we add another bitfield into the mix:
```
struct bitfields {
    long x : 2;
    char : 0;
    long y : 15;
    int z : 20;
    int w : 13; //new!
};
```
one could imagine, following the above info, that only an extra i16 is
added to the struct. But that's not the case; the layout changes to
this:
```
*** Dumping AST Record Layout
0 | struct bitfields
0:0-1 | long x
1:- | char
1:0-14 | long y
4:0-19 | int z
8:0-12 | int w
| [sizeof=16, align=8]
Layout: <CGRecordLayout
LLVMType:%struct.bitfields = type <{ i8, i16, i8, i24, i16, [6 x i8] }>
IsZeroInitializable:1
BitFields:[
<CGBitFieldInfo Offset:0 Size:2 IsSigned:1 StorageSize:8 StorageOffset:0>
<CGBitFieldInfo Offset:0 Size:15 IsSigned:1 StorageSize:16 StorageOffset:1>
<CGBitFieldInfo Offset:0 Size:20 IsSigned:1 StorageSize:32 StorageOffset:4>
<CGBitFieldInfo Offset:0 Size:13 IsSigned:1 StorageSize:16 StorageOffset:8>
]>
```
That is, the new field has been allocated on a new 64-bit word, and the
size of the struct has doubled - even though clang says that the
additional storage size is 16 bits.
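For anyone who wants to check the sizes on their own platform, here is a small sketch in C. The struct names `bitfields3` and `bitfields4` and the two helper functions are made up here so that both declarations can live in one file; bitfields of type `long` and `char` are an extension that clang and gcc accept, and all of the resulting sizes are implementation-defined.

```c
#include <stddef.h>

/* The first declaration from above (three named bitfields). */
struct bitfields3 {
    long x : 2;
    char : 0;
    long y : 15;
    int z : 20;
};

/* The second declaration from above (with the extra field w). */
struct bitfields4 {
    long x : 2;
    char : 0;
    long y : 15;
    int z : 20;
    int w : 13;
};

/* Hypothetical helpers: report the size the compiler actually chose. */
size_t size_without_w(void) { return sizeof(struct bitfields3); }
size_t size_with_w(void)    { return sizeof(struct bitfields4); }
```

Going by the record layout dumps, these should come out as 8 and 16 on the platform the dumps were taken on; another ABI is free to give different numbers.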
In other words, all this info is probably at a higher level than what
we need; under the hood clang is, at the end of the day, still mapping
bitfields to good old ABI-sized containers.
So, what could be a good way to infer the storage unit size mentioned in
the spec, in a way that is not too fragile? Well, it turns out that
clang gives us a lot of offset information. For instance, in the second
example above, by following the bitfield offsets, we can put together
the following layout:
`i2(get=x$get)(set=x$set) x6 i15(get=y$get)(set=y$set) x9
i20(get=z$get)(set=z$set) x12 i13(get=w$get)(set=w$set) x51`
That is, we know exactly where clang wants the padding, etc. The only
thing we are missing is the ability to break this up into storage units,
so that we can describe it using our layout API.
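To make the "follow the offsets" step concrete, here is a minimal sketch in C of how the padded sequence can be derived. It is not the jextract code: `struct bitfield`, `struct chunk` and `pad_layout` are names invented for this example, and it assumes the caller passes only the named, non-zero-width bitfields, sorted by absolute bit offset, together with the total struct size in bits.

```c
#include <assert.h>
#include <stddef.h>

/* A named bitfield: absolute bit offset within the struct, and width. */
struct bitfield { const char *name; unsigned offset; unsigned width; };

/* One element of the padded layout; name == NULL means padding. */
struct chunk { const char *name; unsigned bits; };

/* Fill `out` with fields and the padding between them; returns the
 * number of chunks written. Hypothetical helper, assumes `fields` is
 * sorted by offset and `cap` is at least 2 * nfields + 1. */
size_t pad_layout(const struct bitfield *fields, size_t nfields,
                  unsigned total_bits, struct chunk *out, size_t cap)
{
    assert(cap >= 2 * nfields + 1);
    size_t n = 0;
    unsigned pos = 0;  /* first bit not yet accounted for */
    for (size_t i = 0; i < nfields; i++) {
        if (fields[i].offset > pos)  /* gap before this field */
            out[n++] = (struct chunk){ NULL, fields[i].offset - pos };
        out[n++] = (struct chunk){ fields[i].name, fields[i].width };
        pos = fields[i].offset + fields[i].width;
    }
    if (total_bits > pos)  /* trailing padding up to sizeof */
        out[n++] = (struct chunk){ NULL, total_bits - pos };
    return n;
}
```

Feeding it x at bit 0, y at bit 8, z at bit 32 and w at bit 64, with a total of 128 bits, yields the chunk sizes 2, 6, 15, 9, 20, 12, 13, 51, i.e. the layout string above.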
Here's the idea: let's sum up all the quantities in the above (padded)
layout:
2 + 6 + 15 + 9 + 20 + 12 + 13 + 51 = 128
No surprises there, the total size is a multiple of 64 - so let's assume
our storage unit is 64 (we need two of those), and break the above
layout as follows:
`u64=[i2(get=x$get)(set=x$set) x6 i15(get=y$get)(set=y$set) x9
i20(get=z$get)(set=z$set) x12] u64=[i13(get=w$get)(set=w$set) x51]`
This is essentially what jextract does: first we compute the cumulative
size of the bitfields (including the padding) and we use that to infer a
storage unit size; then we group all the bitfields using that storage size.
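A minimal sketch of those two steps in C follows. Both function names are invented for this example, and the rule used to pick the unit (the widest of 64, 32, 16 and 8 bits that divides the padded total) is an assumption made for illustration; the webrev is the reference for what jextract really does.

```c
#include <stddef.h>

/* Assumed rule: try the usual container widths, widest first, and take
 * the first one that divides the padded total. Returns 0 if none does. */
unsigned infer_storage_unit(unsigned total_bits)
{
    static const unsigned candidates[] = { 64, 32, 16, 8 };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        if (total_bits != 0 && total_bits % candidates[i] == 0)
            return candidates[i];
    return 0;
}

/* Index of the storage unit that holds a field (width >= 1, unit > 0),
 * or -1 if the field would straddle two units. */
long storage_unit_index(unsigned offset, unsigned width, unsigned unit)
{
    unsigned first = offset / unit;
    unsigned last = (offset + width - 1) / unit;
    return first == last ? (long)first : -1;
}
```

With the second example, the padded total of 128 bits gives a 64-bit unit; x, y and z (bit offsets 0, 8 and 32) land in unit 0 and w (bit offset 64) lands in unit 1, which is the grouping shown above. A field that straddles two units would need separate handling, which this sketch only reports.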
This approach seems to handle everything I could throw at it - that
said, I might have missed something, so please, give this a try and let
me know if this seems workable.
Webrev:
http://cr.openjdk.java.net/~mcimadamore/panama/jextract_bitfields/
Cheers
Maurizio