layout description - a proposal
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Mar 2 18:54:15 UTC 2018
Hi,
over the past few weeks I carried out some analysis to explore (i) how
other FFI frameworks approach layout description [3] and also
(ii) what kinds of layout descriptions are common in message protocols
[4]. After having collected all this data, I now feel more confident in
coming up with a proposal which should be simple, but still expressive
enough to capture more exotic use cases (as can be found in message
protocols).
Below, I'm going to describe the _requirements_ of the layout
description that is put forward by this proposal. As such, I shall make
no assumption on how such a description might be surfaced in a potential
language. After all, the goal of this exercise is to capture the
semantics of a description - if, at some point, we feel that such a
description should be reified into a layout language, we can do so
accordingly, but I feel that should be the last step, not the first.
That said, I'd also like to thank all the folks from IBM who contributed
to the current LDL effort (see [2]) - I think many of the conclusions
reached in that document are still valid - with a few twists that I'm
going to show below.
1) Scalars: we should only support three _kinds_ of scalars: signed
integrals, unsigned integrals and floating-point values. It is important for
the description to distinguish between these three kinds, as a scalar
kind affects how a scalar value is treated (e.g. which CPU register
should be used for a load operation). Vector support should also be
considered in the future - as another possible scalar kind. The size of
a scalar, in bits, should always be a multiple of 8, except for bitfields
(see below).
2) Explicitness: the description should be as _explicit_ as possible.
That is, details such as (i) endianness and (ii) the size of a scalar should
always be reified in the language description and not be subject to
platform-dependent considerations. (We will see later how
platform-dependent types such as C's 'int' could be implemented).
3) Addresses: in addition to scalars, we also need to have an explicit
description for layouts that are meant to represent memory
addresses/pointers. Such description could (optionally) reify info about
the layout of the memory region pointed to by the address.
4) Layouts can be combined into groups; two groups are supported:
product-like groups (aka structs) and sum-like groups (aka unions). A
group is made up of several element layouts.
5) Layouts can be repeated (e.g. _array_ layouts); some support should be
provided for cases where the repetition count is not known statically.
6) Layouts can be _annotated_ - that is, it should be possible to
associate key=value annotations to any layout. Of these, a special role
should be given to 'name' annotations, that could allow that layout to
be referenced from other layouts (see below).
7) Named layouts can be _referenced_ from other layouts. This is a
crucial property; once a layout has a name, another layout can refer to
it by name via an _unresolved layout_. An unresolved layout is simply
an annotated hole - where the contents of the hole will be replaced
dynamically with a suitable layout (which layout is replaced into the
hole depends on the annotations available in the unresolved layout).
This takes care of a bunch of use cases:
(a) express dependencies between multiple layouts without the need to
inline one layout description into another (which could lead to cycles).
(b) have a way to refer to the layout of a struct field; if a native
struct whose name is S has a field layout whose name is f, the field
layout could be referenced using an unresolved layout which is annotated
with a layout expression like "S.f".
(c) allow for macro-like behavior; for instance, platform-dependent
types such as 'int', 'long' could be modeled as references to a hidden
layout which contains a bunch of platform-dependent sub-layout definitions.
(d) represent intra-message dependencies in message
protocols.
8) Integral scalars can be broken down into bit fields - that is, a scalar
can be associated with a _group overlay_, which defines a substructure
that is to be associated with the said scalar. Fields in the overlay can
be named and, as a result, can be the target for replacements within
other unresolved layouts. Within the substructure of the overlay group,
scalar fields can have sub-byte sizes (the only place where this can
happen).
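To make requirements 1-7 a bit more concrete, here is a minimal Java
sketch of how such a description could be modeled as a data structure.
This is only an illustration, not the proposed API: all class and method
names (LayoutSketch, Scalar, Address, Group, Sequence, Unresolved,
bitSize, exampleEnv) are hypothetical, padding/alignment and bit field
overlays are not modeled, and unresolved layouts are resolved through a
plain name-to-layout map, which is just one possible strategy.

```java
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LayoutSketch {
    enum ScalarKind { SIGNED_INT, UNSIGNED_INT, FLOATING_POINT }
    enum GroupKind { STRUCT, UNION }

    // Any layout can carry key=value annotations; 'name' is the special one (req. 6).
    static abstract class Layout {
        final Map<String, String> annotations = new HashMap<>();
        Layout withName(String name) { annotations.put("name", name); return this; }
        // env maps layout names to layouts; only Unresolved consults it.
        abstract long bitSize(Map<String, Layout> env);
    }

    // Req. 1 and 2: kind, endianness and size are always explicit.
    static final class Scalar extends Layout {
        final ScalarKind kind; final ByteOrder order; final long bits;
        Scalar(ScalarKind kind, ByteOrder order, long bits) {
            if (bits <= 0 || bits % 8 != 0)
                throw new IllegalArgumentException("size must be a positive multiple of 8");
            this.kind = kind; this.order = order; this.bits = bits;
        }
        @Override long bitSize(Map<String, Layout> env) { return bits; }
    }

    // Req. 3: an address, optionally describing the region it points to.
    static final class Address extends Layout {
        final long bits; final Layout pointee; // pointee may be null
        Address(long bits, Layout pointee) { this.bits = bits; this.pointee = pointee; }
        // The pointee is not followed, so self-referential structs do not loop.
        @Override long bitSize(Map<String, Layout> env) { return bits; }
    }

    // Req. 4: structs add up element sizes, unions take the largest one.
    static final class Group extends Layout {
        final GroupKind kind; final List<Layout> elements;
        Group(GroupKind kind, Layout... elements) {
            this.kind = kind; this.elements = Arrays.asList(elements);
        }
        @Override long bitSize(Map<String, Layout> env) {
            long size = 0;
            for (Layout e : elements) {
                long s = e.bitSize(env);
                size = (kind == GroupKind.STRUCT) ? size + s : Math.max(size, s);
            }
            return size;
        }
    }

    // Req. 5: a layout repeated a statically known number of times.
    static final class Sequence extends Layout {
        final long count; final Layout element;
        Sequence(long count, Layout element) { this.count = count; this.element = element; }
        @Override long bitSize(Map<String, Layout> env) { return count * element.bitSize(env); }
    }

    // Req. 7: an annotated hole, filled in by looking up a named layout.
    static final class Unresolved extends Layout {
        final String ref;
        Unresolved(String ref) { this.ref = ref; }
        @Override long bitSize(Map<String, Layout> env) {
            Layout target = env.get(ref);
            if (target == null) throw new IllegalStateException("unresolved layout: " + ref);
            return target.bitSize(env);
        }
    }

    // Hypothetical environment: 'int' plays the role of a platform-dependent
    // definition (case c); 'Node' refers to itself through a pointer (case a).
    static Map<String, Layout> exampleEnv() {
        Map<String, Layout> env = new HashMap<>();
        env.put("int", new Scalar(ScalarKind.SIGNED_INT, ByteOrder.LITTLE_ENDIAN, 32));
        env.put("Node", new Group(GroupKind.STRUCT,
                new Unresolved("int").withName("value"),
                new Address(64, new Unresolved("Node")).withName("next")).withName("Node"));
        return env;
    }

    public static void main(String[] args) {
        Map<String, Layout> env = exampleEnv();
        System.out.println(env.get("Node").bitSize(env));                             // 96
        System.out.println(new Sequence(4, new Unresolved("int")).bitSize(env));      // 128
    }
}
```

Note how swapping the 'int' entry in the map for a 64-bit scalar would
change the size of every layout that refers to it, without touching the
layouts themselves; this is the macro-like behavior described in (c).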
I think this covers the basics; as you can see, this proposal is
somewhere in between the Type Descriptor proposal [1] and the LDL
proposal [2]; it gives up the ability to denote language-dependent types
(such as int and float), which is present in TD; at the same time, it
commits to an 'always explicit' policy, which is not the default in TD.
As such, it can be argued that a description in my proposed language is
very precise and also machine-dependent (which is the same choice LDL
makes). But it also gives up some of the generality of the LDL proposal;
namely, the ability to reason about non-byte-aligned layouts (e.g. in
LDL you can say things such as 'b13', to denote a sequence of 13 bits);
this feature would be rarely used in practice, and in my analysis of
message protocols I did not find any need to model arbitrary bit layouts
- it is very typical for message protocol packets to be byte-aligned,
to minimize encoding/decoding effort. Also, endianness is dealt with in
a much simpler way - that is, endianness is a property of a scalar,
while LDL has a much more general framework ('flip' operator) to model
endianness. As in LDL, it is possible for layouts to have annotations -
but I strongly feel that important representation distinctions should be
captured in the language rather than in another meta-language. In other
words, the more we make a description just about a 'bunch of bits', the
less said description has to say about the behavior that is attached to
such bits - meaning that this info will have to be recovered somehow, by
using extra metadata, or by other means. This is why I opted for
reifying information such as the scalar kind and the pointee information
in the description itself (while LDL delegates that job to a suitably
named annotation).
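As a small illustration of why the scalar kind and the endianness belong
in the description itself, the sketch below decodes the same four bytes
under different (endianness, kind) combinations. It only uses the
standard java.nio API; the names ScalarDecodeSketch and decode32 are
made up for this example and are not part of any proposed API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ScalarDecodeSketch {
    enum ScalarKind { SIGNED_INT, UNSIGNED_INT }

    // Decodes a 32-bit integral scalar; both endianness and kind are
    // supplied by the layout description rather than by the platform.
    static long decode32(byte[] bytes, ByteOrder order, ScalarKind kind) {
        int raw = ByteBuffer.wrap(bytes).order(order).getInt();
        return kind == ScalarKind.UNSIGNED_INT ? Integer.toUnsignedLong(raw) : raw;
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xFF, 0x00, 0x00, 0x01 };
        // Same bits, three different values depending on the description.
        System.out.println(decode32(bytes, ByteOrder.BIG_ENDIAN, ScalarKind.SIGNED_INT));      // -16777215
        System.out.println(decode32(bytes, ByteOrder.BIG_ENDIAN, ScalarKind.UNSIGNED_INT));    // 4278190081
        System.out.println(decode32(bytes, ByteOrder.LITTLE_ENDIAN, ScalarKind.UNSIGNED_INT)); // 16777471
    }
}
```

If the description only said '32 bits', a client would have to recover
both pieces of information from extra metadata before it could make
sense of the value.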
We plan to start working on the new API soon, and to replace the
existing internal layout API with a public, and (hopefully :-))
well-specified one. Once that's done, we'll also work to upgrade the
layout grammar and to make the necessary adjustments to jextract in
order to emit the right set of annotations/native descriptors. This work
might take place in an experimental branch first, so as to minimize
disruptions.
[1] - http://cr.openjdk.java.net/~mcimadamore/panama/layout-grammar.txt
[2] - http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html
[3] - http://mail.openjdk.java.net/pipermail/panama-dev/2018-January/000915.html
[4] - http://mail.openjdk.java.net/pipermail/panama-dev/2018-February/000940.html
Maurizio