From maurizio.cimadamore at oracle.com Fri Mar 2 18:54:15 2018
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Fri, 2 Mar 2018 18:54:15 +0000
Subject: layout description - a proposal
Message-ID: <24ef677d-5462-53fb-c143-3f0506721ee6@oracle.com>

Hi,
in the past few weeks I carried out some analysis to explore (i) what
other FFI implementations are doing in terms of layout description [3] and
(ii) what kinds of layout descriptions are common in message protocols [4].

Having collected all this data, I now feel more confident in coming up with
a proposal which should be simple, but still expressive enough to capture
the more exotic use cases (such as those found in message protocols).

Below, I'm going to describe the _requirements_ of the layout description
put forward by this proposal. As such, I make no assumptions about how such
a description might be surfaced in a potential language. After all, the
goal of this exercise is to capture the semantics of a description; if at
some point we feel that the description should be reified into a layout
language, we can do so accordingly, but I feel that should be the last
step, not the first.

That said, I'd also like to thank all the folks from IBM who contributed to
the current LDL effort (see [2]) - I think many of the conclusions reached
in that document are still valid, with a few twists that I'm going to show
below.

1) Scalars: we should only support three _kinds_ of scalars: signed
integrals, unsigned integrals and floating points. It is important for the
description to distinguish between these three kinds, as a scalar's kind
affects how its value is treated (e.g. which CPU register should be used
for a load operation). Vector support should also be considered in the
future, as another possible scalar kind. The size of a scalar should always
be a multiple of 8 bits, except for bitfields (see below).

2) Explicitness: the description should be as _explicit_ as possible.
That is, details such as (i) endianness and (ii) the size of a scalar
should always be reified in the description and not be subject to
platform-dependent considerations. (We will see later how platform-dependent
types such as C's 'int' could be implemented.)

3) Addresses: in addition to scalars, we also need an explicit description
for layouts that are meant to represent memory addresses/pointers. Such a
description could (optionally) reify information about the layout of the
memory region pointed to by the address.

4) Layouts can be combined into groups; two kinds of groups are supported:
product-like groups (aka structs) and sum-like groups (aka unions). A group
is made up of several element layouts.

5) Layouts can be repeated (e.g. _array_ layouts); some support should be
provided for cases where the repetition count is not known statically.

6) Layouts can be _annotated_ - that is, it should be possible to associate
key=value annotations with any layout. Of these, a special role should be
given to 'name' annotations, which allow a layout to be referenced from
other layouts (see below).

7) Named layouts can be _referenced_ from other layouts. This is a crucial
property: once a layout has a name, another layout can refer to it by name
via an _unresolved layout_. An unresolved layout is simply an annotated
hole, whose contents will be replaced dynamically with a suitable layout
(which layout is substituted into the hole depends on the annotations
available in the unresolved layout). This takes care of several use cases:

    (a) expressing dependencies between multiple layouts without the need
        to inline one layout description into another (which could lead to
        cycles)
    (b) referring to the layout of a struct field; if a native struct named
        S has a field layout named f, the field layout could be referenced
        using an unresolved layout annotated with a layout expression such
        as "S.f"
    (c) allowing macro-like behavior; for instance, platform-dependent
        types such as 'int' and 'long' could be modeled as references to a
        hidden layout which contains a set of platform-dependent sub-layout
        definitions
    (d) representing intra-message dependencies in message protocols

8) Integral scalars can be broken down into bit fields - that is, a scalar
can be associated with a _group overlay_, which defines a substructure for
that scalar. Fields in the overlay can be named and, as a result, can be
referenced by other unresolved layouts. Within the overlay group, scalar
fields can have sub-byte sizes (the only place where this can happen).

I think this covers the basics. As you can see, this proposal sits
somewhere in between the Type Descriptor (TD) proposal [1] and the LDL
proposal [2]. It gives up the ability to denote language-dependent types
(such as int and float), which is present in TD; at the same time, it
commits to an 'always explicit' policy, which is not the default in TD. As
such, it can be argued that a description in my proposed language is very
precise and also machine-dependent (the same choice LDL makes).

But it also gives up some of the generality of the LDL proposal, namely the
ability to reason about non-byte-aligned layouts (e.g. in LDL you can say
things such as 'b13' to denote a sequence of 13 bits). This feature would
rarely be used in practice: in my analysis of message protocols I did not
find any need to model arbitrary bit layouts, since message protocol
packets are typically byte-aligned to minimize encoding/decoding effort.
Also, endianness is dealt with in a much simpler way: endianness is a
property of a scalar, while LDL has a much more general framework (the
'flip' operator) to model endianness.
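To make points 1-8 a little more concrete, here is a minimal sketch of one way such a description could be modeled in Java. Everything in it is a hypothetical illustration, not the proposed API or any existing one: the class `LayoutSketch`, the records, the `"ref"` annotation used to point an unresolved layout at its target, and the dotted-path lookup are all assumptions. Sizes are in bits, the hidden `platform` layout assumes a 32-bit little-endian 'int', and, in the 'always explicit' spirit, no implicit padding is computed (padding would have to be an explicit element).

```java
import java.nio.ByteOrder;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed layout description; not an existing Panama API.
public class LayoutSketch {
    enum Kind { SIGNED, UNSIGNED, FLOAT }      // point 1: the three scalar kinds
    enum GroupKind { STRUCT, UNION }           // point 4: product-like and sum-like groups

    sealed interface Layout permits Scalar, Address, Group, Sequence, Unresolved {
        Map<String, String> annotations();     // point 6: key=value annotations
    }

    // Points 1, 2, 8: explicit kind, size in bits and byte order; 'overlay' may be null.
    record Scalar(Kind kind, long bits, ByteOrder order, Group overlay,
                  Map<String, String> annotations) implements Layout {}

    // Point 3: an address with an explicit size; 'pointee' may be null if unknown.
    record Address(long bits, ByteOrder order, Layout pointee,
                   Map<String, String> annotations) implements Layout {}

    record Group(GroupKind kind, List<Layout> elements,
                 Map<String, String> annotations) implements Layout {}

    // Point 5: a repeated layout; a negative count stands for "not known statically".
    record Sequence(Layout element, long count, Map<String, String> annotations) implements Layout {}

    // Point 7: an annotated hole; here (by assumption) the "ref" annotation names the target.
    record Unresolved(Map<String, String> annotations) implements Layout {}

    static Map<String, String> named(String name) { return Map.of("name", name); }

    // Point 7b: resolve "S" or "S.f" by descending into named group (or overlay) elements.
    static Layout resolve(String ref, Map<String, Layout> scope) {
        String[] parts = ref.split("\\.");
        Layout current = scope.get(parts[0]);
        for (int i = 1; i < parts.length && current != null; i++) {
            Group g = current instanceof Scalar s ? s.overlay()
                    : current instanceof Group grp ? grp : null;
            current = null;
            if (g == null) break;
            for (Layout e : g.elements())
                if (parts[i].equals(e.annotations().get("name"))) current = e;
        }
        if (current == null) throw new IllegalArgumentException("cannot resolve " + ref);
        return current;
    }

    // Size in bits; unresolved holes are filled by looking up their "ref" in 'scope'.
    static long bitSize(Layout l, Map<String, Layout> scope) {
        if (l instanceof Scalar s) return s.bits();   // an overlay does not add to the size
        if (l instanceof Address a) return a.bits();  // the pointee is not part of the size
        if (l instanceof Unresolved u) return bitSize(resolve(u.annotations().get("ref"), scope), scope);
        if (l instanceof Sequence q) {
            if (q.count() < 0) throw new IllegalStateException("count not known statically");
            return q.count() * bitSize(q.element(), scope);
        }
        Group g = (Group) l;
        long total = 0;
        for (Layout e : g.elements()) {
            long size = bitSize(e, scope);
            total = g.kind() == GroupKind.STRUCT ? total + size : Math.max(total, size);
        }
        return total;
    }

    static Map<String, Layout> exampleScope() {
        ByteOrder le = ByteOrder.LITTLE_ENDIAN;
        // Point 7c: a hidden layout holding platform-dependent C types (assumed target).
        Layout platform = new Group(GroupKind.STRUCT, List.of(
                new Scalar(Kind.SIGNED, 32, le, null, named("int")),
                new Scalar(Kind.SIGNED, 64, le, null, named("long"))), named("platform"));
        // Point 8: an 8-bit unsigned scalar overlaid with a 1-bit and a 7-bit field.
        Group bits = new Group(GroupKind.STRUCT, List.of(
                new Scalar(Kind.UNSIGNED, 1, le, null, named("a")),
                new Scalar(Kind.UNSIGNED, 7, le, null, named("b"))), Map.of());
        Layout s = new Group(GroupKind.STRUCT, List.of(
                new Unresolved(Map.of("name", "x", "ref", "platform.int")),
                new Scalar(Kind.UNSIGNED, 8, le, bits, named("flags")),
                new Sequence(new Scalar(Kind.FLOAT, 64, le, null, Map.of()), 2, named("v"))),
                named("S"));
        return Map.of("platform", platform, "S", s);
    }

    public static void main(String[] args) {
        Map<String, Layout> scope = exampleScope();
        System.out.println(bitSize(scope.get("S"), scope));                 // prints 168 (32 + 8 + 2*64)
        System.out.println(bitSize(resolve("S.flags.b", scope), scope));    // prints 7
    }
}
```

Resolving the field `x` through the hidden `platform` layout is what keeps the struct description itself free of any platform-dependent size, while the "S.flags.b" lookup shows how a named overlay field could be the target of an unresolved layout.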
As in LDL, it is possible for layouts to have annotations - but I strongly
feel that important representation distinctions should be captured in the
language rather than in another meta-language. In other words, the more we
make a description just about a 'bunch of bits', the less that description
has to say about the behavior attached to such bits, meaning that this
information will have to be recovered somehow, using extra metadata or
other means. This is why I opted to reify information such as the scalar
kind and the pointee information in the description itself (while LDL
delegates that job to suitably named annotations).

We plan to start working on the new API soon, and to replace the existing
internal layout API with a public and (hopefully :-)) well-specified one.
Once that's done, we'll also work to upgrade the layout grammar and to make
the necessary adjustments to jextract so that it emits the right set of
annotations/native descriptors. This work might take place in an
experimental branch first, so as to minimize disruption.

[1] - http://cr.openjdk.java.net/~mcimadamore/panama/layout-grammar.txt
[2] - http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html
[3] - http://mail.openjdk.java.net/pipermail/panama-dev/2018-January/000915.html
[4] - http://mail.openjdk.java.net/pipermail/panama-dev/2018-February/000940.html

Maurizio

From Tobi_Ajila at ca.ibm.com Tue Mar 13 18:01:45 2018
From: Tobi_Ajila at ca.ibm.com (Tobi Ajila)
Date: Tue, 13 Mar 2018 13:01:45 -0500
Subject: layout description - a proposal
In-Reply-To: <24ef677d-5462-53fb-c143-3f0506721ee6@oracle.com>
References: <24ef677d-5462-53fb-c143-3f0506721ee6@oracle.com>
Message-ID:

> As in LDL, it is possible for layouts to have annotations -
> but I strongly feel that important representation distinctions should be
> captured in the language rather than in another meta-language.
When we previously investigated this we saw a distinction between layout
(where are the bits - explicit size and offset), representation (what do
the bits mean - endianness, signedness) and interaction (how are the bits
loaded/stored - volatility, alignment constraints, etc.).

My understanding of this proposal is that 'layout' and 'representation' are
first class citizens. Is the intention to use annotations to convey
'interaction' properties?

--Tobi

From maurizio.cimadamore at oracle.com Tue Mar 13 21:15:51 2018
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Tue, 13 Mar 2018 21:15:51 +0000
Subject: layout description - a proposal
In-Reply-To:
References: <24ef677d-5462-53fb-c143-3f0506721ee6@oracle.com>
Message-ID: <09b056e1-19a6-0171-d3e5-95a7df70dfe0@oracle.com>

Hi Tobi,
I think your reading is correct - the intention here is to make properties
that could result in breakage if violated (and mistaking an int for a
floating point value seems like an unquestionably bad thing) first-class
entities of the description.

Moreover, the goal is also to be able to use the layout description to
drive the ABI mapping. So knowing signedness and FP-ness is important in
order to understand, e.g., which register could be used to pass a value
with that layout to a native function. The choices available should
therefore be expressive enough to cover, e.g., the type table in section 3
of the System V ABI [1].

If I look at that table, I see that it distinguishes between signed and
unsigned, it makes explicit assumptions about size (e.g. fourbyte,
eightbyte), and it distinguishes between integral and FP values. These
basic distinctions form the basis of the properties I wanted to capture in
the layout description.
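As a rough illustration of why the scalar kind matters for ABI mapping, the sketch below maps a scalar description to System V AMD64 argument classes (one class per eightbyte). The names `SysVClassify`, `ScalarLayout` and `ArgClass` are hypothetical, addresses are folded in as an extra kind purely for brevity, and only a handful of cases are covered: integrals up to 128 bits, float/double, and 64-bit pointers. Aggregates, long double (X87/X87UP) and __float128 (SSE/SSEUP) are deliberately left out.

```java
import java.util.List;

// Hypothetical sketch: explicit scalar descriptions mapped to System V AMD64 argument classes.
public class SysVClassify {
    enum Kind { SIGNED, UNSIGNED, FLOAT, ADDRESS }   // ADDRESS stands in for point 3 layouts
    enum ArgClass { INTEGER, SSE }

    record ScalarLayout(Kind kind, int bits) {}

    // Returns one argument class per eightbyte of the value.
    static List<ArgClass> classify(ScalarLayout l) {
        switch (l.kind()) {
            case SIGNED, UNSIGNED -> {
                // Integral values up to 64 bits go in general purpose registers.
                if (l.bits() <= 64) return List.of(ArgClass.INTEGER);
                // __int128 is classified as if it were struct { long low, high; }.
                if (l.bits() == 128) return List.of(ArgClass.INTEGER, ArgClass.INTEGER);
            }
            case FLOAT -> {
                // float and double go in SSE registers.
                if (l.bits() == 32 || l.bits() == 64) return List.of(ArgClass.SSE);
            }
            case ADDRESS -> {
                // Pointers are in the INTEGER class.
                if (l.bits() == 64) return List.of(ArgClass.INTEGER);
            }
        }
        throw new IllegalArgumentException("not covered by this sketch: " + l);
    }

    public static void main(String[] args) {
        // Same size, different kind, different register class.
        System.out.println(classify(new ScalarLayout(Kind.SIGNED, 32)));     // prints [INTEGER]
        System.out.println(classify(new ScalarLayout(Kind.FLOAT, 32)));      // prints [SSE]
        System.out.println(classify(new ScalarLayout(Kind.UNSIGNED, 128)));  // prints [INTEGER, INTEGER]
    }
}
```

The signed 32-bit and the 32-bit floating point layouts have the same size yet land in different register classes, which is exactly the information a pure 'bunch of bits' description would not carry.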
Maurizio

[1] - https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf

On 13/03/18 18:01, Tobi Ajila wrote:
> > As in LDL, it is possible for layouts to have annotations -
> > but I strongly feel that important representation distinctions should be
> > captured in the language rather than in another meta-language.
> When we previously investigated this we saw a distinction between
> layout (where are the bits - explicit size and offset), representation
> (what do the bits mean - endianness, signedness) and interaction (how
> are the bits loaded/stored - volatility, alignment constraints, etc.).
>
> My understanding of this proposal is that 'layout' and 'representation' are
> first class citizens. Is the intention to use annotations to convey
> 'interaction' properties?
>
> --Tobi