From maurizio.cimadamore at oracle.com  Fri Mar  2 18:54:15 2018
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Fri, 2 Mar 2018 18:54:15 +0000
Subject: layout description - a proposal
Message-ID: <24ef677d-5462-53fb-c143-3f0506721ee6@oracle.com>

Hi,
in the past few weeks I carried out some analysis to explore (i) what 
other FFI frameworks do in terms of layout description [3] and also 
(ii) what kind of layout descriptions are common in message protocols 
[4]. After having collected all this data, I now feel more confident in 
coming up with a proposal which should be simple, but still expressive 
enough to capture more exotic use cases (as can be found in message 
protocols).

Below, I'm going to describe the _requirements_ of the layout 
description that is put forward by this proposal. As such, I shall make 
no assumption on how such description might be surfaced in a potential 
language. After all, the goal of this exercise is to capture the 
semantics of a description - if, at some point we feel that such 
description should be reified into a layout language, we can do so 
accordingly, but I feel that should be the last step, not the first. 
That said I'd also like to thank all the folks from IBM who contributed 
to the current LDL effort (see [2]) - I think many of the conclusions 
reached in that document are still valid - with a few twists that I'm 
going to show below.

1) Scalars: we should only support three _kinds_ of scalars: signed 
integrals, unsigned integrals and floating points. It is important for 
the description to distinguish between these three kinds, as a scalar 
kind affects how a scalar value is treated (e.g. which CPU register 
should be used for a load operation). Vector support should also be 
considered in the future - as another possible scalar kind. The size of 
a scalar should always be a multiple of 8, except for bitfields (see below).

2) Explicitness: the description should be as _explicit_ as possible. 
That is, details such as (i) endianness, (ii) size of a scalar should 
always be reified in the language description and not be subject to 
platform-dependent considerations. (We will see later how 
platform-dependent types such as C's 'int' could be implemented).
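
To make points 1) and 2) a bit more concrete, here is a minimal sketch in 
Java of what an explicit scalar description could carry. All names below 
(ScalarSketch, Kind, Scalar) are hypothetical and chosen for illustration 
only; they are not part of any existing Panama API.

```java
import java.nio.ByteOrder;

final class ScalarSketch {
    // The three scalar kinds from point 1).
    enum Kind { SIGNED_INT, UNSIGNED_INT, FLOATING_POINT }

    // Size and endianness are always spelled out, as required by point 2).
    record Scalar(Kind kind, int bitSize, ByteOrder order) {
        Scalar {
            // Outside of bitfield overlays, sizes must be a multiple of 8.
            if (bitSize <= 0 || bitSize % 8 != 0) {
                throw new IllegalArgumentException("size must be a positive multiple of 8: " + bitSize);
            }
        }

        int byteSize() {
            return bitSize / 8;
        }
    }

    public static void main(String[] args) {
        Scalar i32 = new Scalar(Kind.SIGNED_INT, 32, ByteOrder.LITTLE_ENDIAN);
        Scalar f64 = new Scalar(Kind.FLOATING_POINT, 64, ByteOrder.BIG_ENDIAN);
        System.out.println(i32 + " occupies " + i32.byteSize() + " bytes");
        System.out.println(f64 + " occupies " + f64.byteSize() + " bytes");
    }
}
```

Nothing in such a description depends on the platform it is read on: two 
scalars are interchangeable only if kind, size and byte order all match.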

3) Addresses: in addition to scalars, we also need to have an explicit 
description for layouts that are meant to represent memory 
addresses/pointers. Such description could (optionally) reify info about 
the layout of the memory region pointed to by the address.
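
As a rough illustration of point 3), an address layout could be modeled as 
its own kind of layout with an optional pointee description. This is only 
a sketch: AddressSketch, Layout, Scalar and Address are hypothetical names, 
and the 64-bit address size is an assumption made for the example.

```java
import java.util.Optional;

final class AddressSketch {
    interface Layout {
        int bitSize();
    }

    // A plain scalar, reduced to a kind label and a size for brevity.
    record Scalar(String kind, int bitSize) implements Layout {}

    // An address has its own size and may say what it points to.
    record Address(int bitSize, Optional<Layout> pointee) implements Layout {}

    public static void main(String[] args) {
        Layout i32 = new Scalar("signed", 32);
        // Pointer whose pointee layout is known (think 'int32_t *').
        Address typed = new Address(64, Optional.of(i32));
        // Pointer with no pointee information (think 'void *').
        Address opaque = new Address(64, Optional.empty());
        System.out.println(typed);
        System.out.println(opaque);
    }
}
```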

4) Layouts can be combined into groups; two groups are supported: 
product-like groups (aka structs) and sum-like groups (aka unions). A 
group is made up of several element layouts.

5) Layouts can be repeated (e.g. _array_ layouts); some support should be 
provided for cases where the repetition count is not known statically.
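
Points 4) and 5) can be sketched in Java as follows. The names 
(GroupSketch, Layout, Value, Group, Sequence) are hypothetical, and the 
sketch assumes that a description contains no implicit padding, so that a 
product group is exactly as large as the sum of its elements and a sum 
group as large as its largest element. A size is reported as empty when 
it cannot be known statically.

```java
import java.util.List;
import java.util.OptionalLong;

final class GroupSketch {
    interface Layout {
        // Empty when the size is not known statically.
        OptionalLong bitSize();
    }

    record Value(long bits) implements Layout {
        public OptionalLong bitSize() {
            return OptionalLong.of(bits);
        }
    }

    enum GroupKind { PRODUCT, SUM }   // struct-like and union-like groups

    record Group(GroupKind kind, List<Layout> elements) implements Layout {
        public OptionalLong bitSize() {
            long total = 0;
            for (Layout e : elements) {
                OptionalLong s = e.bitSize();
                if (s.isEmpty()) return OptionalLong.empty();
                total = kind == GroupKind.PRODUCT
                        ? total + s.getAsLong()           // elements laid out one after another
                        : Math.max(total, s.getAsLong()); // elements share the same storage
            }
            return OptionalLong.of(total);
        }
    }

    // A repeated layout; an empty count models a repetition count that is
    // only known dynamically.
    record Sequence(OptionalLong count, Layout element) implements Layout {
        public OptionalLong bitSize() {
            OptionalLong s = element.bitSize();
            if (count.isEmpty() || s.isEmpty()) return OptionalLong.empty();
            return OptionalLong.of(count.getAsLong() * s.getAsLong());
        }
    }

    public static void main(String[] args) {
        Layout point = new Group(GroupKind.PRODUCT, List.of(new Value(32), new Value(32)));
        Layout either = new Group(GroupKind.SUM, List.of(new Value(32), new Value(64)));
        Layout three = new Sequence(OptionalLong.of(3), point);
        Layout open = new Sequence(OptionalLong.empty(), point);
        System.out.println(point.bitSize());   // OptionalLong[64]
        System.out.println(either.bitSize());  // OptionalLong[64]
        System.out.println(three.bitSize());   // OptionalLong[192]
        System.out.println(open.bitSize());    // OptionalLong.empty
    }
}
```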

6) Layouts can be _annotated_ - that is, it should be possible to 
associate key=value annotations to any layout. Of these, a special role 
should be given to 'name' annotations, that could allow that layout to 
be referenced from other layouts (see below).

7) Named layouts can be _referenced_ from other layouts. This is a 
crucial property; once a layout has a name, another layout can refer to 
it by name via an _unresolved layout_. An unresolved layout is simply 
an annotated hole - where the contents of the hole will be replaced 
dynamically with a suitable layout (which layout is replaced into the 
hole depends on the annotations available in the unresolved layout). 
This takes care of a bunch of use cases:

   (a) express dependencies between multiple layouts without the need to 
inline one layout description into another (which could lead to cycles)
   (b) have a way to refer to the layout of a struct field; if a native 
struct whose name is S has a field layout whose name is f, the field 
layout could be referenced using an unresolved layout which is annotated 
with a layout expression like "S.f".
   (c) allow for macro-like behavior; for instance, platform-dependent 
types such as 'int', 'long' could be modeled as references to a hidden 
layout which contains a bunch of platform-dependent sub-layout definitions.
   (d) could be used to represent intra-message dependencies in message 
protocols
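
Use case (c) can be sketched in Java as a lookup of a named hole in a 
per-platform table of layouts. Everything here is hypothetical 
(UnresolvedSketch, Layout, Scalar, Unresolved, resolve); in particular, 
resolving by a plain name in a Map is a simplification of "resolution 
driven by annotations". The LP64 and LLP64 tables only list the sizes of 
C's 'int' and 'long' under those two data models.

```java
import java.util.Map;

final class UnresolvedSketch {
    interface Layout {}

    record Scalar(String kind, int bits) implements Layout {}

    // An annotated hole: only the name of the layout it stands for is known.
    record Unresolved(String name) implements Layout {}

    // Follows references until a concrete layout is reached.
    static Layout resolve(Layout layout, Map<String, Layout> env) {
        int hops = 0;
        while (layout instanceof Unresolved u) {
            Layout target = env.get(u.name());
            if (target == null) {
                throw new IllegalStateException("no layout named " + u.name());
            }
            // A chain longer than the table must have revisited a name.
            if (++hops > env.size()) {
                throw new IllegalStateException("cyclic reference at " + u.name());
            }
            layout = target;
        }
        return layout;
    }

    public static void main(String[] args) {
        Map<String, Layout> lp64 = Map.of(
                "int", new Scalar("signed", 32),
                "long", new Scalar("signed", 64));
        Map<String, Layout> llp64 = Map.of(
                "int", new Scalar("signed", 32),
                "long", new Scalar("signed", 32));

        Layout cLong = new Unresolved("long");
        System.out.println(resolve(cLong, lp64));   // 64-bit signed scalar
        System.out.println(resolve(cLong, llp64));  // 32-bit signed scalar
    }
}
```

The same description of 'long' is shared by both platforms; only the table 
used for resolution differs.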

8) Integral scalars can be broken down into bit fields - that is, a scalar 
can be associated with a _group overlay_, which defines a substructure 
that is to be associated with said scalar. Fields in the overlay can 
be named, and, as a result can be the target for replacements within 
other unresolved layouts. Within the substructure of the overlay group, 
scalar fields can have sub-byte sizes (the only place where this can 
happen).
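
A group overlay as in point 8) could look roughly like this in Java. The 
names (OverlaySketch, BitField, overlay) are hypothetical, and the sketch 
assumes that fields are allocated starting from the least significant bit 
of the container, which the proposal itself does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OverlaySketch {
    // A named sub-byte field inside an integral container scalar.
    record BitField(int offset, int width) {
        long extract(long container) {
            long mask = width == 64 ? -1L : (1L << width) - 1;
            return (container >>> offset) & mask;
        }
    }

    // Assigns consecutive offsets to the fields, in declaration order,
    // starting at bit 0 (assumption: least significant bit first).
    static Map<String, BitField> overlay(int containerBits, Map<String, Integer> widths) {
        Map<String, BitField> fields = new LinkedHashMap<>();
        int offset = 0;
        for (Map.Entry<String, Integer> e : widths.entrySet()) {
            fields.put(e.getKey(), new BitField(offset, e.getValue()));
            offset += e.getValue();
        }
        if (offset > containerBits) {
            throw new IllegalArgumentException("overlay needs " + offset
                    + " bits, container has " + containerBits);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Integer> widths = new LinkedHashMap<>();
        widths.put("flag", 1);
        widths.put("type", 3);
        widths.put("length", 12);
        Map<String, BitField> header = overlay(16, widths);

        long raw = 0xABCDL; // a 16-bit unsigned container value
        System.out.println(header.get("flag").extract(raw));    // 1
        System.out.println(header.get("type").extract(raw));    // 6
        System.out.println(header.get("length").extract(raw));  // 2748
    }
}
```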


I think this covers the basics; as you can see, this proposal is 
somewhere in between the Type Descriptor proposal [1] and the LDL 
proposal [2]; it gives up the ability to denote language-dependent types 
(such as int, and float), which is present in TD; at the same time, it 
commits to an 'always explicit' policy, which is not the default in TD. 
As such, it can be argued that a description in my proposed language is 
very precise and also machine-dependent (which is the same choice LDL 
makes). But it also gives up some of the generality of the LDL proposal; 
namely, the ability to reason about non-byte-aligned layouts (e.g. in 
LDL you can say things such as 'b13', to denote a sequence of 13 bits); 
this feature would be rarely used in practice, and in my analysis of 
message protocols I did not find any need to model arbitrary bit layout 
- it is very typical for message protocol packets to be byte-aligned, 
to minimize encoding/decoding effort. Also, endianness is dealt with in 
a much simpler way - that is, endianness is a property of a scalar, 
while LDL has a much more general framework ('flip' operator) to model 
endianness. As in LDL, it is possible for layouts to have annotations - 
but I strongly feel that important representation distinctions should be 
captured in the language rather than in another meta-language. In other 
words, the more we make a description just about a 'bunch of bits', the 
less said description has to say about the behavior that is attached to 
such bits - meaning that this info will have to be recovered somehow, by 
using extra metadata, or by other means. This is why I opted for 
reifying information such as the scalar kind and the pointee information 
in the description itself (while LDL delegates that job to a suitably 
named annotation).


We plan to start working on the new API soon, and to replace the 
existing internal layout API with a public, and (hopefully :-)) 
well-specified one. Once that's done, we'll also work to upgrade the 
layout grammar and to make the necessary adjustments to jextract in 
order to emit the right set of annotations/native descriptors. This work 
might take place in an experimental branch first, so as to minimize 
disruptions.

[1] - http://cr.openjdk.java.net/~mcimadamore/panama/layout-grammar.txt
[2] - http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html
[3] - http://mail.openjdk.java.net/pipermail/panama-dev/2018-January/000915.html
[4] - http://mail.openjdk.java.net/pipermail/panama-dev/2018-February/000940.html

Maurizio


From Tobi_Ajila at ca.ibm.com  Tue Mar 13 18:01:45 2018
From: Tobi_Ajila at ca.ibm.com (Tobi Ajila)
Date: Tue, 13 Mar 2018 13:01:45 -0500
Subject: layout description - a proposal
In-Reply-To: <24ef677d-5462-53fb-c143-3f0506721ee6@oracle.com>
References: <24ef677d-5462-53fb-c143-3f0506721ee6@oracle.com>
Message-ID: <OF6CFBA510.A4FBEA4C-ON0025824F.0062F59A-8525824F.006309E2@notes.na.collabserv.com>

> As in LDL, it is possible for layouts to have annotations -
> but I strongly feel that important representation distinctions should be
> captured in the language rather than in another meta-language.
When we previously investigated this we saw a distinction between
layout (where are the bits - explicit size and offset), representation
(what do the bits mean - endianness, signedness) and interaction (how
are the bits loaded/stored - volatility, alignment constraints, etc.).

My understanding of this proposal is that 'layout' and 'representation' are
first class citizens. Is the intention to use annotations to convey
'interaction' properties?

--Tobi

From maurizio.cimadamore at oracle.com  Tue Mar 13 21:15:51 2018
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Tue, 13 Mar 2018 21:15:51 +0000
Subject: layout description - a proposal
In-Reply-To: <OF6CFBA510.A4FBEA4C-ON0025824F.0062F59A-8525824F.006309E2@notes.na.collabserv.com>
References: <24ef677d-5462-53fb-c143-3f0506721ee6@oracle.com>
 <OF6CFBA510.A4FBEA4C-ON0025824F.0062F59A-8525824F.006309E2@notes.na.collabserv.com>
Message-ID: <09b056e1-19a6-0171-d3e5-95a7df70dfe0@oracle.com>

Hi Tobi,
I think your reading is correct - the intention here is to make 
properties that would potentially result in breakage if violated (and 
mistaking an int for a floating point seems like an unquestionably bad 
thing) as first class entities of the description. Moreover, the goal is 
also to be able to use the layout description to drive the ABI mapping. 
So, knowing signedness and FP-ness is an important bit in order to 
understand e.g. which register could be used to pass a value with that 
layout on to a native function. So, the choices available should be 
sufficiently expressive to cover e.g. the type table in section 3 in the 
System V ABI [1].

If I look at that table, I see that it makes a distinction between 
signed/unsigned, it makes explicit assumptions on size (e.g. fourbyte, 
eightbyte), and it distinguishes between integral and FP values. These 
basic distinctions form the base of the properties I wanted to capture 
in the layout description.
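
As a rough sketch of how the scalar kind can drive the ABI mapping: on 
System V x86-64, integral scalar arguments fall into the INTEGER class 
(general purpose registers) and float/double into the SSE class (vector 
registers). The Java names below (AbiSketch, Kind, ArgClass, Scalar, 
classify) are hypothetical, and the sketch deliberately leaves out 
aggregates and scalars wider than 64 bits.

```java
final class AbiSketch {
    enum Kind { SIGNED_INT, UNSIGNED_INT, FLOATING_POINT }

    // The two argument classes relevant for scalars of up to 64 bits.
    enum ArgClass { INTEGER, SSE }

    record Scalar(Kind kind, int bitSize) {}

    static ArgClass classify(Scalar s) {
        if (s.bitSize() > 64) {
            // e.g. long double or 128-bit types need extra classes
            throw new UnsupportedOperationException("wider scalars are not handled in this sketch");
        }
        return s.kind() == Kind.FLOATING_POINT ? ArgClass.SSE : ArgClass.INTEGER;
    }

    public static void main(String[] args) {
        System.out.println(classify(new Scalar(Kind.SIGNED_INT, 32)));      // INTEGER
        System.out.println(classify(new Scalar(Kind.UNSIGNED_INT, 64)));    // INTEGER
        System.out.println(classify(new Scalar(Kind.FLOATING_POINT, 64)));  // SSE
    }
}
```

A description that only said "64 bits" could not make this choice, which 
is why the kind is part of the layout rather than an annotation.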

Maurizio

[1] - https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf

