minimal layout definition language

John Rose john.r.rose at oracle.com
Tue Jul 26 08:17:05 UTC 2016


Here is an LDL with simple but usable syntax and clearly defined semantics.  (See links and file copy after the signature.)

I hope it is a significant step toward creating an intermediate language for data structure layouts that expresses a naturally wide variety of possibilities, in a language-neutral and processor-neutral way.

I have credited Tobi Ajila and Henry Jen as authors, because I have relied heavily on their work for ideas and inspiration.  However, if you find a goofy idea somewhere in the document, please assume I put it in.

— John

http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html
http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.pdf
http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.md

<h1>
A minimal layout definition language, one step at a time
</h1>
  > John Rose, Tobi Ajila, Henry Jen, 2016-0525 _(0.2)_
# 

A layout description is a string, which can then be processed into a
layout type, which can then be instantiated over a suitable memory
region as a layout instance.  Layouts can also apply to the
bit-oriented contents of registers.  The actual bits in memory or
register are called the _payload_.

This note describes a concise syntax for layout descriptions, loosely
based on C's `printf`, Python's `struct.unpack`, and similar notations.

  0. The empty layout description string `""` denotes zero bits of
payload, which can be located anywhere or nowhere.  It is possible to
treat such a layout as describing a logical variable, but this
variable is a _unit_ with only one value, which requires no storage.

    ~~~~
    layout = "" | ...
    ~~~~

    0a. Whitespace of any sort may be included anywhere in a layout
description string (except within annotations, described later).  The
whitespace does not affect meaning in any way.  Comments may be started
with `"#"` and ended with a newline or the end of the layout
description, and are treated the same as whitespace (outside of
annotations).  Thus the string `"\t #ho\n #hum"` denotes the same
layout as the empty string.

    ~~~~
    whitespace = ONE_OF[" \t\n"] | "#" ( NOT_ONE_OF["\n"] )*
    ~~~~
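As an illustration of rule 0a, here is a minimal Python sketch that removes whitespace and comments from a layout description while leaving annotation contents alone.  The function name `strip_layout_whitespace` is hypothetical, and the sketch assumes that annotations are delimited by matched round brackets, as described later in this note.

```python
def strip_layout_whitespace(desc: str) -> str:
    """Drop whitespace and '#' comments that occur outside annotations."""
    out = []
    depth = 0  # nesting depth of annotation parentheses
    i = 0
    while i < len(desc):
        ch = desc[i]
        if depth == 0 and ch == "#":
            # A comment runs to the next newline or to the end of the string.
            while i < len(desc) and desc[i] != "\n":
                i += 1
            continue
        if depth == 0 and ch in " \t\n":
            # Whitespace characters taken from the grammar rule above.
            i += 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        out.append(ch)
        i += 1
    return "".join(out)
```

With this sketch, `strip_layout_whitespace("\t #ho\n #hum")` yields the empty string, while whitespace inside an annotation such as `"b (my bit)"` is kept.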

  1. Non-empty layouts describe one or more _bits_.  The simplest
non-empty layout description is the single letter `"b"`, which denotes
one bit of payload, with no constraint on its location.

    ~~~~
    element = atom | ...
    atom = bit | ...
    bit = "b"
    ~~~~

  2. Two layout descriptions can be _concatenated_, so that `"bb"`
denotes two bits of payload.  The payload of a concatenation of
layouts is the bitwise concatenation of the component layouts, with no
additional padding.  In a layout description, concatenation is
associative but not commutative, so the ordering of bits is
significant.  Thus, the text of a composite layout description can
contain one or more component layout descriptions.  We can refer to
those syntactic components as subexpressions or simply elements.

    ~~~~
    layout = ( element )*
    ~~~~

    2a. When a layout is applied to a register (or any other bit
vector), the arithmetically least significant bit is first, and
numbering proceeds upwards.  In memory (or any array of multi-bit
units), the ordering of bits starts at low addresses, filling one
memory unit (in arithmetic order) and proceeding to the following
memory unit, at the next higher address.  For byte-oriented memory,
this implies a little-endian byte order when moving between
registers and memory.  Byte swapping will come in later.
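The bit numbering of rule 2a can be sketched in Python as follows.  The helper names `memory_bit` and `load_bits` are hypothetical; the sketch assumes a byte-oriented memory modelled as a `bytes` object, and it only checks that gathering bits in layout order agrees with a little-endian load.

```python
def memory_bit(mem: bytes, n: int) -> int:
    """Return bit n of memory, counting from the least significant
    bit of the lowest-addressed byte."""
    return (mem[n // 8] >> (n % 8)) & 1


def load_bits(mem: bytes, start: int, size: int) -> int:
    """Gather `size` bits starting at bit position `start` into an integer,
    placing the first bit in the arithmetically least significant position."""
    value = 0
    for i in range(size):
        value |= memory_bit(mem, start + i) << i
    return value


mem = bytes([0x78, 0x56, 0x34, 0x12])
word = load_bits(mem, 0, 32)
# Loading 32 bits in layout order matches a little-endian load.
assert word == int.from_bytes(mem, "little")
```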

  3. Layouts can be explicitly _grouped_ via square brackets, so that
`"[bb]"` describes two bits of payload; in this case, the two bits are
logically combined into a single subexpression or element.  Elements
can be nested, but such nesting does not all by itself confer any
particular significance to users.

    ~~~~
    element = atom | group | ...
    group = "[" ( group_body )? "]"
    group_body = element ( element )+ | ...
    ~~~~

    3a. Trivially, individual bits are also regarded as elements,
though they are usually _insignificant_ as individual variables or
values.  Grouped bits are typically used to denote bytes and larger
units of information.

    3b. Any element of a layout description (from single bits up to
the whole expression string) is unambiguously assigned a size (in
bits) which is (usually) derived by summing the sizes of the component
elements.  Likewise, each element has a position in the whole layout
which is (usually) derived by summing the sizes of the preceding
elements.  The empty group `"[]"` is a valid group; it has zero size
and no alignment restrictions.

    3c. A group expression with exactly one non-group element in it,
such as `"[b]"`, functions as a parenthesis, and so it denotes exactly
the same layout as the single sub-element.  The brackets of a
parenthesis are ignored except for their help disambiguating the
syntax of the layout language.  A semantic group with exactly one
element can be expressed by ensuring that the sole element of the
group also has brackets, so that `"[[]]"`, `"[[b]]"`, and `"[[bb]]"`
are all actual one-element groups, whose elements are respectively
`"[]"`, `"b"` (from the parenthesis `"[b]"`), and `"[bb]"`.

    ~~~~
    atom = bit | parenthesis | ...
    parenthesis = "[" element BUT_NOT[group] "]"
    group_body = element ( element )+ | singleton | ...
    singleton = group | parenthesis
    ~~~~

  4. Layouts can be _replicated_, so that `"8b"` denotes eight bits.
A layout element prefixed by a non-negative decimal numeral is
equivalent to its _expansion_, a grouped layout with the indicated
number of copies of the described element.  Therefore `"8b"` and
`"[bbbb bbbb]"` describe the same layout type.

    ~~~~
    element = atom | group | prefixed_element | ...
    prefixed_element = replication | ...
    replication = count_prefix element | "[" count_prefix element "]"
    count_prefix = count | ...
    count = numeral | ...
    numeral = "0" | ONE_OF["1-9"] ( ONE_OF["0-9"] )*
    ~~~~

    4a. Nested replication is possible, but the prefixes must be kept
unambiguously separate, e.g., by brackets, as `"2[2b]"`, a two-by-two
bit matrix.  (Whitespace won't help with this since `"2 2b"` and
`"22b"` are the same layout of twenty-two bits.)  If a replication is
immediately surrounded by explicit grouping brackets, those brackets
are taken to be the same as the brackets included in the expansion.
Thus, `"2[2b]"` expands to `"2[bb]"` and then `"[[bb][bb]]"`, not
`"2[[bb]]"` or `"[[[bb]][[bb]]]"`.  But `"[2bb]"` does expand to
`"[[bb]b]"`, not `"[bbb]"`, because the `"2b"` is not immediately
surrounded by brackets.
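The expansion rule of items 4 and 4a can be sketched in Python for the tiny subset of the language that has only `b`, square brackets, and replication counts.  The function name `expand` is hypothetical, and the sketch deliberately does not apply the parenthesis rule of 3c; it only shows how brackets that immediately surround a replication merge with the brackets of its expansion.

```python
DIGITS = "0123456789"


def expand(desc: str) -> str:
    """Expand replications in a layout built only from 'b', brackets, counts."""
    s = "".join(desc.split())  # whitespace never affects meaning
    pos = 0

    def element():
        """Parse one element; return (expanded text, is_replication)."""
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] in DIGITS:
            pos += 1
        if pos > start:
            count = int(s[start:pos])
            body, _ = element()
            return "[" + body * count + "]", True
        if pos < len(s) and s[pos] == "b":
            pos += 1
            return "b", False
        if pos < len(s) and s[pos] == "[":
            pos += 1
            parts = []
            while pos < len(s) and s[pos] != "]":
                parts.append(element())
            if pos >= len(s):
                raise ValueError("missing ']'")
            pos += 1
            if len(parts) == 1 and parts[0][1]:
                # Brackets immediately around a replication are the
                # brackets of its expansion, so no extra pair is added.
                return parts[0][0], False
            return "[" + "".join(text for text, _ in parts) + "]", False
        raise ValueError(f"unexpected input at offset {pos}")

    out = []
    while pos < len(s):
        out.append(element()[0])
    return "".join(out)
```

For example, `expand("2[2b]")` gives `"[[bb][bb]]"` and `expand("[2bb]")` gives `"[[bb]b]"`, matching the expansions described above.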

    4b. The replication rule implies that prefixing an element by the
numeral 1 is equivalent to putting it in brackets, and prefixing any
element by the numeral 0 is equivalent to replacing it by empty
brackets.  Thus `"1b"` and `"[1b]"` expand to `"[b]"` not just `"b"`,
and `"0b"` and `"[0b]"` expand to `"[]"`.

    4c. (There will be extra twists on replication to support reverse
replication and variable-length replication.)

  5. Elements can be accompanied by _alignment_ constraints.  The
constraint is a number of bits, which must be a power of two, and is expressed
as a decimal number prefix which modifies an element.  Thus a normal
memory byte is `"8%8b"`.  If the alignment is the same as the size of
the element, the numeral may be omitted, so `"%8b"` denotes the same
layout as `"8%8b"`, which in turn denotes `"8%[bbbb bbbb]"`.  (As a
matter of syntax, prefix operators like `%` which have optional
numeric prefixes associate with them if present, instead of allowing
the prefix to denote replication.)

    ~~~~
    prefixed_element = replication | alignment | ...
    alignment = ( count )? "%" element
    ~~~~

    5a. Alignment constraints are merely checked; they do not cause
invisible padding bits to be inserted into a layout.  An alignment
constraint on an element, if supplied, overrides all alignment
constraints inside the element, so that a byte can be laid out as a
bitfield (`"1%[%8b]"`), and a word can be laid out as a sequence of
unaligned bytes (`"8%[%32b]"`).

    5b. Thus, the prefix `"8%[]"`, before any layout, has the effect
of declaring that all following elements will be byte-aligned,
regardless of individual alignment requirements.  This is useful for
declaring that a layout applies to native memory access on a CPU which
allows unaligned accesses, or to the formatting of an unaligned
network packet.

  6. Certain other lower-case letters serve as _abbreviations_ for
standard elements.  There is such an abbreviation for octet, halfword,
word, doubleword, and quadword.  (The use of "word" here follows a
32-bit convention, which differs from the older convention of the x86
ISA, where a "word" is 16 bits.)  Specifically, the layout description
`"ohwdq"` denotes the same layout as the more verbose `"%8b %2o %4o
%8o %16o"`.  (Those particular constraints must all be satisfied
simultaneously; this is tricky but possible.)

    ~~~~
    atom = bit | parenthesis | abbreviation | ...
    abbreviation = ONE_OF["ohwdq"]
    ~~~~
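To see why the constraints in `"ohwdq"` can be satisfied simultaneously, here is a small Python sketch that searches for a starting bit position at which every abbreviation lands on a multiple of its own size.  The names `SIZE_BITS`, `placements`, and `is_aligned` are hypothetical, and the search bound of 1024 bits is an arbitrary choice for the illustration.

```python
# Sizes in bits; each abbreviation is aligned to its own size.
SIZE_BITS = {"o": 8, "h": 16, "w": 32, "d": 64, "q": 128}


def placements(desc: str, base: int):
    """Yield (letter, bit position) for a plain concatenation of abbreviations."""
    pos = base
    for letter in desc:
        yield letter, pos
        pos += SIZE_BITS[letter]


def is_aligned(desc: str, base: int) -> bool:
    """Check that every abbreviation sits at a multiple of its size."""
    return all(pos % SIZE_BITS[letter] == 0
               for letter, pos in placements(desc, base))


# Smallest non-negative starting bit position that satisfies all constraints.
first_fit = next(base for base in range(1024) if is_aligned("ohwdq", base))
```

Under these assumptions the layout fits when it starts one byte past a 128-bit boundary, so that the octet fills the gap before the halfword and each later element ends exactly where the next one must begin.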

    6a. Data types of other sizes can be easily specified either in
terms of bits or in terms of the standard abbreviations.  Like all
layouts, abbreviations can be modified by operators, so a byte-aligned
32-bit word can be denoted by `"8%w"`, and a single-word aligned
double-word as `"32%d"`.  But the whole thing can be stuffed in a byte
array, unaligned, as `"8%[32%d]"`.

    6b. These abbreviations denote size and alignment, but do not
imply any particular format or interpretation of the payload bits.
(Such information can be added as needed, a point we will return to.)

  7. Layout descriptions can be overlapped, using a _reset operator_
spelled with a vertical bar, as in `"[o|w]"`.  The bar divides the
bracketed layout into two or more _alternatives_, each of which is an
independent layout.  The alternatives overlap, with all of them
starting at the same bit position, called the _local origin_ of the
bracketed layout.  A bracketed layout with no reset operator (like
`"[q]"`), is also said to consist of a single alternative (`"q"`).

    ~~~~
    group_body = element ( element )+ | singleton | alternatives
    alternatives = ( alternative )+ ( element )*
    alternative = sized_alt | ...
    sized_alt = ( element )+ "|"
    ~~~~

    7a. In effect, the reset operator stops the concatenation process,
and restarts it at the position corresponding to the innermost
unclosed left bracket (or the beginning of the layout description
string, if there is none).  Thus, `"[o|w]"` denotes an octet (byte)
and a (32-bit) word, overlapping on the lowest-numbered eight bits of
the word.  Now we can see that each component element in a grouped
layout is placed at a current position (starting with the local
origin) and the current position is incremented by the size of the
component, to make ready for placing the next component.

    7b. The size of a grouped layout with two or more alternatives is
defined to be the maximum size spanned by the bit positions reached by
each alternative, except for any unsized alternatives (described
next).  (As we shall see, a grouped layout can grow from its origin in
either direction; both directions count, though the positive direction
is most common.)

    7c. An alternative may denote an empty layout, such as `"[]"`, but
it may not be the empty layout description, unless it is the last or
only alternative in its group.  Therefore, `"[|]"` and `"[||]"` are
illegal syntax.  However, the reset operator after an alternative may
be doubled, which marks that alternative as _unsized_.  Thus, the
sizes of `"[3b||2b]"` and `"[2b|3b||]"` are two bits, because the
three-bit alternatives, marked with a double bar, are unsized.  A
_lookahead layout_ of zero size can be expressed using a single
unsized alternative, such as `"[d||]"`.  (We may ignore the second
alternative, an empty layout.)  That layout has a size of zero, an
alignment of 64 bits, and contains a 64-bit payload which extends just
after the layout's local origin.

    ~~~~
    alternative = sized_alt | unsized_alt
    unsized_alt = ( element )+ "||"
    ~~~~

  8. A layout element may be concatenated in _reverse_, using a
reversal operator spelled with a minus sign, as in `"[w-o]"`.  The
effect of the minus sign is to decrement the current position within
the current (innermost) group, by the size of the next component, and
to position that component at that decremented position.  We can
see that normal concatenation positions elements using
post-incremented offsets, while reverse concatenation positions them
with pre-decremented offsets.  The current position can become
negative, moving to the left of the local origin.

    ~~~~
    prefixed_element = replication | alignment | reversal | ...
    reversal = "-" element
    ~~~~

    8a. In effect, a minus sign concatenates the following element so
that the element _ends_ at the current position, and the current
position is then updated to be the _beginning_ of the concatenated
element.  Thus, `"[-d||]"` denotes a _lookbehind layout_ of zero size,
which contains a 64-bit aligned payload which extends just before the
layout's local origin.

    8b. The size of a sequence of elements in an alternative, whether
concatenated in normal or reversed order, or a mix of both, is the
difference between the highest positive offset and the lowest negative
offset reached after any element, all relative to the local origin.
Thus, `"[w-o]"`, `"[o-w]"`, `"[-w]"`, and `"[w-w]"` are all 32 bits in
size, like `"[w]"` itself.  Meanwhile, `"[o|-w]"`, `"[-o|w]"`, and
`"[-o-w]"` are all 40 bits in size, like `"[ow]"` and `"[wo]"`.

    8c. The component being concatenated retains its own internal
numbering of bits, so that `"[d-w-h-h]"` consists of a 64-bit
doubleword overlaid with a 32-bit word and two 16-bit halfwords.  The
textually last halfword corresponds to the least significant bits in
the doubleword, while the 32-bit word corresponds to its most
significant half.  But the internal numbering of bits inside all four
components is consistent; there is no bit reversal inside the word or
either halfword.
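The sizing rules of 7b, 7c, and 8b can be checked with a short Python sketch.  It is limited to a single flat group whose elements are `b` or one of the abbreviations, each optionally preceded by a minus sign and a count; the `N-x` form and nested brackets are not handled.  The names `SIZES`, `TOKEN`, and `group_size` are hypothetical.

```python
import re

SIZES = {"b": 1, "o": 8, "h": 16, "w": 32, "d": 64, "q": 128}
# Either an element (optional minus, optional count, letter) or a reset bar.
TOKEN = re.compile(r"(-?)(\d*)([bohwdq])|(\|\|?)")


def group_size(desc: str) -> int:
    """Size in bits of one flat group, honouring '|', '||', and '-'."""
    body = "".join(desc.split()).strip("[]")
    lo = hi = 0                 # extent reached by sized alternatives
    pos = alt_lo = alt_hi = 0   # cursor and extent of current alternative
    for minus, count, letter, bar in TOKEN.findall(body):
        if bar:
            if bar == "|":      # a single bar closes a sized alternative
                lo, hi = min(lo, alt_lo), max(hi, alt_hi)
            # A double bar marks the alternative as unsized: discard it.
            pos = alt_lo = alt_hi = 0
            continue
        size = SIZES[letter] * (int(count) if count else 1)
        pos += -size if minus else size
        alt_lo, alt_hi = min(alt_lo, pos), max(alt_hi, pos)
    # The final alternative has no trailing bar and is always sized.
    lo, hi = min(lo, alt_lo), max(hi, alt_hi)
    return hi - lo
```

The sketch reproduces the sizes quoted above, for instance 32 bits for `"[o-w]"`, 40 bits for `"[o|-w]"`, and zero for the lookahead layout `"[d||]"`.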

  9. The reversal operator and replication operators can be _combined_
to denote replicated reverse concatenation.  The minus sign must come
between the replication prefix and the element it replicates.  Thus,
`"4-b"` is shorthand for `"[-b-b-b-b]"`, a field four bits in size, in
which the bits are individually placed from right to left.  The
expression `"%4-o"`, shorthand for `"%[-o-o-o-o]"`, denotes a 32-bit
aligned element whose textually first byte is placed in the third byte
of the element, and so on.  If we can reassemble these bytes into a
bit-vector, in their textual order, we will have loaded a big-endian
word from memory; we will do this shortly.

    ~~~~
    count_prefix = count ( "-" )? | ...
    ~~~~
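The claim that `"%4-o"` describes a big-endian word can be illustrated in Python.  The sketch computes where each textually successive byte lands, then reassembles the bytes in textual order with the first byte least significant; that reassembly order is an assumption borrowed from the container rule given later in this note.  The function name `reversed_offsets` is hypothetical.

```python
def reversed_offsets(count: int, size: int) -> list:
    """Bit offsets, relative to the start of the element, of `count`
    elements of `size` bits placed by reverse concatenation."""
    pos = 0
    offsets = []
    for _ in range(count):
        pos -= size          # pre-decrement, as for the minus operator
        offsets.append(pos)
    base = min(offsets, default=0)
    return [off - base for off in offsets]


mem = bytes([0x12, 0x34, 0x56, 0x78])   # lowest address first
offsets = reversed_offsets(4, 8)

value = 0
for i, off in enumerate(offsets):
    # Textual byte i comes from memory byte off // 8 and supplies
    # bits 8*i .. 8*i+7 of the reassembled bit vector.
    value |= mem[off // 8] << (8 * i)

assert value == int.from_bytes(mem, "big")
```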

    9a. (Design alternative: We could make the reversal operator
"sticky", so `"[d-whh]"` is shorthand for `"[d-w-h-h]"`, and `"%4-o"` is
shorthand for `"%[-oooo]"` as well as `"%[-o-o-o-o]"`.  The sticky
mode would last until the end of the current alternative, or until an
_unreversal operator_, to be spelled `"+"` or `"--"`, maybe.  However,
since reversal, though necessary, is tricky to reason about, it seems
better to make it as explicit as possible.)

  10. Layout descriptions may include _padding_, which is expressed as
if it were payload bits or bytes, but preceded by the prefix `"x"`.
Thus, padding has size and alignment just like other layout elements, but
is treated as insignificant to users.  Padding can be followed by
reverse concatenation in order to express right-justified data
embedded in a fixed word.  For example, `"[xw -b -2b -3b]"` denotes a
series of C-like bitfields that starts at the left end of a containing
word.  The one-bit field is the most significant bit (bit 31),
followed by the two-bit field, and then the three-bit field.

    ~~~~
    prefixed_element = replication | alignment | reversal |
            padding | ...
    padding = "x" element
    ~~~~
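The bitfield example `"[xw -b -2b -3b]"` can be worked through in Python.  The sketch places each field by pre-decrementing a cursor that starts just after the 32-bit padding word, then extracts the fields from an integer.  The names `bitfield_offsets` and `extract`, and the sample word, are hypothetical.

```python
def bitfield_offsets(container_bits: int, widths: list) -> list:
    """Return (offset, width) for fields reverse-concatenated after a
    padding element of `container_bits` bits."""
    pos = container_bits      # cursor sits just past the padding word
    fields = []
    for width in widths:
        pos -= width          # each '-' pre-decrements by the field size
        fields.append((pos, width))
    return fields


def extract(word: int, offset: int, width: int) -> int:
    """Extract `width` bits of `word` starting at bit `offset`."""
    return (word >> offset) & ((1 << width) - 1)


fields = bitfield_offsets(32, [1, 2, 3])
word = 0b1_10_101 << 26       # fields 1, 0b10, 0b101 packed from bit 31 down
values = [extract(word, off, width) for off, width in fields]
```

Here `fields` is `[(31, 1), (29, 2), (26, 3)]`, so the one-bit field is bit 31, the two-bit field covers bits 29 and 30, and the three-bit field covers bits 26 through 28.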

  11. Any layout element, and any replication count, may be followed
by any number of _annotations_.  Each annotation is a sequence of
characters between matching parentheses.  An annotation begins with an
open parenthesis and a name, and ends with a close parenthesis.  After
the name may occur an equals sign and an arbitrary string.  If the
equals sign is missing, the prefix `"n="` is supplied, so `"b(bitty)"`
and `"b(n=bitty)"` are the same denotation for an annotated bit.

    ~~~~
    element = atom | group | prefixed_element | annotated_element | ...
    annotated_element = element ( annotation )+
    annotation = name_only_annotation
            | "(" annotation_name "=" annotation_value ")"
    name_only_annotation = "(" annotation_name ")"
    annotation_name = ( ONE_OF[JAVA_IDENTIFIER_PART] )+
    annotation_value = ( NOT_ONE_OF["()"] | "(" annotation_value ")" )*
    count = numeral ( annotation )*
    ~~~~

    11a. The annotation name consists of one or more alphanumeric
Unicode characters.  The annotation string, if present, may contain
parentheses only if they are internally matched.
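A minimal Python sketch of annotation reading follows.  It scans a fragment of a layout description for top-level parenthesized spans, splits each at its first equals sign, and supplies the default name `n` when no equals sign is present.  The function name `annotations` is hypothetical, and the sketch does not validate annotation names.

```python
def annotations(text: str) -> list:
    """Return (name, value) pairs for each top-level annotation in `text`."""
    result = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1        # first character inside the annotation
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')'")
            if depth == 0:
                body = text[start:i]
                name, eq, value = body.partition("=")
                # Without an equals sign the prefix "n=" is assumed.
                result.append((name, value) if eq else ("n", body))
    if depth != 0:
        raise ValueError("unmatched '('")
    return result
```

For instance, `annotations("b(bitty)")` and `annotations("b(n=bitty)")` both give `[("n", "bitty")]`, and internally matched parentheses inside a value are kept intact.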

    11b. Within the parentheses of an annotation, whitespace is
significant, and the layout comment convention is suppressed.
Unmatched parentheses must be expressed indirectly using other
characters, if the user needs them.  (A suggested convention is to
rewrite the number sign followed by curly brackets or hyphen by round
brackets and the number sign itself.  So `"#{"` and `"#}"` and `"#-"`
become `"("` and `")"` and `"#"`, but no other occurrences of number
sign are changed.)

    11c. Syntactically, annotations associate more loosely than
replications and other prefix operators.  This means that `"2w(S)"`
denotes the same layout as `"[ww](S)"`, not `"[w(S)w(S)]"`.  To
get the latter layout, use a parenthesis: `"2[w(S)]"`.  A parenthesis
can also emphasize the former layout: `"[2w](S)"`.

  12. If an annotation name is a single ASCII character, or a decimal
numeral, its meaning is _reserved_ for possible definition by the
layout language.  Other names will never have significance directly
assigned by the basic layout language.

    ~~~~
    reserved_annotation = name_only_annotation
            | "(" reserved_annotation_name "=" annotation_value ")"
    reserved_annotation_name = ( ONE_OF["A-Za-z_$"] | numeral )
    ~~~~

    12a. The annotation name `"n"` is available for assigning a simple
_name_ to an element, for example `"[d(n=re) d(n=im)]"`.  (This
annotation name is very special, since the `"n="` is assumed if
an annotation lacks an equals sign.)

    12b. The annotation name `"t"` is available for assigning _types_.
The user is encouraged to prefix the type string by an indication of
the language that defines the type, for example `"d(t=C:void*)"`.

    12c. The annotation name `"k"` is available for assigning
machine-level _kinds_ to variables.  Standard kinds are expressed with
upper-case letters: S for signed, U for unsigned, F for float, P for
pointer (machine address), V for vector, A for array, M for memory.
(M is a catch-all for structures which are not expected to load into
registers.)  These kinds pertain to the intended use, by standard
computers, of the value stored in the layout.  Thus, an aligned
unsigned 32-bit integer can be expressed as `"w(k=U)"`.

  13. Annotations of _kind_ can be abbreviated by prefix letters.
Each of the letters `"SUFPVAM"` can be a prefix to an element, in
which case the element is treated as if it were annotated by the
appropriate kind annotation.

    13a. For example, a vector of 4 single-precision floats could be
expressed as `"V4Fw"`, or as the equivalent `"4[w(k=F)](k=V)"`.
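The equivalence in 13a can be sketched as a small Python rewriter that turns kind prefixes into explicit `k=` annotations.  It handles only kind letters, replication counts, and single-letter atoms; the function name `desugar_kinds` is hypothetical.  Brackets are added around a replicated body whenever it carries an annotation, because annotations associate more loosely than replication.

```python
KINDS = "SUFPVAM"
ATOMS = "bohwdq"
DIGITS = "0123456789"


def desugar_kinds(desc: str) -> str:
    """Rewrite kind prefix letters as explicit (k=...) annotations."""
    s = "".join(desc.split())

    def element(pos: int):
        """Parse one element at `pos`; return (rewritten text, next pos)."""
        if pos >= len(s):
            raise ValueError("unexpected end of layout")
        ch = s[pos]
        if ch in KINDS:
            inner, nxt = element(pos + 1)
            return f"{inner}(k={ch})", nxt
        if ch in DIGITS:
            end = pos
            while end < len(s) and s[end] in DIGITS:
                end += 1
            inner, nxt = element(end)
            if inner.endswith(")"):
                # Keep the annotation attached to each replicated copy.
                inner = f"[{inner}]"
            return s[pos:end] + inner, nxt
        if ch in ATOMS:
            return ch, pos + 1
        raise ValueError(f"unsupported character {ch!r}")

    out = []
    pos = 0
    while pos < len(s):
        text, pos = element(pos)
        out.append(text)
    return "".join(out)
```

With this sketch, `desugar_kinds("V4Fw")` produces `"4[w(k=F)](k=V)"`, the explicit form given above.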

    13b. The annotation name `"P"` is available for declaring the
expected layout at the other end of a pointer.  If an element is
annotated with `"P"` and has no explicit kind annotation, it may be
assumed to have the kind `"P"`.  For example, a 64-bit pointer to a
pair of double precision floats might be written as `"d(P=2Fd)"`.  If
the layout is a separately-declared type, a hole can be left open for
later, such as `"d(P=$(h=struct:complex))"` (see below about holes
and incomplete layouts).

  14. A layout description is _incomplete_ if it contains "holes" for
missing layout description syntax.  There are two sorts of holes,
layout elements and replication counts.  A layout element hole is
introduced as a dollar sign; the hole must occur in a place where a
layout element would be valid.  It must eventually be replaced by a
single layout element (such as a group).  A replication count hole is
introduced by a star; the hole must occur in a place where a numeric
replication count would be valid.

    ~~~~
    atom = bit | parenthesis | abbreviation | incomplete_element | ...
    incomplete_element = "$"
    count_prefix = count ( annotation )* ( "-" )? 
    count = numeral | incomplete_count
    incomplete_count = "*"
    ~~~~

    14a. Incomplete layout descriptions must be completed before they
are fully usable, although in some cases an incomplete layout may have
enough information to locate some of the layout elements.

    14b. An annotation with the standard name `"h"` can be used to
name the hole for later processing.  Alternatively, the hole has an
unambiguous path expression that addresses it, which can be used
(later) to associate the hole with an element, and complete the
expression.  A count hole can also be annotated with a path expression
to locate a nearby memory word which supplies the missing number.

    14c. For example, a vector of four items of indeterminate type,
temporarily called "vtype", can be expressed as `"4[$(h=vtype)](k=V)"`.
Or, an array of signed doublewords may have an undetermined length,
temporarily called "len", and this can be expressed as
`"*(h=len)[d(k=S)](k=A)"`.  If the length of the array immediately
precedes the array, a relative path expression can point at it:
`"[d(k=U)(len) *(h=**,len)[d(k=S)](k=A)]"`.  (Path expressions are
described later.)

    14d. As noted above, holes can express separately defined layouts.
For example, if a "Line" is a structure type which consists of two
three-component "Points", a point might be defined to have the layout
`"Sw(x) Sw(y) Sw(z)"`, while the layout of a line might be incomplete
like this:

    ~~~~
    [ 3w | [$(h=struct:Point)] || ] (start)
    [ 3w | [$(h=struct:Point)] || ] (end)
    ~~~~

    This layout expresses the sizing and alignment of the "start" and
"end" fields of a line, without duplicating the full definition, which
must be supplied later.  Note that the size of the `Line` structure (6
32-bit words) does not depend on the eventual size of the `Point`
struct, because the holes are placed in unsized alternatives.
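One possible way to complete such a layout is plain textual substitution of named holes, sketched here in Python.  This is only an illustration, not a procedure defined by this note: the names `HOLE` and `complete` are hypothetical, the sketch matches only element holes whose `h` annotation directly follows the dollar sign, and it leaves unknown holes in place.

```python
import re

# An element hole immediately followed by its (h=name) annotation.
HOLE = re.compile(r"\$\(h=([^()]*)\)")


def complete(desc: str, definitions: dict) -> str:
    """Replace each named hole whose name appears in `definitions`."""
    def substitute(match):
        name = match.group(1)
        # Unknown holes are left untouched, so the result may stay incomplete.
        return definitions.get(name, match.group(0))
    return HOLE.sub(substitute, desc)


POINT = "[Sw(x) Sw(y) Sw(z)]"
LINE = ("[ 3w | [$(h=struct:Point)] || ] (start) "
        "[ 3w | [$(h=struct:Point)] || ] (end)")
completed = complete(LINE, {"struct:Point": POINT})
```

The replacement is a single bracketed group, in keeping with the requirement that a hole be filled by a single layout element.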

  15. A layout element can be prefixed by "c" to create a _container_,
which means that the bits of the element may be extracted as a unit
into a bit-vector for further processing.  The bit vector, as
extracted, may be given its own bitwise layout (called the _container
layout_) which is independent of the memory element from which the bit
vector was extracted.  The bits may be extracted in a modified order,
and/or with gaps.

    ~~~~
    prefixed_element = replication | alignment | reversal
            | padding | container | ...
    container = "c" memory_element ( container_layout )?
    memory_element = element
    container_layout = "=" element
    ~~~~

    15a. The bit vector extracted into the container is determined by
the memory element (i.e., the one immediately after the `"c"`).  If
that element is a group, then the immediate sub-expressions of that
group, read from left to right and omitting padding, contribute the
bits, starting at bit position zero.  The group is not allowed to have
alternatives.  In the presence of reversal operators, the
left-to-right ordering of the group does _not_ necessarily correspond to
little-endian bit or byte order.  In the presence of padding elements,
not all of the bits inside the memory element will be extracted into
the container's bit vector.

    15b. There is a special constraint on container layouts: Every
non-padding subexpression of a container layout must correspond to at
least one bit loaded from the memory element.  If the container layout
is a group that begins with reversed concatenation, the local origin
of the group is taken to align with the end of the extracted bit
vector.  Otherwise, the local origin is taken to align with the start
(position zero) of the bit vector.

    15c. In effect, the bits in a container are subject to two
layouts: One in memory before extraction, and one in a register after.
If there is no explicit container layout, a bit replication is
supplied implicitly, with exactly the right size.  For path
expressions, if the container layout is present, the container
as a whole is taken to be a group of two elements:  The memory
element, and the container layout.

  16. The prefix operator `">"` _swaps bytes_.  It pervasively edits
the following element, transforming every replicated occurrence of
`"o"` to `"-o"` and vice versa.  It expands abbreviations like `"w"`
and `"h"` before doing so.  (This is why we defined `"h"` as `"%2o"`
and not the otherwise equivalent `"%16b"`.)

    16a. Thus, `">d"` refers to a big-endian 8-byte word in memory,
while `">[dd]"`, or the equivalent `"[>d>d]"`, refers to a pair of
such words, but without any reversal of their relative order.  The
same layout can be expressed using arrays, as `">2d"` or `"2>d"`.

    16b. The editing descends into any group sub-expressions of the
prefixed element.  The editing does not extend into any sub-expression
preceded by the anti-swapping prefix operator `"<"`.  (Besides
offering this protection against editing, the anti-swapping operator
has no semantic effect on the element it prefixes.)  Also, if the
prefixed element has any nested container layouts, those layouts are
not edited.
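Python's `struct` module uses a similar `">"` marker for big-endian data, which gives a convenient analogy for `">[dd]"`.  The sketch below is only an analogy, not an implementation of this layout language; it assumes that `Q` (an unsigned 64-bit integer in `struct` format strings) stands in for `d`.

```python
import struct

mem = bytes(range(1, 17))   # sixteen bytes, 0x01 through 0x10

# Two big-endian 64-bit words: bytes are swapped within each word,
# but the two words keep their relative order.
first, second = struct.unpack(">2Q", mem)

# The same memory read without byte swapping, for comparison.
first_le, second_le = struct.unpack("<2Q", mem)

assert first == int.from_bytes(mem[:8], "big")
assert second == int.from_bytes(mem[8:], "big")
```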

  17. The layout description syntax supports canonical path
expressions for referring to elements of a layout.  A path expression
is a sequence (possibly empty) of tokens, each of which is either an
integer, a name string, or one of the special tokens `"*"` ("length"),
`"**"` ("up"), or `"***"` ("top").  The path expression describes the
movement of a logical cursor from an initial position to a final
position, somewhere within a layout.

    17a. The starting position is determined by the context in which
the path is used.  It may be just before the first element of the
whole layout description string; this is called the "top" position,
and is always assumed if the path begins with `"***"`.  The starting
position may be immediately before some element; this may happen if
that element contains an annotation with a self-relative path.

    ~~~~
    path = path_start ( path_sep path_token )* ( path_end )?
    path_token = numeral | annotation_name | "**"
    path_sep = ONE_OF["/.,"]
    path_start =  path_token | "***"
    path_end = path_sep "*"
    ~~~~

    17b. If a token is an integer _I_, the cursor moves to the _I_th
element in the (non-nested) layout expression sequence, starting with
the current element.  (If _I_ is zero, the cursor does not move.)  If
the integer is a negative value _-I_, the cursor moves back to the _I_th
preceding layout element.  Alignment prefixes are not separately
counted and padding elements are skipped.  For arrays, this follows
the convention of zero-based indexing.  If there is a following token,
then the current element (immediately after the updated cursor)
must be a group, and the cursor moves to the beginning of the group.

    17c. If a token is a string, the cursor selects a named element in
the current (non-nested) layout expression sequence.  The element
selected is the first (after the current position) which is annotated
as having a simple name identical with the token.  (Users are
discouraged from giving names to layout elements which are
indistinguishable from numerals.)  A string is always equivalent to a
suitably chosen integer.  As with an integer, if there is a following
token, the cursor moves into the current element, which must be a group.

    17d. If a token is the "up" token `"**"`, the cursor moves back
out of the current group.  (Note that the sequence `"0,**"` brings the
cursor back to where it started.)  If a token is the "length" token
`"*"`, it must be the last token in the path, and it selects not an
element, but the _length_ of the current element.

    17e. Implicitly introduced brackets (from replications) are
counted when deciding the nesting level of elements.  But parentheses
are not counted when deciding nesting level.

  18. Layouts can be used with other ad hoc syntax to express the
definition of structure types or function signatures.  Since a layout
must always have correctly matched parentheses (round brackets), a
layout can be placed between parentheses without ambiguity (as long as
suitable care is taken with comment conventions).  This was already
seen above in the pointer annotation, which could contain a nested
layout between the parentheses.

    18a. Pseudocode for associating a layout with a named type can be
adapted from the annotation syntax, like this:

    ~~~~
    struct:Point = [ Sw(x) Sw(y) Sw(z) ]
    struct:Line = [
      # nested structs are described with holes
      [ 3w | [$(h=struct:Point)] || ] (start)
      [ 3w | [$(h=struct:Point)] || ] (end)
    ]
    ~~~~

    18b. Layouts can be loosely adjoined with commas, since commas are
not part of the layout syntax.  For example, a function signature can
be described as a parenthesized, comma-separated list of bit vector
types.  Such a signature can be either inserted in an annotation
value, or declared in pseudocode, like this:

    ~~~~
    function:fstat = (
        Sd (fildes),
        d  (buf)  (P=$(h=struct:stat))
      ) Sd
    struct:stat = [
      Uw (st_dev) (t=C:dev_t)
      Uw (st_ino) (t=C:ino_t)
      Uw (st_mode) (t=C:mode_t)
      ...
    ]
    ~~~~


