[foreign-memaccess] on value kinds

Mon Jul 15 21:18:13 UTC 2019

Hi,
as I was (re)starting work on the second step of the Panama 
pipeline (foreign function access), it occurred to me that one piece of 
the design for ValueLayout is not 100% fleshed out. I'm referring to the 
different 'kinds' of value layouts available in the API:

* signed int
* unsigned int
* floating point

We made this distinction long ago - the intention was to capture 
important differences between layouts in an explicit fashion. 
For instance, system ABIs typically pass integer values in general 
purpose registers, while they pass floating point values in floating 
point or vector registers. So it seemed an important distinction to 
capture.

When I later started to work on support for x87 types, I realized that 
it wasn't all that simple - a "long double" in the SysV ABI is typically 
encoded as a 128-bit floating point value using the x87 extended 
precision format [1], but a "binary128" is also a 128-bit floating point 
value, one that uses the quad precision format [2] instead. In other 
words, the kind/size pair does not unambiguously denote a specific type 
semantics. Moreover, x87 types only really make sense when it comes to 
the SysV ABI, and the implementation of that ABI will have to ask 
whether a certain layout is that of an x87 floating point value - which 
raises the question of how these special, platform-dependent layouts 
are denoted in the first place.
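
To make the ambiguity concrete, here is a minimal Java sketch. Note that 
KindSizeLayout and its Kind enum are hypothetical stand-ins for a layout 
described only by kind and size; they are not the actual ValueLayout API.

```java
import java.util.Objects;

// Hypothetical model of a value layout described only by kind and size.
// This is an illustration, not the Panama ValueLayout API.
final class KindSizeLayout {
    enum Kind { SIGNED_INT, UNSIGNED_INT, FLOATING_POINT }

    final Kind kind;
    final long bitSize;

    KindSizeLayout(Kind kind, long bitSize) {
        this.kind = kind;
        this.bitSize = bitSize;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KindSizeLayout)) return false;
        KindSizeLayout other = (KindSizeLayout) o;
        return kind == other.kind && bitSize == other.bitSize;
    }

    @Override
    public int hashCode() {
        return Objects.hash(kind, bitSize);
    }
}

class KindSizeDemo {
    // Meant to describe an x87 extended precision "long double".
    static final KindSizeLayout X87_LONG_DOUBLE =
            new KindSizeLayout(KindSizeLayout.Kind.FLOATING_POINT, 128);

    // Meant to describe a quad precision (binary128) value.
    static final KindSizeLayout QUAD_FLOAT =
            new KindSizeLayout(KindSizeLayout.Kind.FLOATING_POINT, 128);

    public static void main(String[] args) {
        // Same kind, same size: the two descriptions cannot be told apart.
        System.out.println(X87_LONG_DOUBLE.equals(QUAD_FLOAT));
    }
}
```

An ABI implementation handed either of these objects has nothing to go 
on when deciding whether it is looking at an x87 value.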

Since Panama layouts support annotations, we always had the annotation 
route available to us to distinguish between these different types - 
that is:

f128[abi=x87]

could denote an extended precision x87 value, whereas:

f128[abi=quadfloat]

could denote a 128-bit floating point value using the 'quad' float 
format (binary128).
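
As a rough illustration of the annotation route, the sketch below models 
a layout that carries name/value annotations. AnnotatedLayout, ofFloat, 
withAnnotation and annotation are all hypothetical names made up for 
this example; only the "abi=x87" and "abi=quadfloat" annotations come 
from the text above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a value layout carrying general purpose annotations.
final class AnnotatedLayout {
    private final char kind;      // 'f' stands for floating point in this sketch
    private final long bitSize;
    private final Map<String, String> annotations;

    private AnnotatedLayout(char kind, long bitSize, Map<String, String> annotations) {
        this.kind = kind;
        this.bitSize = bitSize;
        this.annotations = Collections.unmodifiableMap(annotations);
    }

    static AnnotatedLayout ofFloat(long bitSize) {
        return new AnnotatedLayout('f', bitSize, new LinkedHashMap<>());
    }

    // Returns a copy with one more annotation, so layouts stay immutable.
    AnnotatedLayout withAnnotation(String name, String value) {
        Map<String, String> copy = new LinkedHashMap<>(annotations);
        copy.put(name, value);
        return new AnnotatedLayout(kind, bitSize, copy);
    }

    Optional<String> annotation(String name) {
        return Optional.ofNullable(annotations.get(name));
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder().append(kind).append(bitSize);
        if (!annotations.isEmpty()) {
            sb.append('[');
            String sep = "";
            for (Map.Entry<String, String> e : annotations.entrySet()) {
                sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
                sep = ",";
            }
            sb.append(']');
        }
        return sb.toString();
    }
}

class AnnotatedLayoutDemo {
    // The check an ABI implementation would need: a string lookup and compare.
    static boolean isX87(AnnotatedLayout layout) {
        return layout.annotation("abi").filter("x87"::equals).isPresent();
    }

    public static void main(String[] args) {
        AnnotatedLayout x87 = AnnotatedLayout.ofFloat(128).withAnnotation("abi", "x87");
        AnnotatedLayout quad = AnnotatedLayout.ofFloat(128).withAnnotation("abi", "quadfloat");
        System.out.println(x87 + " isX87=" + isX87(x87));
        System.out.println(quad + " isX87=" + isX87(quad));
    }
}
```

The distinction is recoverable here, but only through a string-keyed 
lookup that every interested party has to agree on.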

This is of course still a viable option - yes, the memory access API no 
longer has general purpose annotations, but it's easy enough to add 
them back in as part of the System ABI support, and then retrofit layout 
'names' as a special kind of annotation - that's a move we have pulled 
in the past and we know it works.

But looking at this problem with fresh eyes, I'm noting an asymmetry, 
one that John pointed out in the past: the set of kinds supported by 
ValueLayout seems somewhat arbitrary, fixed and non-extensible in ways 
other than using annotations. What is the advantage of being able to 
tell an 'int' from a 'float' if we can't tell an 'x87' double from a 
'quad float'? Why is the former distinction supported _natively_ by the 
layout descriptions, whereas the latter is only supported indirectly, 
via annotations?

Of course, we know that earlier proposals, such as LDL [3], decided not 
to embed any semantics in their 'kinds' precisely for this reason. That 
is, LDL really has only bits and groups of bits - all the semantics are 
specified via annotations. This is a more symmetric approach - there are 
no 'blessed' kinds, everything happens through annotations. This is 
certainly a fine decision when designing a layout language with a given 
fixed grammar.

But using annotations inside layouts is also a very indirect approach. 
Can we do better? After all, it seems that, if we leverage the fact that 
layouts are API elements, or objects, we can formulate an alternate 
solution where:

* value layouts _cannot be created_; you have to use one of the 
pre-baked constants - we have already discussed introducing layout 
constants in [4] anyway
* among the constants, users will find some that are ABI-specific (e.g. 
there will be one constant for x87 values, one for quadfloat, and so forth)
* testing 'is this layout an x87 layout?' reduces to an equality test 
(e.g. "layout == SYSV_X87")
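
The three points above can be sketched in Java as follows. This is only 
a sketch under assumptions: SketchValueLayout is a hypothetical class, 
and SYSV_QUADFLOAT and SYSV_DOUBLE are made-up constant names (only 
SYSV_X87 appears in the example above).

```java
// Hypothetical sketch: value layouts that clients cannot construct themselves.
final class SketchValueLayout {
    // The only instances that will ever exist.
    // Names other than SYSV_X87 are invented for this example.
    static final SketchValueLayout SYSV_X87 = new SketchValueLayout(128);
    static final SketchValueLayout SYSV_QUADFLOAT = new SketchValueLayout(128);
    static final SketchValueLayout SYSV_DOUBLE = new SketchValueLayout(64);

    private final long bitSize;

    // Private constructor: no way to create layouts outside this class.
    private SketchValueLayout(long bitSize) {
        this.bitSize = bitSize;
    }

    long bitSize() {
        return bitSize;
    }

    // equals/hashCode are deliberately not overridden, so identity is the
    // only thing that separates two layouts with the same size.
}

class ConstantLayoutDemo {
    // 'Is this layout an x87 layout?' is a plain reference comparison.
    static boolean isX87(SketchValueLayout layout) {
        return layout == SketchValueLayout.SYSV_X87;
    }

    public static void main(String[] args) {
        SketchValueLayout x87 = SketchValueLayout.SYSV_X87;
        SketchValueLayout quad = SketchValueLayout.SYSV_QUADFLOAT;
        // Same size through the public accessor...
        System.out.println(x87.bitSize() == quad.bitSize());
        // ...yet the ABI code can still tell them apart.
        System.out.println(isX87(x87) + " " + isX87(quad));
    }
}
```

Nothing in the public surface of such a class mentions x87 or quad 
precision; the distinction lives entirely in which constant is used.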

Something like this would allow us to have layouts which are 
_internally_ general enough to express system ABI specific types - but 
at the level of the public API, a layout for a 128-bit x87 value would 
be the same as the one for a quad float - there would be no way for the 
user to tell them apart, other than noting that the two layouts 
correspond to different pre-baked constants. And this is, perhaps, a 
good outcome - after all, the distinction between these two layouts is a 
_semantic_ distinction, not (strictly speaking) a layout one (in fact 
the layout is, in terms of size and alignment, indeed the same in both 
cases). Therefore, it is very likely that this semantic distinction will 
only be of interest to very critical components of the Panama runtime - 
and that most of the clients will not care much about the distinction 
(other than maybe occasionally testing for "is this an x87 layout").

Thoughts?

Maurizio

[1] - https://en.wikipedia.org/wiki/Extended_precision#IEEE_754_extended_precision_formats
[2] - https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format
[3] - http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html
[4] - https://mail.openjdk.java.net/pipermail/panama-dev/2019-July/005908.html