[foreign-memaccess] on value kinds

Tue Jul 16 17:44:23 UTC 2019

Hi,

There seem to be 2 problems at play:

1.) Being able to express more value kinds in the Layout API.
2.) Having a way to tell SystemABI, for a given native function, how the 
machinery should handle the Java arguments it gets passed.

We have been deriving the information for 2. from layouts, at least in 
the case of our limited ABI support (C only), but we've already seen 
this fall short in the past in the case of varargs, and now again. I'm 
asking myself: how many times do we have to come back to the Layout API 
to add support for expressing the semantics of language/ABI XYZ? I'd 
like to be able to solve 2. separately from 1.

So far we seem to be on the same page, but to solve 2., I don't think 
using the identity of some Layout constant is powerful/robust enough. 
I'd really like to see something based on a public version of 
ArgumentBinding, where for a certain Java argument we have a way of 
telling the VM how this should be shuffled into a target 
register/stack/other. That would allow users to implement their own ABI 
support, and we can implement the current C ABI support on top of that.
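
To make that a bit more concrete, here is a rough sketch of what a 
user-visible binding description could look like. All names here 
(StorageKind, Binding, BindingDemo) are invented for illustration; this 
is not the existing ArgumentBinding class, and the register assignment 
shown is just an example, not any particular ABI.

```java
import java.util.List;

// Hypothetical sketch only: these types are invented for illustration and
// do not mirror the internal ArgumentBinding class.
enum StorageKind { INTEGER_REGISTER, VECTOR_REGISTER, X87_STACK, STACK_SLOT }

final class Binding {
    final int argumentIndex;  // position of the Java argument being moved
    final StorageKind kind;   // class of storage the value is shuffled into
    final int storageIndex;   // register number or stack slot within that class

    Binding(int argumentIndex, StorageKind kind, int storageIndex) {
        this.argumentIndex = argumentIndex;
        this.kind = kind;
        this.storageIndex = storageIndex;
    }

    @Override
    public String toString() {
        return "arg" + argumentIndex + " -> " + kind + "[" + storageIndex + "]";
    }
}

final class BindingDemo {
    // Bindings a user-written ABI might produce for a native function taking
    // (long, double, long): the two integers use integer registers 0 and 1,
    // the double uses vector register 0.
    static List<Binding> bindingsForLongDoubleLong() {
        return List.of(
                new Binding(0, StorageKind.INTEGER_REGISTER, 0),
                new Binding(1, StorageKind.VECTOR_REGISTER, 0),
                new Binding(2, StorageKind.INTEGER_REGISTER, 1));
    }
}
```

The point of the sketch is that nothing in it refers to a Layout: the 
shuffling decision is stated directly, per argument.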

But, in the short term, we need a way of expressing a C function 
signature so that the current SystemABI knows how to call it. Relying on 
Layouts only seems to get us so far, so I think we need a separate 
abstraction for expressing C functions (though probably partially Layout 
based), which captures all the semantics needed for C specifically. We 
could then choose to encode the fact that we're dealing with an x87 
float or a quad precision float either with a separate flag in this new 
abstraction, or in the argument Layouts, e.g. by relying on 
annotations, or in some other way.
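
As a sketch of the "separate flag" variant: the names below 
(FloatFormat, CArgument, CFunctionDescriptor) are hypothetical and do 
not exist in the current API. The only thing taken from a layout here 
is the size; the C-specific meaning lives in the descriptor.

```java
import java.util.List;

// Hypothetical sketch only: none of these names exist in the Panama API.
enum FloatFormat { NOT_FLOAT, FLOAT_OR_DOUBLE, X87_EXTENDED, QUAD }

final class CArgument {
    final int bitSize;         // size taken from the argument's layout
    final FloatFormat format;  // C-specific semantics kept outside the layout

    CArgument(int bitSize, FloatFormat format) {
        this.bitSize = bitSize;
        this.format = format;
    }
}

final class CFunctionDescriptor {
    final List<CArgument> arguments;
    final boolean variadic;    // varargs is one case where layouts alone fell short

    CFunctionDescriptor(boolean variadic, CArgument... arguments) {
        this.variadic = variadic;
        this.arguments = List.of(arguments);
    }

    // The ABI implementation asks the descriptor, not the layout.
    boolean hasArgumentOfFormat(FloatFormat format) {
        return arguments.stream().anyMatch(a -> a.format == format);
    }
}
```

With something like this, two 128-bit arguments can share the same 
layout and still be told apart by the code that performs the call.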

Coming back to 1.: it seems fair to have the ability to express more 
value kinds in the Layout API. Like you say, why should we have 3 
'blessed' kinds? We could solve this problem by replacing the 
ValueLayout::Kind enum with a simple (multi-character) string, as a sort 
of lightweight alternative to using annotations. Then we could, for 
example, express an x87 double as 'd128', a quad float as 'q128', or 
something else as 'xyz128'. How the kind string is interpreted is up to 
the code using the Layout.
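
A minimal sketch of the kind-string idea, assuming a stand-in class 
(KindLayout) rather than the real ValueLayout. KindInterpreter and 
Interpretation are likewise made up; they only show one consumer 
choosing how to read the string.

```java
// Hypothetical sketch only: KindLayout is not the real ValueLayout class.
final class KindLayout {
    final String kind;   // free-form kind string, e.g. "d", "q" or "xyz"
    final long bitSize;

    KindLayout(String kind, long bitSize) {
        this.kind = kind;
        this.bitSize = bitSize;
    }

    // Renders the layout in the 'd128' style: kind string followed by size.
    @Override
    public String toString() {
        return kind + bitSize;
    }
}

enum Interpretation { X87_EXTENDED, QUAD, UNKNOWN }

final class KindInterpreter {
    // One possible consumer; other code is free to read the kind differently.
    static Interpretation interpret(KindLayout layout) {
        if (layout.bitSize != 128) {
            return Interpretation.UNKNOWN;
        }
        switch (layout.kind) {
            case "d": return Interpretation.X87_EXTENDED;
            case "q": return Interpretation.QUAD;
            default:  return Interpretation.UNKNOWN;
        }
    }
}
```

A kind the consumer does not recognise (such as 'xyz') simply falls 
through to UNKNOWN, so the Layout API itself never has to change.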

What do you think?

Jorn

On 2019-07-15 23:18, Maurizio Cimadamore wrote:
> Hi,
> as I was (re)starting work on the second step of the Panama
> pipeline (foreign function access), it occurred to me that one piece
> of the design for ValueLayout is not 100% fleshed out. I'm referring
> to the different 'kinds' of value layouts available in the API:
> 
> * signed int
> * unsigned int
> * floating point
> 
> We made this distinction long ago - the intention was to capture
> important distinctions between different layouts in an explicit
> fashion. For instance, system ABIs typically pass integer values via
> general registers, while they pass floating point values via floating
> point or vector registers. So it seemed an important distinction to
> capture.
> 
> When I later started to work on support for x87 types, I realized that
> it wasn't all that simple - a "long double" in SysV ABI is typically
> encoded as a 128 bit floating point using the x87 extended precision
> format [1], but so is a "binary128" which instead uses the quad
> precision format [2]. In other words, the kind/size pair does not
> unambiguously denote a specific type semantics. Moreover, x87 types
> only really make sense when it comes to the SysV ABI, and the
> implementation of that ABI will have to ask whether a certain layout
> is that of an x87 floating point value - which brings up the question
> of how these special, platform-dependent layouts are denoted in the
> first place.
> 
> Since Panama layouts support annotations, we always had the annotation
> route available to us to distinguish between these different types -
> that is:
> 
> f128[abi=x87]
> 
> could denote an extended precision x87 value, whereas:
> 
> f128[abi=quadfloat]
> 
> could denote a 128-bit floating point value using the 'quad' float format
> (binary128).
> 
> This is of course still a viable option - yes, the memory access API
> no longer has general-purpose annotations, but it's easy enough to
> add them back in as part of the System ABI support, and then retrofit
> layout 'names' as a special kind of annotation - that's a move we have
> pulled in the past and we know it works.
> 
> But looking at this problem with fresh eyes, I'm noting an asymmetry,
> one that John pointed out in the past: the set of kinds supported by
> ValueLayout seems somewhat arbitrary, fixed and non-extensible in ways
> other than using annotations. What is the advantage of being able to
> tell an 'int' from a 'float' if we can't tell a 'x87' double from a
> 'quad float' ? Why is the former distinction supported _natively_ by
> the layout descriptions, whereas the latter is only supported
> indirectly, via annotations?
> 
> Of course, we know that former proposals, such as LDL [3], precisely
> for this reason, decided not to embed any semantics in their 'kinds'.
> That is, LDL really has only bits and groups of bits - all the
> semantics is specified via annotations. This is a more symmetric
> approach - there are no 'blessed' kinds, everything happens through
> annotations. This is certainly a fine decision when designing a layout
> language with a given fixed grammar.
> 
> But, using annotations inside layouts is also a very indirect
> approach. Can we do better? After all, it seems that, if we leverage
> the fact that layouts are API elements, or objects, we can formulate
> an alternate solution where:
> 
> * value layouts _cannot be created_; you have to use one of the
> pre-baked constants - we have already discussed introducing layout
> constants in [4] anyway
> * among the constants, users will find some that are ABI-specific
> (e.g. there will be one constant for x87 values, one for quadfloat,
> and so forth).
> * testing 'is this layout an x87 layout?' reduces to an equality test
> (e.g. "layout == SYSV_X87")
> 
> Something like this would allow us to have layouts which are
> _internally_ general enough to express system ABI specific types - but
> at the level of the public API, a layout for a 128-bit x87 value would
> be the same as the one for a quad float - there would be no way for
> the user to tell them apart, other than noting that the two layouts
> correspond to different pre-baked constants. And this is, perhaps, a
> good outcome - after all, the distinction between these two layouts is
> a _semantic_ distinction, not (strictly speaking) a layout one (in
> fact the layout is, in terms of size and alignment, indeed the same in
> both cases). Therefore, it is very likely that this semantic
> distinction will only be of interest to very critical components of the
> Panama runtime - and that most of the clients will not care much about
> the distinction (other than maybe occasionally testing for "is this an
> x87 layout").
> 
> Thoughts?
> 
> Maurizio
> 
> [1] -
> https://en.wikipedia.org/wiki/Extended_precision#IEEE_754_extended_precision_formats
> [2] - 
> https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format
> [3] - http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html
> [4] - 
> https://mail.openjdk.java.net/pipermail/panama-dev/2019-July/005908.html