[foreign-memaccess] on value kinds

Fri Jul 19 15:13:43 UTC 2019

Yep - or users could create their own bag of constants class. The 
important thing is that we provide the right building blocks. Relying on 
constants neatly solves the usability problem, while keeping the 
building blocks simple-yet-powerful.

I think we could see AddressLayout more as a convenient wrapper around 
an annotated ValueLayout, e.g.:
v64(kind=address)(pointeekind=void|value|function)
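
As a rough sketch of what I mean (ValueLayout, PointeeKind and addressOf
below are made-up stand-ins, not the actual API; the only real type is
java.lang.constant.Constable), an address layout could be little more
than a factory that attaches two annotations to a 64-bit value layout:

```java
import java.lang.constant.Constable;
import java.util.HashMap;
import java.util.Map;

final class AddressLayoutSketch {
    // Hypothetical pointee kinds, mirroring pointeekind=void|value|function.
    enum PointeeKind { VOID, VALUE, FUNCTION }

    // Hypothetical kind-less value layout: a size plus free-form annotations.
    static final class ValueLayout {
        final long bitSize;
        final Map<String, Constable> annotations;

        ValueLayout(long bitSize, Map<String, Constable> annotations) {
            this.bitSize = bitSize;
            this.annotations = Map.copyOf(annotations); // immutable snapshot
        }

        // Copy-on-write: returns a new layout, the receiver is left untouched.
        ValueLayout withAnnotation(String name, Constable value) {
            Map<String, Constable> copy = new HashMap<>(annotations);
            copy.put(name, value);
            return new ValueLayout(bitSize, copy);
        }
    }

    // The "convenient wrapper": a 64-bit value carrying two annotations.
    static ValueLayout addressOf(PointeeKind pointee) {
        return new ValueLayout(64, Map.of())
                .withAnnotation("kind", "address")
                .withAnnotation("pointeekind", pointee);
    }

    public static void main(String[] args) {
        ValueLayout fnPtr = addressOf(PointeeKind.FUNCTION);
        System.out.println(fnPtr.bitSize + " " + fnPtr.annotations.get("kind")
                + " " + fnPtr.annotations.get("pointeekind"));
    }
}
```

Both String and enum constants implement Constable, so they fit the
annotation map without any extra machinery.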

Using a Map<String, Constable> also seems like a cool idea. The only 
catch is that we might not be able to take full advantage of it in the 
descriptor language we have with jextract, i.e. anything we want to use 
as an annotation value there has to somehow be reducible to a string.
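
To illustrate the restriction, here is a minimal sketch, assuming a
bracketed name=value descriptor syntax like the f128[abi=x87] examples
from earlier in the thread. The render and describe helpers and the Abi
enum are hypothetical, not what jextract actually does. Strings and
enums reduce to text trivially; any other Constable is rejected:

```java
import java.lang.constant.Constable;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class DescriptorSketch {
    // Hypothetical ABI-specific enum used as an annotation value.
    enum Abi { X87, QUADFLOAT }

    // Reduces one annotation value to text; only strings and enums are handled.
    static String render(Constable value) {
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof Enum<?>) {
            return ((Enum<?>) value).name().toLowerCase(Locale.ROOT);
        }
        throw new IllegalArgumentException("not reducible to a string: " + value);
    }

    // Appends each annotation as [name=value] to the base descriptor.
    static String describe(String base, Map<String, Constable> annotations) {
        StringBuilder sb = new StringBuilder(base);
        // TreeMap gives a stable, sorted annotation order in the output.
        for (Map.Entry<String, Constable> e : new TreeMap<>(annotations).entrySet()) {
            sb.append('[').append(e.getKey()).append('=')
              .append(render(e.getValue())).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("f128", Map.of("abi", Abi.X87)));
    }
}
```

A layout used as an annotation value would need its own recursive
rendering, which is where the descriptor language starts to feel the
strain.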

Jorn

On 2019-07-19 16:30, Maurizio Cimadamore wrote:
> One point I forgot to make. One might think that creating e.g. a
> layout for a float using annotations might be cumbersome. But, in
> reality, we are considering adding layout constants for all ABI types,
> so this will rarely be an issue in practice.
> 
> Maurizio
> 
> On 19/07/2019 15:28, Maurizio Cimadamore wrote:
>> 
>> On 19/07/2019 14:24, Jorn Vernee wrote:
>>>> Comments?
>>> 
>>> It seems that the need for embedding structural carrier info is 
>>> specific to implementing ABI behaviour. So, part of the choice seems 
>>> to be whether we want to use an ABI agnostic API to encode the ABI 
>>> specific stuff into (like layout annotations), or if we want to 
>>> create ABI specific abstractions (e.g. have a CStruct type).
>>> 
>>> Maybe we ultimately need a mix of both (note that we already have a 
>>> pretty C specific AddressLayout added), but for now I think layout 
>>> annotations are the way to go, also since inventing something like a 
>>> CStruct would almost make GroupLayouts redundant it seems.
>> As for AddressLayout, I've been split about it; I don't view it as C 
>> specific - I view it mostly as a way to attach info about the contents 
>> of the addressed memory. Again, this can easily be achieved with an 
>> annotation.
>>> 
>>> Looking at memaccess, and the fact that we never actually use the 
>>> 'kind' of a value, I think morally (all) the value kinds should be 
>>> moved to annotations.
>> 
>> I came up with an idea that looks compelling: we could, fairly easily, 
>> generalize layouts to support annotations that are described by a 
>> copy-on-write map of the kind:
>> 
>> Map<String, Constable>
>> 
>> That is, a layout annotation has a "name" and a "value" but, 
>> interestingly, the value can be ANY Java object, provided such object 
>> can be expressed as a constant. Interesting examples of constants:
>> 
>> * Strings (which allow us to add layout names, which is required by 
>> the memory access API)
>> * Enums (which allow us to add kinds to layouts - e.g. each ABI can 
>> have its own Kind enum and layout constants for that ABI can just 
>> create annotated layouts from there)
>> * Layouts themselves! (which would allow us to encode addressee info, 
>> but also to encode optional substructure for things like bitfields!)
>> 
>> Pulling more on this string, I believe that we can make 
>> equals/hashCode of a Layout safely ignore annotations. I.e. a layout 
>> is all about size/endianness/alignment.
>> 
>> This seems to me a much more stable position in the design space.
>> 
>> Maurizio
>> 
>>> Jorn
>>> 
>>> On 2019-07-19 13:45, Maurizio Cimadamore wrote:
>>>> Picking up this again.
>>>> 
>>>> I've been doing more thinking on this topic, and it seems to me that
>>>> attaching a notion of 'kind' to a layout is suboptimal for a number
>>>> of reasons:
>>>> 
>>>> * layouts should be about sizes, alignments and endianness - not 
>>>> semantics
>>>> 
>>>> * by attaching kinds to layouts we end up replicating some of the
>>>> information that is already available in carrier types
>>>> 
>>>> * a kind-based system feels 'arbitrary' and it is difficult to 
>>>> extend
>>>> - unless we resort to plain strings/annotations
>>>> 
>>>> * some kinds are just plain useless - e.g. no difference between
>>>> signed vs. unsigned
>>>> 
>>>> Now, I'd very much like to propose to just get rid of ValueLayout
>>>> kinds. After all, when dereferencing memory the _carrier_ will drive
>>>> the process and make sure that the right semantics is applied; so,
>>>> carriers are for _semantics_, layouts are, well, for layouts! Another
>>>> advantage is that, if we do this, padding layouts just disappear -
>>>> a padding is just some unnamed value layout.
>>>> 
>>>> But, if you pull on this string, while things are still perfectly 
>>>> fine
>>>> for the memory access API (we always have a carrier when we need to
>>>> dereference), we run into an issue with ABI classification. Let's
>>>> simplify things a bit, and assume there are three main categories of
>>>> values that a foreign function has to deal with:
>>>> 
>>>> 1) scalars (e.g. int, float, long double)
>>>> 2) pointers (function pointers, object pointers)
>>>> 3) composites (structs/unions)
>>>> 
>>>> Now, eliminating kinds from ValueLayout will have zero consequences 
>>>> on
>>>> (1) and (2). After all, the layout + carrier info is always enough 
>>>> to
>>>> do a basic classification - e.g. (using the SysV terminology)
>>>> 
>>>> byte.class, char.class, short.class, int.class, long.class ->
>>>> INTEGER or MEMORY (if no register available)
>>>> float.class, double.class -> SSE or MEMORY (if no register 
>>>> available)
>>>> MemoryAddress -> POINTER or MEMORY (if no register available)
>>>> 
>>>> (we can also have extra carriers for exotic types such as x87).
>>>> 
>>>> This is all good and well - maybe we'll have to tweak the
>>>> classification routines a little to work not just on layouts, but
>>>> layout + carrier, but it's all doable.
>>>> 
>>>> But what about (3) ? The classification routines for structs/unions
>>>> are mind-bogglingly complex (at least in SysV), and you need _full_
>>>> knowledge of the ins and outs of a struct in order to classify it
>>>> ABI-wise. That is, you have to know whether the struct fields belong
>>>> in (1), (2) or (3), recursively.
>>>> 
>>>> And here's the issue - if we use MemorySegment as a carrier for 
>>>> _all_
>>>> structs/unions, that carrier is just not powerful enough to allow us
>>>> to do that kind of recursive classification! Our answer to this has
>>>> been: don't use carriers for recursive stuff - just use layouts, 
>>>> which
>>>> is why we have the FP vs. INT distinction in the ValueLayout. So, in 
>>>> a
>>>> way, while there are many arguments for pushing kind info outside
>>>> layouts, there are also strong arguments in favor of keeping it in.
>>>> 
>>>> If we were to completely drop kinds from layouts, I see only two 
>>>> options:
>>>> 
>>>> 1) Put the client in charge of recursive classification - that is,
>>>> essentially, eliminate (3) from the picture - and lower all struct
>>>> arguments to a sequence of (1) and (2).
>>>> 
>>>> 2) Invent a new carrier type that is powerful enough to embed
>>>> structural carrier info:
>>>> 
>>>> e.g.
>>>> 
>>>> static Struct of(MemorySegment segment, Class<?>... fieldCarriers)
>>>> 
>>>> Now, as much as I see the appealing simplicity of (1), I can't help
>>>> but feel that it pushes the problem around rather than addressing
>>>> it. I imagine that no user will really want to 'decompose' a memory
>>>> segment into multiple chunks by hand before passing it down to a
>>>> native function. Which brings up a (big) question:
>>>> 
>>>> "How on earth are we going to infer a MH adaptation from the
>>>> signature the user expects and the signature the ABI expects?"
>>>> 
>>>> I don't see any way to address that question, without, again, doing
>>>> some hacks to layouts --- or inventing a completely new kind of
>>>> description for functions that is expressive enough to capture all 
>>>> the
>>>> moving parts, but in that case we can just use that new description 
>>>> to
>>>> solve (3) ?
>>>> 
>>>> Some of these problems, it appears, are not 100% new, and have been
>>>> discussed in the context of the LLVM project:
>>>> 
>>>> http://lists.llvm.org/pipermail/llvm-dev/2019-January/129137.html
>>>> 
>>>> The thread is _very_ interesting - although some of the specifics of
>>>> the solution are different, there is a strong correlation with what's
>>>> being discussed here, and with the fact that declarative ways to
>>>> specify calling conventions work pretty well for calls which only
>>>> take scalar values, but kind of break down for composites (as stated
>>>> above).
>>>> 
>>>> 
>>>> All things considered, I think kind-less layouts with ABI-specific
>>>> annotations still offer the best bang for the buck of all the
>>>> alternatives we've considered. And, if we wanted to support some
>>>> minimal subset of kinds - I think adding a distinction between
>>>> floating point and integral is probably the most crucial and
>>>> ABI-agnostic one we can possibly come up with, something that will
>>>> basically remove the need for ABI annotations in 90% (more?) of
>>>> cases. The signed vs. unsigned distinction, on the other hand, should
>>>> just go away - it doesn't make any material difference to how
>>>> arguments get classified by ABIs, and it mostly represents a
>>>> user-level distinction (and layouts should not be in the business of
>>>> capturing that degree of distinction).
>>>> 
>>>> Comments?
>>>> 
>>>> Maurizio
>>>> 
>>>> 
>>>> On 15/07/2019 22:18, Maurizio Cimadamore wrote:
>>>>> Hi,
>>>>> as I was (re)starting the works on the second step of the Panama 
>>>>> pipeline (foreign function access), it occurred to me that one 
>>>>> piece of the design for ValueLayout is not 100% fleshed out. I'm 
>>>>> referring to the different 'kinds' of value layouts available in 
>>>>> the API:
>>>>> 
>>>>> * signed int
>>>>> * unsigned int
>>>>> * floating point
>>>>> 
>>>>> We made this distinction long ago - the intention was to capture 
>>>>> important distinctions between different layouts in an explicit 
>>>>> fashion. For instance, system ABIs typically pass integer values via 
>>>>> general purpose registers, while they pass floating point values via 
>>>>> floating point or vector registers. So it seemed an important 
>>>>> distinction to capture.
>>>>> 
>>>>> When I later started to work on support for x87 types, I realized 
>>>>> that it wasn't all that simple - a "long double" in SysV ABI is 
>>>>> typically encoded as a 128 bit floating point using the x87 
>>>>> extended precision format [1], but so is a "binary128" which 
>>>>> instead uses the quad precision format [2]. In other words, the 
>>>>> kind/size pair does not unambiguously denote a specific type 
>>>>> semantics. Moreover, x87 types only really make sense when it comes 
>>>>> to the SysV ABI, and the implementation of that ABI will have to 
>>>>> ask whether a certain layout is that of an x87 floating point value 
>>>>> - which brings up the question of how these special, 
>>>>> platform-dependent layouts are denoted in the first place?
>>>>> 
>>>>> Since Panama layouts support annotations, we always had the 
>>>>> annotation route available to us to distinguish between these 
>>>>> different types - that is:
>>>>> 
>>>>> f128[abi=x87]
>>>>> 
>>>>> could denote an extended precision x87 value, whereas:
>>>>> 
>>>>> f128[abi=quadfloat]
>>>>> 
>>>>> could denote a 128-bit floating point value using the 'quad' float 
>>>>> format (binary128).
>>>>> 
>>>>> This is of course still a viable option - yes, the memory access 
>>>>> API no longer has general purpose annotations, but it's easy 
>>>>> enough to add them back in as part of the System ABI support, and 
>>>>> then retrofit layout 'names' as a special kind of annotation - 
>>>>> that's a move we have pulled in the past and we know it works.
>>>>> 
>>>>> But looking at this problem with fresh eyes, I'm noting an 
>>>>> asymmetry, one that John pointed out in the past: the set of kinds 
>>>>> supported by ValueLayout seems somewhat arbitrary, fixed and 
>>>>> non-extensible in ways other than using annotations. What is the 
>>>>> advantage of being able to tell an 'int' from a 'float' if we can't 
>>>>> tell a 'x87' double from a 'quad float' ? Why is the former 
>>>>> distinction supported _natively_ by the layout descriptions, 
>>>>> whereas the latter is only supported indirectly, via annotations?
>>>>> 
>>>>> Of course, we know that former proposals, such as LDL [3], 
>>>>> precisely for this reason, decided not to embed any semantics in 
>>>>> their 'kinds'. That is, LDL really has only bits and groups of bits 
>>>>> - all the semantics is specified via annotations. This is a more 
>>>>> symmetric approach - there are no 'blessed' kinds, everything 
>>>>> happens through annotations. This is certainly a fine decision when 
>>>>> designing a layout language with a given fixed grammar.
>>>>> 
>>>>> But, using annotations inside layouts is also a very indirect 
>>>>> approach. Can we do better? After all, it seems that, if we leverage 
>>>>> the fact that layouts are API elements, or objects, we can 
>>>>> formulate an alternate solution where:
>>>>> 
>>>>> * value layouts _cannot be created_; you have to use one of the 
>>>>> pre-baked constants - we have already discussed introducing layout 
>>>>> constants in [4] anyway
>>>>> * among the constants, users will find some that are ABI-specific 
>>>>> (e.g. there will be one constant for x87 values, one for quadfloat, 
>>>>> and so forth).
>>>>> * testing 'is this layout an x87 layout?' reduces to an equality 
>>>>> test (e.g. "layout == SYSV_X87")
>>>>> 
>>>>> Something like this would allow us to have layouts which are 
>>>>> _internally_ general enough to express system ABI specific types - 
>>>>> but at the level of the public API, a layout for a 128-bit x87 
>>>>> value would be the same as the one for a quad float - there would 
>>>>> be no way for the user to tell them apart, other than noting that 
>>>>> the two layouts correspond to different pre-baked constants. And 
>>>>> this is, perhaps, a good outcome - after all, the distinction 
>>>>> between these two layouts is a _semantic_ distinction, not 
>>>>> (strictly speaking) a layout one (in fact the layout is, in terms 
>>>>> of size and alignment, indeed the same in both cases). Therefore, 
>>>>> it is very likely that this semantic distinction will only be of 
>>>>> interest to very critical components of the Panama runtime - and 
>>>>> that most of the clients will not care much about the distinction 
>>>>> (other than maybe occasionally testing for "is this an x87 
>>>>> layout").
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Maurizio
>>>>> 
>>>>> [1] - 
>>>>> https://en.wikipedia.org/wiki/Extended_precision#IEEE_754_extended_precision_formats
>>>>> [2] - 
>>>>> https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format 
>>>>> [3] - http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html
>>>>> [4] - 
>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-July/005908.html