[foreign-memaccess] on value kinds

Fri Jul 19 15:58:34 UTC 2019

On 19/07/2019 16:13, Jorn Vernee wrote:
> Yep - or users could create their own bag of constants class. The 
> important thing is that we provide the right building blocks. Relying 
> on constants neatly solves the usability problem, while keeping the 
> building blocks simple-yet-powerful.
>
> I think we could see AddressLayout more as a convenient wrapper around 
> an annotated ValueLayout.
> v64(kind=address)(pointeekind=void|value|function)
>
> Using a Map<String, Constable> also seems like a cool idea. The only 
> thing is that we might not be able to take full advantage of it using 
> the descriptor language we have with jextract. i.e. anything we want 
> to use as an annotation value there has to somehow be reducible to a 
> string.

Sure - but in practice, strings, layouts and enum constants are all 
easily reducible to strings - so it might not be a big issue (at least 
for 'standard' annotations). Not all the things we do at the API level 
have to be expressible at the layout language level, I think.

Maurizio

>
> Jorn
>
> On 2019-07-19 16:30, Maurizio Cimadamore wrote:
>> One point I forgot to make. One might think that creating e.g. a
>> layout for a float using annotations might be cumbersome. But, in
>> reality, we are considering adding layout constants for all ABI types,
>> so this will rarely be an issue in practice.
>>
>> Maurizio
>>
>> On 19/07/2019 15:28, Maurizio Cimadamore wrote:
>>>
>>> On 19/07/2019 14:24, Jorn Vernee wrote:
>>>>> Comments?
>>>>
>>>> It seems that the need for embedding structural carrier info is 
>>>> specific to implementing ABI behaviour. So, part of the choice
>>>> seems to be whether we want to use an ABI agnostic API to encode 
>>>> the ABI specific stuff into (like layout annotations), or if we 
>>>> want to create ABI specific abstractions (e.g. have a CStruct type).
>>>>
>>>> Maybe we ultimately need a mix of both (note that we already have a 
>>>> pretty C specific AddressLayout added), but for now I think layout 
>>>> annotations are the way to go, also since inventing something like 
>>>> a CStruct would almost make GroupLayouts redundant it seems.
>>> As for AddressLayout, I've been split about it; I don't view it as C
>>> specific - I view it mostly as a way to attach info about the
>>> contents of the addressed memory. Again, this can easily be achieved 
>>> with an annotation.
>>>>
>>>> Looking at memaccess, and the fact that we never actually use the 
>>>> 'kind' of a value, I think morally (all) the value kinds should be 
>>>> moved to annotations.
>>>
>>> I came up with an idea that looks compelling: we could, fairly 
>>> easily, generalize layouts to support annotations that are described 
>>> by a copy-on-write map of the kind:
>>>
>>> Map<String, Constable>
>>>
>>> That is, a layout annotation has a "name" and a "value" but, 
>>> interestingly, the value can be ANY Java object, provided such 
>>> object can be expressed as a constant. Interesting examples of 
>>> constants:
>>>
>>> * Strings (which allow us to add layout names, which is required by 
>>> the memory access API)
>>> * Enums (which allow us to add kinds to layouts - e.g. each ABI can 
>>> have its own Kind enum and layout constants for that ABI can just 
>>> create annotated layouts from there)
>>> * Layouts themselves! (which would allow us to encode addressee info,
>>> but also to encode optional substructure for things like bitfields!)
>>>
>>> Pulling more on this string, I believe that we can make 
>>> equals/hashcode of a Layout safely ignore annotations. E.g. a layout 
>>> is all about size/endianness/alignment.
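
A minimal sketch of the annotation idea above, under stated assumptions: SketchLayout and SketchKind are hypothetical names invented for illustration, not part of the Panama API. The sketch only shows a copy-on-write Map<String, Constable> and an equals/hashCode pair that ignores annotations.

```java
import java.lang.constant.Constable;
import java.nio.ByteOrder;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical ABI-specific kind; every enum is a Constable (Java 12+). */
enum SketchKind { INTEGER, SSE, X87 }

/** Hypothetical stand-in for a value layout; not the Panama API. */
final class SketchLayout {
    private final long bitSize;
    private final long bitAlignment;
    private final ByteOrder order;
    private final Map<String, Constable> annotations; // unmodifiable snapshot

    private SketchLayout(long bitSize, long bitAlignment, ByteOrder order,
                         Map<String, Constable> annotations) {
        this.bitSize = bitSize;
        this.bitAlignment = bitAlignment;
        this.order = order;
        this.annotations = annotations;
    }

    /** Creates an unannotated, naturally aligned layout. */
    static SketchLayout ofBits(long bitSize, ByteOrder order) {
        return new SketchLayout(bitSize, bitSize, order, Map.of());
    }

    /** Copy-on-write: returns a new layout and leaves this one untouched. */
    SketchLayout withAnnotation(String name, Constable value) {
        Map<String, Constable> copy = new HashMap<>(annotations);
        copy.put(name, value);
        return new SketchLayout(bitSize, bitAlignment, order, Map.copyOf(copy));
    }

    /** Looks up an annotation by name, if present. */
    Optional<Constable> annotation(String name) {
        return Optional.ofNullable(annotations.get(name));
    }

    /** Equality looks at size, alignment and endianness only. */
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SketchLayout)) return false;
        SketchLayout that = (SketchLayout) other;
        return bitSize == that.bitSize
                && bitAlignment == that.bitAlignment
                && order.equals(that.order);
    }

    @Override
    public int hashCode() {
        return Objects.hash(bitSize, bitAlignment, order);
    }
}
```

With this shape, SketchLayout.ofBits(128, ByteOrder.LITTLE_ENDIAN) and the same layout annotated with ("abi", SketchKind.X87) compare equal, while an ABI implementation can still read the annotation back when it needs to classify the value.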
>>>
>>> This seems to me a much more stable position in the design space.
>>>
>>> Maurizio
>>>
>>>> Jorn
>>>>
>>>> On 2019-07-19 13:45, Maurizio Cimadamore wrote:
>>>>> Picking up this again.
>>>>>
>>>>> I've been doing more thinking on this topic, and it seems to me that
>>>>> attaching a notion of 'kind' to a layout is suboptimal for a number
>>>>> of reasons:
>>>>>
>>>>> * layouts should be about sizes, alignments and endianness - not 
>>>>> semantics
>>>>>
>>>>> * by attaching kinds to layouts we end up replicating some of the
>>>>> information that is already available in carrier types
>>>>>
>>>>> * a kind-based system feels 'arbitrary' and it is difficult to extend
>>>>> - unless we resort to plain strings/annotations
>>>>>
>>>>> * some kinds are just plain useless - e.g. no difference between
>>>>> signed vs. unsigned
>>>>>
>>>>> Now, I'd very much like to propose to just get rid of ValueLayout
>>>>> kinds. After all, when dereferencing memory the _carrier_ will drive
>>>>> the process and make sure that the right semantics is applied; so,
>>>>> carriers are for _semantics_, layouts are, well, for layouts! Another
>>>>> advantage is that, if we do this, padding layouts just disappear -
>>>>> they become just unnamed value layouts.
>>>>>
>>>>> But, if you pull on this string, while things are still perfectly fine
>>>>> for the memory access API (we always have a carrier when we need to
>>>>> dereference), we run into an issue with ABI classification. Let's
>>>>> simplify things a bit, and assume there are three main categories of
>>>>> values that a foreign function has to deal with:
>>>>>
>>>>> 1) scalars (e.g. int, float, long double)
>>>>> 2) pointers (function pointers, object pointers)
>>>>> 3) composites (structs/unions)
>>>>>
>>>>> Now, eliminating kinds from ValueLayout will have zero consequences on
>>>>> (1) and (2). After all, the layout + carrier info is always enough to
>>>>> do a basic classification - e.g. (using the SysV terminology)
>>>>>
>>>>> byte.class, char.class, short.class, int.class, long.class ->
>>>>> INTEGER or MEMORY (if no register available)
>>>>> float.class, double.class -> SSE or MEMORY (if no register available)
>>>>> MemoryAddress -> POINTER or MEMORY (if no register available)
>>>>>
>>>>> (we can also have extra carriers for exotic types such as x87).
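
A rough sketch of the carrier-driven scalar classification listed above. Assumptions: SketchAddress is a hypothetical marker standing in for the MemoryAddress carrier, SysVClass and ScalarClassifier are made-up names, and the MEMORY fallback (no register available) is deliberately left out because it depends on register allocation rather than on the carrier.

```java
import java.util.Set;

/** Hypothetical marker standing in for the MemoryAddress carrier. */
final class SketchAddress { }

/** Register classes, using the SysV terminology from the message above. */
enum SysVClass { INTEGER, SSE, POINTER }

final class ScalarClassifier {
    private static final Set<Class<?>> INTEGRAL =
            Set.of(byte.class, char.class, short.class, int.class, long.class);
    private static final Set<Class<?>> FLOATING =
            Set.of(float.class, double.class);

    private ScalarClassifier() { }

    /** Maps a scalar or pointer carrier to its preferred register class. */
    static SysVClass classify(Class<?> carrier) {
        if (INTEGRAL.contains(carrier)) return SysVClass.INTEGER;
        if (FLOATING.contains(carrier)) return SysVClass.SSE;
        if (carrier == SketchAddress.class) return SysVClass.POINTER;
        // Composites (and exotic types such as x87) need more than a carrier.
        throw new IllegalArgumentException(
                "No scalar classification for carrier: " + carrier.getName());
    }
}
```

Note that the sketch has nothing sensible to return for a struct carrier, which is exactly the gap discussed for category (3).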
>>>>>
>>>>> This is all good and well - maybe we'll have to tweak the
>>>>> classification routines a little to work not just on layouts, but
>>>>> layout + carrier, but it's all doable.
>>>>>
>>>>> But what about (3) ? The classification routines for structs/unions
>>>>> are mind-bogglingly complex (at least in SysV), and you need _full_
>>>>> knowledge of the ins and outs of a struct in order to classify it
>>>>> ABI-wise. That is, you have to know whether the struct fields belong
>>>>> in (1), (2) or (3), recursively.
>>>>>
>>>>> And here's the issue - if we use MemorySegment as a carrier for _all_
>>>>> structs/unions, that carrier is just not powerful enough to allow us
>>>>> to do that kind of recursive classification! Our answer to this has
>>>>> been: don't use carriers for recursive stuff - just use layouts, which
>>>>> is why we have the FP vs. INT distinction in the ValueLayout. So, in a
>>>>> way, while there are many arguments for pushing kind info outside
>>>>> layouts, there are also strong arguments in favor of keeping it in.
>>>>>
>>>>> If we were to completely drop kinds from layouts, I see only two options:
>>>>>
>>>>> 1) Put the client in charge of recursive classification - that is,
>>>>> essentially, eliminate (3) from the picture - and lower all struct
>>>>> arguments to a sequence of (1) and (2).
>>>>>
>>>>> 2) Invent a new carrier type that is powerful enough to embed
>>>>> structural carrier info:
>>>>>
>>>>> e.g.
>>>>>
>>>>> static Struct of(MemorySegment segment, Class<?>... fieldCarriers)
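
To make the shape of option (2) concrete, here is a minimal sketch of such a carrier. Assumptions: SketchStruct is a hypothetical name, and a java.nio.ByteBuffer stands in for MemorySegment so that the example is self-contained.

```java
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical carrier pairing struct memory with per-field carrier info. */
final class SketchStruct {
    private final ByteBuffer segment;            // stand-in for MemorySegment
    private final List<Class<?>> fieldCarriers;  // one carrier per field, in order

    private SketchStruct(ByteBuffer segment, List<Class<?>> fieldCarriers) {
        this.segment = segment;
        this.fieldCarriers = fieldCarriers;
    }

    /** Mirrors the factory signature suggested above. */
    static SketchStruct of(ByteBuffer segment, Class<?>... fieldCarriers) {
        return new SketchStruct(segment, List.of(fieldCarriers));
    }

    ByteBuffer segment() {
        return segment;
    }

    /** Field carriers, which a classifier could walk field by field. */
    List<Class<?>> fieldCarriers() {
        return fieldCarriers;
    }
}
```

A flat Class<?> list like this cannot describe the fields of a nested struct, so a real design would need a richer per-field description.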
>>>>>
>>>>> Now, as much as I see the appealing simplicity of (1), I can't help
>>>>> but feel that it pushes the problem around without completely
>>>>> addressing it. I imagine that no user will really want to 'decompose' a
>>>>> memory segment into multiple chunks by hand before passing it down to
>>>>> a native function. Which brings up a (big) question:
>>>>>
>>>>> "How on earth are we going to infer a MH adaptation from the
>>>>> signature the user expects and the signature the ABI expects?"
>>>>>
>>>>> I don't see any way to address that question, without, again, doing
>>>>> some hacks to layouts --- or inventing a completely new kind of
>>>>> description for functions that is expressive enough to capture all the
>>>>> moving parts, but in that case we can just use that new description to
>>>>> solve (3) ?
>>>>>
>>>>> Some of these problems, it appears, are not 100% new, and have been
>>>>> discussed in the context of the LLVM project:
>>>>>
>>>>> http://lists.llvm.org/pipermail/llvm-dev/2019-January/129137.html
>>>>>
>>>>> The thread is _very_ interesting - although some of the specifics of
>>>>> the solution are different, there is a strong correlation with what's
>>>>> being discussed here, and with the fact that declarative ways to
>>>>> specify calling conventions work pretty well for calls which only take
>>>>> scalar values, but kind of fall apart for composites (as stated
>>>>> above).
>>>>>
>>>>>
>>>>> All things considered, I think kind-less layouts with ABI-specific
>>>>> annotations still offer better bang for the buck than any of
>>>>> the alternatives we've considered. And, if we wanted to support some
>>>>> minimal subset of kinds - I think adding a distinction between floating
>>>>> point and integral is probably the most crucial and ABI-agnostic one
>>>>> we can possibly come up with, something that will basically reduce the
>>>>> need for ABI annotations in 90% (more?) of cases. The signed vs. unsigned
>>>>> distinction, on the other hand, should just go away - it makes no
>>>>> material difference to how arguments get classified by ABIs, and
>>>>> it mostly represents a user-level distinction (and layouts should not
>>>>> be in the business of capturing that degree of distinction).
>>>>>
>>>>> Comments?
>>>>>
>>>>> Maurizio
>>>>>
>>>>>
>>>>> On 15/07/2019 22:18, Maurizio Cimadamore wrote:
>>>>>> Hi,
>>>>>> as I was (re)starting the works on the second step of the Panama 
>>>>>> pipeline (foreign function access), it occurred to me that one 
>>>>>> piece of the design for ValueLayout is not 100% fleshed out. I'm
>>>>>> referring to the different 'kinds' of value layouts available in 
>>>>>> the API:
>>>>>>
>>>>>> * signed int
>>>>>> * unsigned int
>>>>>> * floating point
>>>>>>
>>>>>> We made this distinction long ago - the intention was to capture 
>>>>>> important distinctions between different layouts in an explicit 
>>>>>> fashion. For instance, system ABIs typically pass integer values
>>>>>> via general registers, while they pass floating point values via
>>>>>> floating point or vector registers. So it seemed an important 
>>>>>> distinction to capture.
>>>>>>
>>>>>> When I later started to work on support for x87 types, I realized 
>>>>>> that it wasn't all that simple - a "long double" in SysV ABI is 
>>>>>> typically encoded as a 128 bit floating point using the x87 
>>>>>> extended precision format [1], but so is a "binary128" which 
>>>>>> instead uses the quad precision format [2]. In other words, the 
>>>>>> kind/size pair does not unambiguously denote a specific type 
>>>>>> semantics. Moreover, x87 types only really make sense when it 
>>>>>> comes to the SysV ABI, and the implementation of that ABI will 
>>>>>> have to ask whether a certain layout is that of an x87 floating 
>>>>>> point value - which brings up the question of how these
>>>>>> special, platform-dependent layouts are denoted in the first place.
>>>>>>
>>>>>> Since Panama layouts support annotations, we always had the 
>>>>>> annotation route available to us to distinguish between these 
>>>>>> different types - that is:
>>>>>>
>>>>>> f128[abi=x87]
>>>>>>
>>>>>> could denote an extended precision x87 value, whereas:
>>>>>>
>>>>>> f128[abi=quadfloat]
>>>>>>
>>>>>> could denote a 128-bit floating point value using the 'quad' float
>>>>>> format (binary128).
>>>>>>
>>>>>> This is of course still a viable option - yes, the memory access 
>>>>>> API no longer has general purpose annotations, but it's easy
>>>>>> enough to add them back in as part of the System ABI support, and 
>>>>>> then retrofit layout 'names' as a special kind of annotation - 
>>>>>> that's a move we have pulled in the past and we know it works.
>>>>>>
>>>>>> But looking at this problem with fresh eyes, I'm noting an 
>>>>>> asymmetry, one that John pointed out in the past: the set of 
>>>>>> kinds supported by ValueLayout seems somewhat arbitrary, fixed and
>>>>>> non-extensible in ways other than using annotations. What is the 
>>>>>> advantage of being able to tell an 'int' from a 'float' if we 
>>>>>> can't tell a 'x87' double from a 'quad float' ? Why is the former 
>>>>>> distinction supported _natively_ by the layout descriptions, 
>>>>>> whereas the latter is only supported indirectly, via annotations?
>>>>>>
>>>>>> Of course, we know that former proposals, such as LDL [3], 
>>>>>> precisely for this reason, decided not to embed any semantics in 
>>>>>> their 'kinds'. That is, LDL really has only bits and groups of
>>>>>> bits - all the semantics is specified via annotations. This is a
>>>>>> more symmetric approach - there are no 'blessed' kinds, 
>>>>>> everything happens through annotations. This is certainly a fine 
>>>>>> decision when designing a layout language with a given fixed 
>>>>>> grammar.
>>>>>>
>>>>>> But, using annotations inside layouts is also a very indirect 
>>>>>> approach. Can we do better? After all, it seems that, if we
>>>>>> leverage the fact that layouts are API elements, or objects, we 
>>>>>> can formulate an alternate solution where:
>>>>>>
>>>>>> * value layouts _cannot be created_ - you have to use one of the
>>>>>> pre-baked constants - we have already discussed introducing 
>>>>>> layout constants in [4] anyway
>>>>>> * among the constants, users will find some that are ABI-specific 
>>>>>> (e.g. there will be one constant for x87 values, one for 
>>>>>> quadfloat, and so forth).
>>>>>> * testing 'is this layout a x87 layout?' reduces to an equality 
>>>>>> test (e.g. "layout == SYSV_X87")
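
The three points above can be sketched roughly as follows. Assumptions: ConstLayout and SYSV_QUADFLOAT are hypothetical names (only SYSV_X87 appears in the message), and the 128-bit size and alignment values are illustrative.

```java
/** Hypothetical layout type whose instances exist only as pre-baked constants. */
final class ConstLayout {
    // ABI-specific constants: same size and alignment, distinct identities.
    static final ConstLayout SYSV_X87 = new ConstLayout(128, 128);
    static final ConstLayout SYSV_QUADFLOAT = new ConstLayout(128, 128);

    private final long bitSize;
    private final long bitAlignment;

    // Private constructor: clients cannot create value layouts themselves.
    private ConstLayout(long bitSize, long bitAlignment) {
        this.bitSize = bitSize;
        this.bitAlignment = bitAlignment;
    }

    long bitSize() {
        return bitSize;
    }

    long bitAlignment() {
        return bitAlignment;
    }

    /** True when two layouts agree on size and alignment, whatever they denote. */
    boolean sameShape(ConstLayout other) {
        return bitSize == other.bitSize && bitAlignment == other.bitAlignment;
    }
}
```

Because equals is not overridden, "layout == ConstLayout.SYSV_X87" is the only way to tell the two constants apart, while sameShape reports that they are identical as far as size and alignment go.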
>>>>>>
>>>>>> Something like this would allow us to have layouts which are 
>>>>>> _internally_ general enough to express system ABI specific types 
>>>>>> - but at the level of the public API, a layout for a 128-bit x87 
>>>>>> value would be the same as the one for a quad float - there would 
>>>>>> be no way for the user to tell them apart, other than noting that 
>>>>>> the two layouts correspond to different pre-baked constants. And 
>>>>>> this is, perhaps, a good outcome - after all, the distinction 
>>>>>> between these two layouts is a _semantic_ distinction, not 
>>>>>> (strictly speaking) a layout one (in fact the layout is, in terms 
>>>>>> of size and alignment, indeed the same in both cases). Therefore, 
>>>>>> it is very likely that this semantic distinction will only be of 
>>>>>> interest to very critical components of the Panama runtime - and
>>>>>> that most of the clients will not care much about the distinction 
>>>>>> (other than maybe occasionally testing for "is this an x87 layout").
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Maurizio
>>>>>>
>>>>>> [1] - 
>>>>>> https://en.wikipedia.org/wiki/Extended_precision#IEEE_754_extended_precision_formats
>>>>>> [2] - 
>>>>>> https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format 
>>>>>> [3] - http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html
>>>>>> [4] - 
>>>>>> https://mail.openjdk.java.net/pipermail/panama-dev/2019-July/005908.html 
>>>>>>