[foreign] RFR 8218153: Read/Write of Value layout should honor declared endianness

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Sat Feb 2 22:43:39 UTC 2019


On 02/02/2019 23:02, John Rose wrote:
> On Feb 2, 2019, at 7:24 AM, Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>>
>> I'm wondering what's the meaning of 'calling a function with
>> different endianness' - I thought endianness had to do with mismatches
>> between in-memory representation and in-register representation. When
>> you extract a 'long' from memory, you need to know if the layout is
>> LE or BE because, depending on your platform, you might need a byte
>> swap. But when you pass the same 'long' to a function, the long is
>> already in a register (thanks to the VM) and you are probably copying
>> it into another register, or onto the stack - but as is. In other
>> words, I guess I don't see how endianness would come into play in
>> function calls. I know that, in our annotations, we allow endianness
>> to be specified, but I'm not sure it makes sense to talk about passing
>> parameters to functions using a different endianness?
>
> That's my take also.  Endian-ness is a parameter that applies when
> moving bytes out of memory into registers larger than a byte, and vice
> versa.  Once a value is loaded into a register, there is little or no
> benefit to keeping the byte structure.  In a register there is a bit
> structure which is useful, and that can play a role in addressing
> bitfields such as C defines.  Basically, when you load into a register,
> you leave the bytes behind and work with bits instead; any residual
> need to talk about bytes is handled by talking about bit positions and
> bit-level layout, with offsets and sizes that are multiples of 8.
>
> The distinction between container and memory layout is an important one,
> and we don't need to rederive it.  Let's make use of it, as fully and
> carefully documented in section 15 in this document:
> http://cr.openjdk.java.net/~jrose/panama/minimal-ldl.html#c
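
To make the memory-versus-register distinction above concrete, here is a
minimal sketch in plain Java. It uses java.nio.ByteBuffer rather than the
Panama Pointer API, and the names EndianDemo, load and callee are made up
for illustration. Byte order only matters in load(), where bytes leave
memory; callee() receives a plain 64-bit value and has no byte order to
honor.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: not part of the Panama binder.
final class EndianDemo {

    // Memory -> register: the declared byte order decides how 8 bytes become a long.
    static long load(byte[] memory, ByteOrder order) {
        return ByteBuffer.wrap(memory).order(order).getLong(0);
    }

    // Register -> register: the argument is already a 64-bit value, passed as is.
    static long callee(long x) {
        return x + 1;
    }

    public static void main(String[] args) {
        byte[] memory = {1, 2, 3, 4, 5, 6, 7, 8};
        long be = load(memory, ByteOrder.BIG_ENDIAN);
        long le = load(memory, ByteOrder.LITTLE_ENDIAN);

        // The same bytes yield two different values, one the byte swap of the other.
        System.out.println(be == Long.reverseBytes(le));

        // Passing either value to a function involves no further byte-order decision.
        System.out.println(callee(be) - be);
        System.out.println(callee(le) - le);
    }
}
```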

I agree with what you say here - this is essentially all an 'accident'
that comes from the fact that we're using layout descriptions to describe
function arguments too - which I still think is better than having two
separate description mechanisms.

I'm in favor of specifying that jextract always uses 'lower case'
letters when generating annotations for function parameters (i.e. treats
them as if little endian) - but then the binder can dynamically convert
this into an 'I don't care' endianness which never forces a byte swap.

In other words, we can make the descriptions tighter: upper case
letters such as I32 can only be used for struct layouts, not for
function parameter layouts. And we have a precedent for similar
restrictions too: sub-byte sizes are only allowed in the 'substructure'
descriptor of a value layout (e.g. bitfields), but not for whole layout
elements. So, there are cases where context is important - and 'do you
appear in a function' seems to be another of those context-sensitive
things that the binder needs to be aware of.

I like John's solution (3), where Value::endianness returns
Optional<Endianness> (4 is fine too). With that, I think we can repurpose
the previous default constructors for Value layouts (the ones we removed
because they did not have an endianness) to mean "no endianness", which I
think is consistent with what we're trying to model.
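
As a rough sketch of what solution (3) could look like (all names here,
ValueSketch, ofMemory, ofContainer and needsByteSwap, are hypothetical and
not the actual binder API), a value layout can carry an optional byte
order, with the "default" factory producing a layout that has none:

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for a Value layout; not the real Panama class.
final class ValueSketch {
    enum Endianness { LITTLE_ENDIAN, BIG_ENDIAN }

    private final long bitsSize;
    private final Endianness endianness; // null encodes "no endianness"

    private ValueSketch(long bitsSize, Endianness endianness) {
        if (bitsSize <= 0) {
            throw new IllegalArgumentException("size must be positive: " + bitsSize);
        }
        this.bitsSize = bitsSize;
        this.endianness = endianness;
    }

    // Memory layout: the byte order is mandatory.
    static ValueSketch ofMemory(long bitsSize, Endianness endianness) {
        return new ValueSketch(bitsSize, Objects.requireNonNull(endianness));
    }

    // Container (register) layout: plays the role of the old default constructor.
    static ValueSketch ofContainer(long bitsSize) {
        return new ValueSketch(bitsSize, null);
    }

    long bitsSize() {
        return bitsSize;
    }

    // Partial query: empty when the layout describes a register-like container.
    Optional<Endianness> endianness() {
        return Optional.ofNullable(endianness);
    }

    // A layout without a byte order never forces a swap.
    boolean needsByteSwap(Endianness platform) {
        return endianness().map(e -> e != platform).orElse(false);
    }
}
```

With this shape, ValueSketch.ofContainer(64) reports an empty endianness
and never asks for a swap, whatever the platform order is.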

Summing up - I believe the bug is not in DirectNativeInvoker "ignoring"
the endianness, but in UniversalInvoker taking it into account (because
UniversalInvoker relies on the Pointer API, which relies on byte ordering).
But the Pointers that UniversalInvoker has to manipulate are, morally,
sequences of longs that need to be passed either in registers or in stack
slots - so I don't think we want to create those with endianness-ful
layouts - and we should of course check that SystemABI throws whenever a
NativeMethodType is passed where one of the layouts has an explicit
endianness (the same way as we now throw when we pass an NMT that
contains an array carrier).
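
The SystemABI check could be sketched roughly as follows. This is only an
illustration under assumptions: AbiCheckSketch, ArgLayout and
checkFunctionLayouts are invented names, the list of layouts stands in for
whatever a NativeMethodType exposes, and IllegalArgumentException is just
one plausible choice of exception.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; the real SystemABI and NativeMethodType look different.
final class AbiCheckSketch {
    enum Endianness { LITTLE_ENDIAN, BIG_ENDIAN }

    // Minimal stand-in for an argument or return layout of a function type.
    static final class ArgLayout {
        private final long bitsSize;
        private final Endianness endianness; // null encodes "no endianness"

        ArgLayout(long bitsSize, Endianness endianness) {
            this.bitsSize = bitsSize;
            this.endianness = endianness;
        }

        long bitsSize() {
            return bitsSize;
        }

        Optional<Endianness> endianness() {
            return Optional.ofNullable(endianness);
        }
    }

    // Rejects a function type if any of its layouts declares a byte order.
    static void checkFunctionLayouts(List<ArgLayout> layouts) {
        for (int i = 0; i < layouts.size(); i++) {
            Optional<Endianness> e = layouts.get(i).endianness();
            if (e.isPresent()) {
                throw new IllegalArgumentException(
                        "layout " + i + " has explicit endianness " + e.get());
            }
        }
    }
}
```

A list holding only layouts built as new ArgLayout(64, null) passes the
check, while adding new ArgLayout(32, Endianness.BIG_ENDIAN) makes it
throw.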

Maurizio

>
> This also is why memory layouts are not always an exact match for function
> arguments and returns.  When a function argument or return is passed in
> a register, byte order is irrelevant.  If we use memory layouts in a
> punning way to also express containers, I see four options for dealing
> with the byte order in them:
>   1. have a portable convention (LE)
>   2. have a local convention (PE = BE or LE according to context)
>   3. make byte order a partial query, return sentinel value NE (no endian polarity)
>   4. make byte order a partial query, throw an exception when queried
>
> I prefer 3 or 4, when dealing with containers.  Any dependency
> on byte order in a register is a probable bug, especially in case
> 2 (sorry, Henry, I disagree with your solution) where the dependency
> is on a value which can change depending on where the VM is run.
>
> FTR, the "Minimal LDL" document uses solution 1, but it also holds
> back, a little, from using memory layouts as function arguments and
> returns.  If we are going to overload memory layouts as register
> descriptions, 3 or 4 is a better move than 1.
>
> — John
>

