[foreign] RFR 8218153: Read/Write of Value layout should honor declared endianness

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Sun Feb 3 20:04:05 UTC 2019


So, all things considered, it is true that we have an issue here: there 
is no way to express the 'lack' of endianness, either in the API or 
in the layout description. We can come up with ways to make up for this 
deficiency (as I tried to do in my last email), but I think that, 
ultimately, that could prove to be confusing. Below is a proposal (which 
I also discussed offline with John) which I think has some legs:

* So, API-wise, I think we need to make things explicit - each layout 
should say whether it's LE, BE or NE (short for 'no endianness'). 
Whether we represent NE as an explicit constant in the Endianness enum, 
or just as a lack of endianness, is mostly an API decision. But, 
logically, we have three possible endianness values.

* Any attempt to construct a function descriptor with the API by passing 
one or more argument/return layouts whose endianness is other than NE 
will result in an IllegalArgumentException

* Any attempt to access memory using a layout whose endianness is NE 
should also result in an exception

* We should provide a user-friendly API to swap the endianness of a Layout

* We are now left with the problem of encoding this new schema in the 
layout language we use in annotations. As John suggests, the endianness 
concerns would be better captured by the presence of an explicit byte 
swap operator. Let's try to sketch how it might work.

* a byte swap operator is either '<' or '>', where the former denotes 
little endianness (LE in short), the latter big endianness (BE in short).

* parsing layout becomes a contextual operation - that is, the output of 
parsing depends on the current endianness state

* when parsing starts, initial endianness state is set to NE

* whenever a '<' or a '>' is encountered, endianness state is set to 
either LE or BE, respectively. The endianness state is reverted to its 
original value, *after* the layout element following the '<' or '>' 
character has been fully parsed. For groups, this means that:

 >[ i32 i32 ]

will be effectively the same as

[ >i32 >i32 ]

allowing for a nice shortcut.

* whenever an address layout of the kind L:U is parsed, the endianness 
state is reset to NE just before U is parsed, and restored to its previous 
value after U has been parsed. This means that

 >u64:i32

will be a BE pointer to a 32-bit integral value whose endianness is 
unspecified.
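
To make the parsing rules above a bit more concrete, here is a rough Java 
sketch. It is not the actual parser: the class EndianLayoutSketch, the 
normalize method and the simplified grammar (a value is a letter followed 
by a size, groups use brackets, addresses use L:U) are all assumptions made 
for illustration. It rewrites a layout string so that every value carries 
its own explicit marker; values left without a marker are NE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the contextual endianness rules; not the real layout parser.
final class EndianLayoutSketch {
    enum Endianness { LE, BE, NE }

    private final String src;
    private int pos;

    private EndianLayoutSketch(String src) { this.src = src; }

    /** Rewrites a layout string so each value has an explicit '<' or '>' (none means NE). */
    static String normalize(String layout) {
        EndianLayoutSketch parser = new EndianLayoutSketch(layout);
        String out = parser.element(Endianness.NE); // parsing starts in the NE state
        parser.skipSpaces();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Trailing input at " + parser.pos);
        }
        return out;
    }

    // 'inherited' is the caller's state. The state set by '<' or '>' lives in a local
    // variable, so it is dropped automatically once this element has been fully parsed.
    private String element(Endianness inherited) {
        skipSpaces();
        Endianness current = inherited;
        if (peek() == '<') { current = Endianness.LE; pos++; }
        else if (peek() == '>') { current = Endianness.BE; pos++; }
        skipSpaces();
        if (peek() == '[') {
            pos++;
            List<String> parts = new ArrayList<>();
            skipSpaces();
            while (peek() != ']') {
                parts.add(element(current)); // group members see the group's state
                skipSpaces();
            }
            pos++;
            return "[" + String.join(" ", parts) + "]";
        }
        String out = marker(current) + value();
        if (peek() == ':') {
            pos++;
            out += ":" + element(Endianness.NE); // the pointee U restarts from NE
        }
        return out;
    }

    // Assumed value syntax: one letter followed by at least one digit, e.g. i32 or u64.
    private String value() {
        int start = pos;
        if (!Character.isLetter(peek())) {
            throw new IllegalArgumentException("Expected a value layout at " + pos);
        }
        pos++;
        while (Character.isDigit(peek())) pos++;
        if (pos == start + 1) {
            throw new IllegalArgumentException("Expected a size at " + pos);
        }
        return src.substring(start, pos);
    }

    private static String marker(Endianness e) {
        return e == Endianness.LE ? "<" : e == Endianness.BE ? ">" : "";
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void skipSpaces() { while (peek() == ' ') pos++; }

    public static void main(String[] args) {
        System.out.println(normalize(">[ i32 i32 ]"));
        System.out.println(normalize("[ >i32 >i32 ]"));
        System.out.println(normalize(">u64:i32"));
    }
}
```

With this sketch, the two group strings shown earlier normalize to the same 
result, and the pointee in the address example keeps no marker.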


Taken together, these rules allow us to specify NE in any place by 
simply avoiding the use of a '<' or a '>' sign. In some cases, doing 
so might require rewriting part of the containing layout - 
consider this case:

 >[i32 i32]

If the user really wanted the second field to be NE, then the right 
string should be:

[>i32 i32]


I quite like this because there is a nice correspondence between the 
layout API and the layout description: in both cases there's no explicit 
support for NE, which is always encoded as a lack of explicit endianness info.
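
For the API side, the rules listed at the start of this mail could look 
roughly like the Java sketch below. Everything in it is hypothetical: the 
names EndiannessApiSketch, ValueLayout, withOrder, swapOrder, 
checkFunctionDescriptor and read are made up for illustration, NE is 
modelled as an empty Optional (one of the two options mentioned above), and 
IllegalStateException is only an assumed choice for the memory access case, 
since the proposal just says "an exception".

```java
import java.nio.ByteOrder;
import java.util.Optional;

// Hypothetical sketch of the proposed API rules; none of these names exist in the real API.
final class EndiannessApiSketch {
    /** A value layout whose byte order is either LE, BE, or absent (NE). */
    static final class ValueLayout {
        final int bits;
        final Optional<ByteOrder> order;

        private ValueLayout(int bits, Optional<ByteOrder> order) {
            this.bits = bits;
            this.order = order;
        }

        /** Creates an NE layout: no byte order is recorded. */
        static ValueLayout of(int bits) {
            return new ValueLayout(bits, Optional.empty());
        }

        ValueLayout withOrder(ByteOrder newOrder) {
            return new ValueLayout(bits, Optional.of(newOrder));
        }

        /** Flips LE and BE; an NE layout stays NE (an assumption of this sketch). */
        ValueLayout swapOrder() {
            return new ValueLayout(bits, order.map(o ->
                    o == ByteOrder.BIG_ENDIAN ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN));
        }
    }

    /** Function descriptors only accept NE layouts. */
    static void checkFunctionDescriptor(ValueLayout returnLayout, ValueLayout... argLayouts) {
        if (returnLayout.order.isPresent()) {
            throw new IllegalArgumentException("Return layout must be NE");
        }
        for (ValueLayout arg : argLayouts) {
            if (arg.order.isPresent()) {
                throw new IllegalArgumentException("Argument layout must be NE");
            }
        }
    }

    /** Memory access needs a concrete byte order, so NE layouts are rejected. */
    static long read(byte[] memory, int offset, ValueLayout layout) {
        ByteOrder order = layout.order.orElseThrow(
                () -> new IllegalStateException("Cannot access memory through an NE layout"));
        int size = layout.bits / 8;
        long result = 0;
        for (int i = 0; i < size; i++) {
            // Visit bytes from most significant to least significant.
            int index = order == ByteOrder.BIG_ENDIAN ? offset + i : offset + size - 1 - i;
            result = (result << 8) | (memory[index] & 0xFF);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] memory = {0x12, 0x34, 0x56, 0x78};
        ValueLayout be = ValueLayout.of(32).withOrder(ByteOrder.BIG_ENDIAN);
        System.out.println(Long.toHexString(read(memory, 0, be)));
        System.out.println(Long.toHexString(read(memory, 0, be.swapOrder())));
        checkFunctionDescriptor(ValueLayout.of(32), ValueLayout.of(64));
    }
}
```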

Maurizio


On 03/02/2019 19:49, John Rose wrote:
> On Feb 2, 2019, at 7:13 PM, Henry Jen <henry.jen at oracle.com> wrote:
>>
>> Now, the union is a host endian value, where in the other fields, we 
>> want to store the number in memory as BIG ENDIAN.
>
> The old minimal-ldl proposal has a contextual byte-swapping operator.
> Something like that would do the job.  It could take the form of an
> endian marker that appears at the beginning of a layout, or on a
> group, or just before an individual value (although that functionality
> can be covered by using different characters for different endian-ness).
>
> (Random idea:  What if layout letters were NE by default, and only got
> BE or LE if they were in the context of an endian prefix?  Then layout
> strings could produce NE variables which could be transformed later.
> That kind of gets at your use case for HE, without baking it into the
> layout strings.  You'd read a bunch of NE layouts and then apply the
> contextual setting afterwards.)
>
>> For jextract, there is no way it would know that the ns/nl/nll field 
>> is to be stored BIG ENDIAN without an extra directive; it only knows 
>> it's a C type, so what do we generate? I believe this should be a “not 
>> specified” situation. Let’s assume we use LE as suggested.
>>
>> In case we are doing it manually, we can specify ns/nl/nll to be BIG 
>> ENDIAN with uppercase U. However, we won’t be able to distinguish 
>> “LE” and “not specified”.
>

