[foreign-memaccess+abi] RFR: 8303017: Passing by-value structs whose size is not power of 2 doesn't work on all platforms
Maurizio Cimadamore
mcimadamore at openjdk.org
Thu Mar 2 11:04:45 UTC 2023
On Wed, 1 Mar 2023 21:28:24 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:
>> I think it's correct, but I will try to verify BE as well in the test (somehow).
>>
>> My understanding is this:
>>
>> For LE, we have a struct with 3 shorts in memory like so:
>>
>> lo1, hi1, lo2, hi2, lo3, hi3
>> 0 2 4 6
>>
>> The native compiler does an oversized load of eight bytes, so we also load the trailing 2 bytes (i.e. byte 6 - 8). And the resulting register looks like this (using `000` to represent junk bytes):
>>
>> 000, 000, hi3, lo3, hi2, lo2, hi1, lo1
>> 8 6 4 2 0
>>
>> So, we first load an `int` from bytes 0 - 4, and get `000, 000, 000, 000, hi2, lo2, hi1, lo1`, then we load bytes 4 - 6 with shift amount = 4: `000, 000, hi3, lo3, 000, 000, 000, 000` and combine the two to get: `000, 000, hi3, lo3, hi2, lo2, hi1, lo1` (just like we would get from an oversized load).
>>
>> ---
>>
>> For BE, we have in memory:
>>
>> hi1, lo1, hi2, lo2, hi3, lo3
>> 0 2 4 6
>>
>> We load byte 2 - 6 using an `int` and get: `000, 000, 000, 000, hi2, lo2, hi3, lo3` (no flipping this time), then we load byte 0 - 2 using a `short` and shift amount = 4 and get `000, 000, hi1, lo1, 000, 000, 000, 000`. Then combined: `000, 000, hi1, lo1, hi2, lo2, hi3, lo3`.
>>
>> i.e., the order of the shorts within the register is flipped on BE, compared to LE. I think this is correct since when I have code like this:
>>
>>
>> struct Foo {
>>     short f0;
>>     short f1;
>>     short f2;
>> };
>>
>> short func(struct Foo f) {
>>     return f.f2;
>> }
>>
>>
>> The compiler generates: [`extsh 3, 3`](https://www.ibm.com/docs/en/aix/7.3?topic=set-extsh-exts-extend-sign-halfword-instruction) which grabs the lower 16 bits of the register (which contains `hi3, lo3`). For returning `f0` and `f1` the compiler generates 32 and 16 bit shifts respectively before the `extsh 3, 3`. (assuming I'm reading PPC disassembly correctly :)). For LE it's the opposite: https://godbolt.org/z/eMeGKqnhs
>>
>> But, if we flipped the shift amount as well, we'd get `000, 000, hi2, lo2, hi3, lo3, 000, 000` on the first load, and combined we'd get `000, 000, hi2, lo2, hi3, lo3, hi1, lo1`, which doesn't seem right, since the order of the shorts is mixed.
>
> I also realized that we could beef up the test by letting the native code (which we assume to be correctly compiled) pass each individual byte to the callback as well. That way we can also test whether the order of the elements is correct.
I've been trying to read the PPC ABI [1], which has some useful diagrams. I found this interesting:
struct {
    char c;
    char d;
    short s;
    int n;
};
word aligned, sizeof is 8
little endian:
+-------+-------+-------+-------+
|              2|      1|      0|
|       s       |   d   |   c   |
+-------+-------+-------+-------+
|                              4|
|               n               |
+-------+-------+-------+-------+
big endian:
+-------+-------+-------+-------+
|0      |1      |2              |
|   c   |   d   |       s       |
+-------+-------+-------+-------+
|4                              |
|               n               |
+-------+-------+-------+-------+
(I think in these diagrams, on the left you always have MSB and on the right the LSB - as explained in an earlier section. The numbers indicate the byte offsets - I believe as laid out in memory - of the struct fields).
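To make that reading of the diagrams concrete, here is a minimal C sketch (my own illustration, not taken from the ABI document). It assumes a typical ABI with a 2-byte `short` and a 4-byte `int`; the names `struct example` and `word_at` are hypothetical. It interprets the first four bytes of the struct as one 32-bit word under each byte order, so the MSB-on-the-left view can be compared with the memory offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the struct in the ABI diagram (the name is hypothetical). */
struct example {
    char c;
    char d;
    short s;
    int n;
};

/* Offsets as laid out in memory; these hold on common ABIs, which is an assumption. */
static_assert(offsetof(struct example, c) == 0, "c at byte 0");
static_assert(offsetof(struct example, d) == 1, "d at byte 1");
static_assert(offsetof(struct example, s) == 2, "s at byte 2");
static_assert(offsetof(struct example, n) == 4, "n at byte 4");
static_assert(sizeof(struct example) == 8, "sizeof is 8");

/* Interpret the 4 bytes at p as a 32-bit word.
 * big_endian != 0: byte 0 is the most significant byte.
 * big_endian == 0: byte 0 is the least significant byte. */
static uint32_t word_at(const unsigned char *p, int big_endian) {
    uint32_t w = 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        w |= (uint32_t)p[i] << shift;
    }
    return w;
}
```

With memory bytes `0C 0D 51 52` (c, d, then the two bytes of s), `word_at` yields `0x52510D0C` for little endian (s on the left, c on the right) and `0x0C0D5152` for big endian (c on the left, s on the right), which is the field order shown in the two diagrams.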
This seems to confirm your expectations - after a load, struct fields are "flipped" in little-endian, and are not flipped in big-endian. Your implementation takes that into account and avoids a "double flip" (which is what I was proposing as I did not immediately get the difference in the register layout).
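For reference, the chunked loads discussed in this thread can be modelled in portable C. This is only a sketch under stated assumptions: registers are modelled as `uint64_t`, byte order is made explicit by the helpers rather than taken from the host, and all function names (`read_le`, `read_be`, `load6_le`, `load6_be`, `load6_be_flipped`, `demo`) are hypothetical. It is not the code in the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Read n (<= 8) bytes at p as an unsigned value, least significant byte first. */
static uint64_t read_le(const unsigned char *p, int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Read n (<= 8) bytes at p as an unsigned value, most significant byte first. */
static uint64_t read_be(const unsigned char *p, int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* LE: int from bytes 0-4, then short from bytes 4-6 shifted up by 4 bytes. */
static uint64_t load6_le(const unsigned char *p) {
    return read_le(p, 4) | (read_le(p + 4, 2) << 32);
}

/* BE: int from bytes 2-6, then short from bytes 0-2 shifted up by 4 bytes. */
static uint64_t load6_be(const unsigned char *p) {
    return read_be(p + 2, 4) | (read_be(p, 2) << 32);
}

/* BE with the shift amounts flipped as well (the "double flip"):
 * the int is shifted up by 2 bytes and the short is left unshifted. */
static uint64_t load6_be_flipped(const unsigned char *p) {
    return (read_be(p + 2, 4) << 16) | read_be(p, 2);
}

/* Three shorts in memory followed by two junk bytes (0xEE). */
static void demo(void) {
    const unsigned char m[8] = {0x11, 0x12, 0x21, 0x22, 0x31, 0x32, 0xEE, 0xEE};

    /* LE matches an oversized 8-byte load with the junk bytes masked off. */
    assert(load6_le(m) == (read_le(m, 8) & 0xFFFFFFFFFFFFULL));

    /* BE keeps the shorts in memory order, with the third short in the low 16 bits. */
    assert(load6_be(m) == read_be(m, 6));
    assert((load6_be(m) & 0xFFFF) == 0x3132);

    /* The double flip mixes the order: second, third, first. */
    assert(load6_be_flipped(m) == 0x212231321112ULL);
}
```

Calling `demo()` exercises all three variants. The last assertion shows why flipping the shift amounts on BE would be wrong: the first short ends up in the low 16 bits, after the second and third.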
[1] - https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.4.1.txt
-------------
PR: https://git.openjdk.org/panama-foreign/pull/806