[foreign-memaccess+abi] RFR: 8303017: Passing by-value structs whose size is not power of 2 doesn't work on all platforms
Jorn Vernee
jvernee at openjdk.org
Wed Mar 1 21:27:32 UTC 2023
On Wed, 1 Mar 2023 18:14:25 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> Fix passing of by-value structs whose size is not a power of 2.
>>
>> The issue is that currently we try to do loads using a ValueLayout that is the size of the nearest power of two that can fit the struct (or part of it, if it is passed in multiple registers). For instance, for a struct of size 6, we try to load its value using `ValueLayout.OfLong`, which is size 8, and thus produce an out of bounds error. A similar issue applies to writes.
>>
>> For the solution I've implemented in this patch, I've attached an explicit byte width to BufferLoads/Stores, which indicates the size of the value we want to load/store. The type that is produced by the binding is still the same. For example, loading a struct of size 6 is implemented as an `int` load and a `short` load, which are then combined into a `long`, instead of attempting to do a single `long` load (and failing). This allows us to avoid doing an out of bounds access.
>>
>> I've added a new test that tests a bunch of structs with varying byte sizes being passed in registers and on the stack. Using a nested `char[]` to precisely tweak the byte size of each struct.
>
> src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 596:
>
>> 594: int chunkSize = Integer.highestOneBit(remaining); // next power of 2, in bytes
>> 595: long writeOffset = offset() + SharedUtils.pickChunkOffset(chunkOffset, byteWidth(), chunkSize);
>> 596: int shiftAmount = chunkOffset * Byte.SIZE;
>
> Is this correct? E.g. assume we have to write 3 shorts { 13, 14, 15 }, and we do that in two chunks. If we are big endian, I think we first write { 14, 15 } then { 13 }. When we write { 14, 15 }, Isn't the offset of the mask 2 bytes (e.g. 16 bits) ? In other words shouldn't `chunkOffset` here also be decorated with `pinkChunkOffset` to take endianness into account? (it seems to me that the shift amount should be equal to `writeOffset - offset()`).
I think it's correct, but I will try to verify BE as well in the test (somehow).
My understanding is this:
For LE, we have a struct with 3 shorts in memory like so:
lo1, hi1, lo2, hi2, lo3, hi3
0 2 4 6
The native compiler does an oversized load of eight bytes, so we also load the trailing 2 bytes (i.e. byte 6 - 8). And the resulting register looks like this (using `000` to represent junk bytes):
000, 000, hi3, lo3, hi2, lo2, hi1, lo1
8 6 4 2 0
So, we first load an `int` from bytes 0 - 4, and get `000, 000, 000, 000, hi2, lo2, hi1, lo1`, then we load bytes 4 - 6 with shift amount = 4: `000, 000, hi3, lo3, 000, 000, 000, 000` and combine the two to get: `000, 000, hi3, lo3, hi2, lo2, hi1, lo1` (just like we would get from an oversized load).
---
For BE, we have in memory:
hi1, lo1, hi2, lo2, hi3, lo3
0 2 4 6
We load byte 2 - 6 which using an `int` and get: `000, 000, 000, 000, hi2, lo2, hi3, lo3` (no flipping this time), then we load byte 0 - 2 using a `short` and shift amount = 4 and get `000, 000, hi1, lo1, 000, 000, 000, 000`. Then combined: `000, 000, hi1, lo1, hi2, lo2, hi3, lo3`.
i.e., the order of the shorts within the register is flipped on BE, compare to LE. I think this is correct since when I have code like this:
struct Foo {
short f0;
short f1;
short f2;
};
short func(struct Foo f) {
return f.f2;
}
The compiler generates: [`extsh 3, 3`](https://www.ibm.com/docs/en/aix/7.3?topic=set-extsh-exts-extend-sign-halfword-instruction) which grabs to lower 16 bits of the register (which contains `hi3, lo3`). (for returning `f0` and `f1` the compiler generates 32 and 16 bit shifts respectively. Well, assuming I'm reading PPC disassembly correctly :))
But, if we flipped the shift amount as well, we'd get `000, 000, hi2, lo2, hi3, lo3, 000, 000` on the first load, and combined we'd get `000, 000, hi2, lo2, hi3, lo3, hi1, lo1`, which doesn't seem right, since the order of the shorts is mixed.
-------------
PR: https://git.openjdk.org/panama-foreign/pull/806
More information about the panama-dev
mailing list