[foreign-memaccess+abi] RFR: 8303017: Passing by-value structs whose size is not power of 2 doesn't work on all platforms

Jorn Vernee jvernee at openjdk.org
Wed Mar 1 21:34:23 UTC 2023


On Wed, 1 Mar 2023 21:24:40 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 596:
>> 
>>> 594:                     int chunkSize = Integer.highestOneBit(remaining); // next power of 2, in bytes
>>> 595:                     long writeOffset = offset() + SharedUtils.pickChunkOffset(chunkOffset, byteWidth(), chunkSize);
>>> 596:                     int shiftAmount = chunkOffset * Byte.SIZE;
>> 
>> Is this correct? E.g. assume we have to write 3 shorts { 13, 14, 15 }, and we do that in two chunks. If we are big endian, I think we first write { 14, 15 } then { 13 }. When we write { 14, 15 }, Isn't the offset of the mask 2 bytes (e.g. 16 bits) ? In other words shouldn't `chunkOffset` here also be decorated with `pinkChunkOffset` to take endianness into account? (it seems to me that the shift amount should be equal to `writeOffset - offset()`).
>
> I think it's correct, but I will try to verify BE as well in the test (somehow).
> 
> My understanding is this:
> 
> For LE, we have a struct with 3 shorts in memory like so:
> 
>     lo1, hi1, lo2, hi2, lo3, hi3
>     0         2         4         6
> 
> The native compiler does an oversized load of eight bytes, so we also load the trailing 2 bytes (i.e. byte 6 - 8). And the resulting register looks like this (using `000` to represent junk bytes):
> 
>     000, 000, hi3, lo3, hi2, lo2, hi1, lo1
>     8         6         4         2         0
> 
> So, we first load an `int` from bytes 0 - 4, and get `000, 000, 000, 000, hi2, lo2, hi1, lo1`, then we load bytes 4 - 6 with shift amount = 4: `000, 000, hi3, lo3, 000, 000, 000, 000` and combine the two to get: `000, 000, hi3, lo3, hi2, lo2, hi1, lo1` (just like we would get from an oversized load).
> 
> ---
> 
> For BE, we have in memory:
>    
>     hi1, lo1, hi2, lo2, hi3, lo3
>     0         2         4         6
> 
> We load byte 2 - 6 using an `int` and get: `000, 000, 000, 000, hi2, lo2, hi3, lo3` (no flipping this time), then we load byte 0 - 2 using a `short` and shift amount = 4 and get `000, 000, hi1, lo1, 000, 000, 000, 000`. Then combined: `000, 000, hi1, lo1, hi2, lo2, hi3, lo3`.
> 
> i.e., the order of the shorts within the register is flipped on BE, compare to LE. I think this is correct since when I have code like this:
> 
> 
> struct Foo {
>     short f0;
>     short f1;
>     short f2;
> };
> 
> short func(struct Foo f) {
>     return f.f2;
> }
> 
> 
> The compiler generates: [`extsh 3, 3`](https://www.ibm.com/docs/en/aix/7.3?topic=set-extsh-exts-extend-sign-halfword-instruction) which grabs to lower 16 bits of the register (which contains `hi3, lo3`). (for returning `f0` and `f1` the compiler generates 32 and 16 bit shifts respectively before the `extsh 3, 3`. Well, assuming I'm reading PPC disassembly correctly :))
> 
> But, if we flipped the shift amount as well, we'd get  `000, 000, hi2, lo2, hi3, lo3, 000, 000` on the first load, and combined we'd get `000, 000, hi2, lo2, hi3, lo3, hi1, lo1`, which doesn't seem right, since the order of the shorts is mixed.

I also realized that we could beef up the test by letting the native code (which we assume to be correctly compiled) pass each individual byte to the callback as well. That way we can also test whether the order of the elements is correct.

-------------

PR: https://git.openjdk.org/panama-foreign/pull/806


More information about the panama-dev mailing list