Concerns about signedness and ABI

David Lloyd david.lloyd at redhat.com
Tue Jul 16 22:47:18 UTC 2024


I have a concern about signedness, calling conventions, and ABI when making
a downcall handle.

The available value layouts mirror the Java primitive types, several of
which are smaller than the minimum integer size that common ABIs use for
passing values. My concern is that there is no safe way to pass an unsigned
byte value to a function that accepts an `unsigned char` (or an equivalent
single-byte unsigned type) without either risking garbage sign bits in the
upper part of the register, or having to know the ABI's minimum integer
size in order to zero-extend the value ourselves.

While it is true in theory that most ABIs in use today are supposed to
ignore garbage bits outside of the range of a given type, in practice that
may not actually happen in all cases, resulting in ABI incompatibility [1].
LLVM/Clang, for example, has specific parameter attributes (`zeroext` and
`signext`) to indicate that an argument or return value should be zero- or
sign-extended for this reason, despite not even having separate
signed/unsigned types in its IR [2].

In practice, most platforms use 32-bit or 64-bit registers for integer
arguments. So, exploiting this knowledge, if I need to pass unsigned
arguments of fewer than 32 bits, I could use `ValueLayout.JAVA_INT` and
zero-extend my arguments in order to satisfy this requirement. For 16-bit
values, I can use `ValueLayout.JAVA_CHAR`, so really this only applies to
8-bit values. But for 32-bit unsigned values, this is more difficult. If I'm
on a 32-bit platform like ARM, I can use `ValueLayout.JAVA_INT` knowing
that the registers are already 32-bit and thus there are no potential
garbage bits. But on any 64-bit platform where 32-bit values are passed in
64-bit registers and garbage bits are not allowed, I would need to know to
use `ValueLayout.JAVA_LONG` and zero-extend just to be sure to avoid
garbage bits. Using `JAVA_LONG` on a 32-bit platform, on the other hand,
would generally result in an incorrect call because the arguments would be
placed in the wrong registers.
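
To make the manual workaround concrete, here is a minimal sketch of the
zero-extension the caller would have to do before invoking the downcall
handle. The class and method names are hypothetical, and the choice of
carrier (`JAVA_INT` for 8-bit values, `JAVA_LONG` for 32-bit values) is
exactly the platform-dependent assumption described above, not something
FFM decides for you.

```java
final class UnsignedArgs {
    private UnsignedArgs() {}

    // For a C `unsigned char` parameter described as ValueLayout.JAVA_INT:
    // clears bits 8..31 so the callee sees a zero-extended value.
    static int zeroExtend8(byte value) {
        return Byte.toUnsignedInt(value); // equivalent to value & 0xFF
    }

    // For a C `unsigned int` parameter described as ValueLayout.JAVA_LONG.
    // Assumption: only valid where 32-bit arguments travel in 64-bit registers.
    static long zeroExtend32(int value) {
        return Integer.toUnsignedLong(value); // equivalent to value & 0xFFFF_FFFFL
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF0;
        System.out.println(Integer.toHexString(raw));              // fffffff0 (sign-extended)
        System.out.println(Integer.toHexString(zeroExtend8(raw))); // f0
        System.out.println(Long.toHexString(zeroExtend32(-1)));    // ffffffff
    }
}
```

The arithmetic itself is trivial; the hard part is knowing which of the two
helpers (if any) is the right one for the current ABI.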

I propose that there should be additional `ValueLayout`s for unsigned 8-
and 32-bit argument types, so that the zero/sign extension mechanism (if
any is needed) would be hidden from the user to avoid these problems. The
alternative is that the user must try to guess the correct behavior based
on the CPU type and possibly the operating system as well, to infer the ABI
rules for the current platform. This strikes me as infeasible.
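
For illustration, this is roughly what such guessing looks like in user
code. It is a hypothetical sketch, not a recommendation: the class, the
method, and the table of `os.arch` values are my own assumptions, the table
is incomplete, and it ignores the operating system entirely.

```java
final class AbiGuess {
    private AbiGuess() {}

    // Guesses the width in bits of integer argument registers from an
    // `os.arch` string. Returns -1 when the architecture is not in the table.
    // Hypothetical and incomplete: it knows nothing about OS-specific ABIs.
    static int guessArgRegisterBits(String osArch) {
        switch (osArch) {
            case "amd64":
            case "x86_64":
            case "aarch64":
            case "riscv64":
            case "ppc64le":
            case "s390x":
                return 64;
            case "x86":
            case "i386":
            case "arm":
                return 32;
            default:
                return -1; // unknown: the caller has no safe choice here
        }
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println(arch + " -> " + guessArgRegisterBits(arch));
    }
}
```

Every binding that needs an unsigned argument would have to carry a table
like this and keep it correct for each new platform.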

It is unclear to me whether similar care must be taken for structure
members, but I do not believe so. They appear to be exactly-sized for the
purposes of FFM.
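
A small sketch of why I believe structure members are unaffected, assuming
the finalized FFM API in Java 22 or later. The struct, its field names, and
the helper method are hypothetical. The member occupies exactly one byte in
memory, so no register extension is involved and signedness is only a
matter of how the Java side interprets the byte.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;

final class StructMemberDemo {
    private StructMemberDemo() {}

    // Hypothetical C struct: struct sample { unsigned char tag; uint32_t count; };
    static final StructLayout SAMPLE = MemoryLayout.structLayout(
            ValueLayout.JAVA_BYTE.withName("tag"),
            MemoryLayout.paddingLayout(3), // explicit padding before the 4-byte member
            ValueLayout.JAVA_INT.withName("count"));

    // Stores the low 8 bits of unsignedTag in the `tag` member and reads them
    // back as an unsigned value.
    static int roundTripTag(int unsignedTag) {
        long tagOffset = SAMPLE.byteOffset(PathElement.groupElement("tag"));
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(SAMPLE);
            segment.set(ValueLayout.JAVA_BYTE, tagOffset, (byte) unsignedTag);
            return Byte.toUnsignedInt(segment.get(ValueLayout.JAVA_BYTE, tagOffset));
        }
    }

    public static void main(String[] args) {
        System.out.println(SAMPLE.byteSize());  // 8
        System.out.println(roundTripTag(0xF0)); // 240
    }
}
```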

[1]
https://stackoverflow.com/questions/36706721/is-a-sign-or-zero-extension-required-when-adding-a-32bit-offset-to-a-pointer-for
[2] https://llvm.org/docs/LangRef.html#parameter-attributes

-- 
- DML • he/him


More information about the panama-dev mailing list