Concerns about signedness and ABI
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Jul 17 09:41:35 UTC 2024
Hi David,
I agree that having to second-guess signedness is not fun.
However, I'd like to understand the problem more. I do not see in the SysV
ABI any reference to the need to zero/sign-extend arguments. Do you have an
example of an ABI with stricter requirements? The SO post you show says
something about clang zero/sign extending all arguments that are smaller
than 32 bits, but it's not clear to me whether that's a standard, or
just something that clang does.
There are several ways to address this issue that were discussed in the
past:
* add carriers for unsigned types (e.g. Unsigned<byte>) - this will
likely require Valhalla
* add a sign property to value layouts. This is relatively harmless, and
will also allow Linker::canonicalLayouts to expand the set of canonical
layouts it reports (by including the unsigned ones)
* deal with this like clang does - e.g. as a Linker option that can be
added to function parameter/return types
Of these, I think my preferred option would be to add the property to
value layouts. This would turn out to be useful if, in the future, we
allow the memory part of the FFM API to e.g. take a JAVA_INT and turn it
into a `long` (because we could inspect the sign, and decide whether to
zero or sign-extend).
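To make the zero/sign-extension distinction concrete, here is a minimal
sketch in plain Java. It does not use the FFM API at all; the class and
method names are hypothetical and only illustrate what a sign property on a
layout would let an implementation choose between when widening a value.

```java
// Hypothetical helper, for illustration only; not part of the FFM API.
final class ExtensionSketch {
    private ExtensionSketch() {}

    // Sign extension: an ordinary int -> long widening conversion
    // copies the sign bit into the upper 32 bits.
    static long signExtendToLong(int value) {
        return value;
    }

    // Zero extension: the 32 bits are read as unsigned, so the upper
    // 32 bits of the result are always zero.
    static long zeroExtendToLong(int value) {
        return Integer.toUnsignedLong(value);
    }

    // Zero extension of a byte, as one would do by hand today before
    // passing an unsigned 8-bit value through a wider carrier.
    static int zeroExtendToInt(byte value) {
        return Byte.toUnsignedInt(value);
    }
}
```

For the bit pattern 0xFFFFFFFF, the first method yields -1L and the second
yields 4294967295L; which of the two is correct depends on whether the
native type is signed, which is information a layout does not carry today.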
Cheers
Maurizio
On 16/07/2024 23:47, David Lloyd wrote:
> I have a concern about signedness, calling conventions, and ABI when
> making a downcall handle.
>
> The possible value layouts mirror all of the Java types, which include
> multiple types which are smaller than the typical minimum
> value-passing integer size for common ABIs. My concern is that there
> is no way to safely pass an unsigned byte value to a function which
> accepts an `unsigned char` or other equivalent single-byte unsigned
> value type without either potentially having garbage sign bits in the
> upper part of the register, or else having to know the ABI minimum
> integer size in order to zero-extend ourselves.
>
> While it is true in theory that most ABIs in use today are supposed to
> ignore garbage bits outside of the range of a given type, in practice
> that may not actually happen in all cases, resulting in ABI
> incompatibility [1]. LLVM/Clang for example has a specific attribute
> to indicate that an argument or return value should be zero- or
> sign-extended for this reason, despite not even having separate
> signed/unsigned types in its IR [2].
>
> In practice, most platforms use 32-bit or 64-bit registers for integer
> arguments. So, exploiting this knowledge, if I need to pass unsigned
> arguments of fewer than 32 bits, I could use `ValueLayout.JAVA_INT`
> and zero-extend my arguments in order to satisfy this requirement. For
> 16-bit values, I can use `ValueLayout.JAVA_CHAR`, so really this only
> applies to 8-bit values. But for 32-bit unsigned values, this is more
> difficult. If I'm on a 32-bit platform like ARM, I can use
> `ValueLayout.JAVA_INT` knowing that the registers are already 32-bit
> and thus there are no potential garbage bits. But on any 64-bit
> platform where 32-bit values are passed in 64-bit registers and
> garbage bits are not allowed, I would need to know to use
> `ValueLayout.JAVA_LONG` and zero-extend just to be sure to avoid
> garbage bits. Using `JAVA_LONG` on a 32-bit platform, on the other
> hand, would generally result in an incorrect call due to arguments
> being pushed to wrong registers.
>
> I propose that there should be additional `ValueLayout`s for unsigned
> 8- and 32-bit argument types, so that the zero/sign extension mechanism
> (if any is needed) would be hidden from the user to avoid these
> problems. The alternative is that the user must try and guess the
> correct behavior based on the CPU type and possibly the operating
> system as well, to infer the ABI rules for the current platform. This
> strikes me as infeasible.
>
> It is unclear to me whether similar care must be taken for structure
> members, but I do not believe so. They appear to be exactly-sized for
> the purposes of FFM.
>
> [1]
> https://stackoverflow.com/questions/36706721/is-a-sign-or-zero-extension-required-when-adding-a-32bit-offset-to-a-pointer-for
> [2] https://llvm.org/docs/LangRef.html#parameter-attributes
>
> --
> - DML • he/him