<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi David,<br>
I agree that having to second-guess signedness is not fun.</p>
<p>However, I'd like to understand the problem better. I do not see any
reference in the SysV ABI to a need to zero/sign-extend arguments. Do
you have an example of an ABI with stricter requirements? The SO
post you link says something about clang zero/sign-extending all
arguments that are smaller than 32 bits, but it's not clear to me
whether that's a standard, or just something that clang does.</p>
<p>There are several ways to address this issue that were discussed
in the past:</p>
<p>* add carriers for unsigned types (e.g. Unsigned&lt;byte&gt;) -
this will likely require Valhalla<br>
* add a sign property to value layouts. This is relatively
harmless, and would also allow Linker::canonicalLayouts to expand
the set of canonical layouts it reports (by including the unsigned
ones)<br>
* deal with this like clang does - e.g. as a Linker option that
can be added to function parameter/return types</p>
<p>Of these, I think my preferred option would be to add the
property to value layouts. This would also turn out useful if, in the
future, we allow the memory part of the FFM API to e.g. take
a JAVA_INT and turn it into a `long` (because we could inspect the
sign, and decide whether to zero- or sign-extend).</p>
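<p>To make the zero- versus sign-extension choice concrete, here is a
minimal sketch in plain Java. The `widen` helper and its `unsigned` flag
are hypothetical stand-ins for a sign property on a value layout;
nothing like this exists in the FFM API today.</p>
<pre>
```java
public class WidenDemo {
    // Hypothetical helper: widen a 32-bit value to 64 bits.
    // The 'unsigned' flag stands in for a sign property read from a layout.
    static long widen(int value, boolean unsigned) {
        return unsigned
                ? Integer.toUnsignedLong(value) // zero-extend: upper 32 bits are 0
                : (long) value;                 // sign-extend: upper 32 bits copy the sign bit
    }

    public static void main(String[] args) {
        int raw = 0xFFFFFFFF; // same bit pattern as C's UINT_MAX, or -1 as a signed int
        System.out.println(widen(raw, false)); // prints -1
        System.out.println(widen(raw, true));  // prints 4294967295
    }
}
```
</pre>
<p>The same 32 bits yield two different `long` values, which is why the
layout would need to carry the sign for the widening to be done
automatically.</p>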
<p>Cheers<br>
Maurizio<br>
</p>
<div class="moz-cite-prefix">On 16/07/2024 23:47, David Lloyd wrote:<br>
</div>
<blockquote type="cite" cite="mid:CANghgrQbZxNgFvH4TB1kzUk6Eb__f7uZgrdg4MPqupVSGxcNjQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">I have a
concern about signedness, calling conventions, and ABI when
making a downcall handle.</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">The possible
value layouts mirror all of the Java types, which include
multiple types which are smaller than the typical minimum
value-passing integer size for common ABIs. My concern is that
there is no way to safely pass an unsigned byte value to a
function which accepts an `unsigned char` or other equivalent
single-byte unsigned value type without either potentially
having garbage sign bits in the upper part of the register, or
else having to know the ABI minimum integer size in order to
zero-extend ourselves.</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">While it is
true in theory that most ABIs in use today are supposed to
ignore garbage bits outside of the range of a given type, in
practice that may not actually happen in all cases, resulting
in ABI incompatibility [1]. LLVM/Clang for example has a
specific attribute to indicate that an argument or return
value should be zero- or sign-extended for this reason,
despite not even having separate signed/unsigned types in its
IR [2].</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">In practice,
most platforms use 32-bit or 64-bit registers for integer
arguments. So, exploiting this knowledge, if I need to pass
unsigned arguments of fewer than 32 bits, I could use
`ValueLayout.JAVA_INT` and zero-extend my arguments in order
to satisfy this requirement. For 16-bit values, I can use
`ValueLayout.JAVA_CHAR`, so really this only applies to 8-bit
values. But for 32-bit unsigned values, this is more
difficult. If I'm on a 32-bit platform like ARM, I can use
`ValueLayout.JAVA_INT` knowing that the registers are already
32-bit and thus there are no potential garbage bits. But on
any 64-bit platform where 32-bit values are passed in
64-bit registers and garbage bits are not allowed, I would
need to know to use `ValueLayout.JAVA_LONG` and zero-extend
just to be sure to avoid garbage bits. Using `JAVA_LONG` on a
32-bit platform, on the other hand, would generally result in
an incorrect call due to arguments being placed in the wrong
registers.</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</div>
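<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">As a rough
illustration of the 8-bit workaround described above (a sketch
only, not something the FFM API prescribes): the C function
`takes_uchar` is hypothetical, and the sketch assumes an ABI
that passes small integer arguments in slots at least 32 bits
wide.</div>
<pre>
```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.ValueLayout;

public class UnsignedByteArg {
    // Hypothetical C prototype: int takes_uchar(unsigned char c);
    // Assumption: the parameter is described as JAVA_INT rather than JAVA_BYTE,
    // so the Java caller decides what ends up in bits 8..31.
    static final FunctionDescriptor DESC =
            FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT);

    // Zero-extend the byte so that bits 8..31 of the argument are zero.
    static int asUnsignedArg(byte b) {
        return Byte.toUnsignedInt(b);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println((int) b);          // prints -16 (sign-extended)
        System.out.println(asUnsignedArg(b)); // prints 240 (zero-extended)
        System.out.println(DESC);
    }
}
```
</pre>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</div>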
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">I propose that
there should be additional `ValueLayout`s for unsigned 8- and
32-bit argument types, so that the zero/sign extension
mechanism (if any is needed) would be hidden from the user to
avoid these problems. The alternative is that the user must
try and guess the correct behavior based on the CPU type and
possibly the operating system as well, to infer the ABI rules
for the current platform. This strikes me as infeasible.</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">It is unclear
to me whether similar care must be taken for structure
members, but I do not believe so. They appear to be
exactly-sized for the purposes of FFM.</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">[1] <a href="https://stackoverflow.com/questions/36706721/is-a-sign-or-zero-extension-required-when-adding-a-32bit-offset-to-a-pointer-for" moz-do-not-send="true" class="moz-txt-link-freetext">https://stackoverflow.com/questions/36706721/is-a-sign-or-zero-extension-required-when-adding-a-32bit-offset-to-a-pointer-for</a></div>
<div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">[2] <a href="https://llvm.org/docs/LangRef.html#parameter-attributes" moz-do-not-send="true" class="moz-txt-link-freetext">https://llvm.org/docs/LangRef.html#parameter-attributes</a></div>
<br>
</div>
<span class="gmail_signature_prefix">-- </span><br>
<div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">- DML • he/him<br>
</div>
</div>
</div>
</blockquote>
</body>
</html>