notes on varargs

Thu May 3 01:34:32 UTC 2018

Varargs is a tricky feature of C, and if you look under the hood at the ABI
and machine code you'll find lots of strange moves.  We have to emulate
those moves adequately, both in callouts (JVM sending args through '…'
prototypes) and callbacks (JVM receiving arguments via a va_list).

The first thing that happens with a varargs call is that the C compiler treats
each argument passed under ellipsis '…' as if it were an argument to
a function without a prototype.  This means integral values smaller than
int are promoted to int (or unsigned int), and floats are promoted to
doubles.  Pointers may be viewed as being converted to void* or intptr_t.
Array and function types decay to pointers, as usual.  The C standard
calls these the "default argument promotions"; they are also sometimes
called the "default argument conversions".
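Here is a small sketch of what the promotions mean for a callee.  The function sum_triples is hypothetical; it expects groups of (char, short, float) arguments, but because of the promotions it must read them back as (int, int, double).

```c
#include <stdarg.h>

/* Hypothetical callee: reads `triples` groups of (char, short, float)
   arguments.  After the default argument promotions they arrive as
   (int, int, double), so va_arg must name the promoted types; writing
   va_arg(ap, char) or va_arg(ap, float) would be undefined behavior. */
double sum_triples(int triples, ...) {
  va_list ap;
  va_start(ap, triples);
  double total = 0;
  for (int i = 0; i < triples; i++) {
    total += va_arg(ap, int);     /* the char, promoted to int */
    total += va_arg(ap, int);     /* the short, promoted to int */
    total += va_arg(ap, double);  /* the float, promoted to double */
  }
  va_end(ap);
  return total;
}
```

A caller can pass char, short, and float variables directly, for example sum_triples(1, c, s, f); the compiler applies the promotions at the call site.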

The machine code, as dictated by the ABI (if it is a typical one), further
widens int values to intptr_t, which is to say machine words.  Since pointers
are already machine words, at this point everything is either an intptr_t,
a long long (if that is larger than an intptr_t, true only on 32-bit systems),
or a double, unless there was something odd like a struct or vector
in the argument list.

This second widening by machine code does not necessarily preserve
values.  Some ABIs specify zero and/or sign extension for small values
packed in intptr_t.  But others allow garbage padding, which leaves the
choice to the compiler to sign extend, zero extend, or do something else.
Still, in typical ABIs, if you were to cast the transmitted intptr_t value back
to the original type, you'd get the original value back.
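A minimal simulation of the garbage-padding case, assuming the sub-word value sits in the arithmetically low bits of a 64-bit word.  The helper names are made up for illustration.  An unsigned type is used because narrowing to an unsigned type is defined as reduction modulo 2^N; narrowing to a signed type is implementation-defined in C11/C17, although mainstream compilers simply truncate.

```c
#include <stdint.h>

/* Hypothetical caller side: puts an 8-bit value in the low byte of a
   word whose upper bits are arbitrary garbage, as some ABIs permit. */
uint64_t pack_with_garbage(uint8_t value, uint64_t garbage) {
  return (garbage & ~(uint64_t)0xFF) | value;
}

/* Hypothetical callee side: casting back to the original type keeps only
   the low 8 bits, so the garbage (or any sign/zero extension) is discarded. */
uint8_t unpack_u8(uint64_t word) {
  return (uint8_t)word;
}
```

Whether the upper bits were zero-extended, sign-extended, or left as garbage, the cast recovers the same value.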

(Oddball ABIs might specify that sub-word values are packed in the
arithmetically high bits, in which case the cast would have to be preceded
by a shift; hopefully we don't have to deal with that any time soon.)

All this means that, on 64-bit systems, a varargs list can be packed into
a sequence of intptr_t values.  In fact, that is usually much of what the
ABI requires, although ABIs usually add more details (read on!).

Therefore, for Java to model an outgoing varargs list, a long[] array
is usually adequate.  Equivalently, a signature-polymorphic method
which takes a variable sequence of long arguments is also usually
adequate, for arities supported by the JVM (a method descriptor is limited
to 255 parameter slots, and each long takes two, so about 127 longs).
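A sketch of that "word image", written in C for concreteness, assuming a 64-bit platform with IEEE doubles.  The word_image type and push_* functions are hypothetical; int64_t words[] plays the role of the Java long[].

```c
#include <stdint.h>
#include <string.h>

#define MAX_WORDS 16  /* arbitrary capacity for this sketch */

/* One 64-bit word per argument, as in a Java long[]. */
typedef struct { int64_t words[MAX_WORDS]; int count; } word_image;

static int push_word(word_image* img, int64_t w) {
  if (img->count >= MAX_WORDS) return -1;  /* full */
  img->words[img->count++] = w;
  return 0;
}

int push_int(word_image* img, int v) {
  return push_word(img, v);  /* sign-extended to a full word */
}

int push_ptr(word_image* img, const void* p) {
  return push_word(img, (int64_t)(intptr_t)p);
}

int push_double(word_image* img, double d) {
  int64_t bits;
  memcpy(&bits, &d, sizeof bits);  /* store the IEEE bits, not a converted value */
  return push_word(img, bits);
}
```

On an ABI where every varargs value lands in the same word sequence, this array is all the binder needs to transmit.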

(Some atypical ABIs use different formats for function pointers and data
pointers, so there's another portability hazard: function pointers might
not be converted to intptr_t, but rather to something like a pair of such
values, perhaps representing a library control block and an entry point
therein.  But few or none of the OpenJDK platforms do this, I think.)

Many ABIs transmit floating point values in argument registers distinct
from those used for other types (pointers and integers).  If there is a C
function prototype, there is no ambiguity as to which kind of register each
argument goes in, but in the case of varargs (and unprototyped functions)
an ABI may elect to place an outgoing double value either in a
general-purpose register or in a floating point register.  In the
latter case, Java may need a short double[] array to model outgoing
floating point register values, which is organized separately from the
previously mentioned long[] array.
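A sketch of the split model, with hypothetical names and capacities.  It assumes, as many ABIs do, that once the floating point registers are used up, further doubles join the word sequence (that is, go to the stack).

```c
#include <stdint.h>
#include <string.h>

#define MAX_GP 16  /* words: GP registers plus stack slots (hypothetical) */
#define MAX_FP 8   /* floating point argument registers (hypothetical) */

typedef struct {
  int64_t gp[MAX_GP]; int ngp;   /* integers, pointers, and spilled doubles */
  double  fp[MAX_FP]; int nfp;   /* doubles bound for FP registers */
  int doubles_in_fp_regs;        /* the ABI's choice for varargs doubles */
} split_args;

int add_word(split_args* a, int64_t w) {
  if (a->ngp >= MAX_GP) return -1;
  a->gp[a->ngp++] = w;
  return 0;
}

int add_double(split_args* a, double d) {
  if (a->doubles_in_fp_regs && a->nfp < MAX_FP) {
    a->fp[a->nfp++] = d;
    return 0;
  }
  int64_t bits;                    /* otherwise the double travels as a word */
  memcpy(&bits, &d, sizeof bits);
  return add_word(a, bits);
}
```

Here fp[] plays the role of the short double[] array, and each array keeps its own count.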

(The same thing may happen if the platform has other kinds of registers
besides general-purpose and floating-point.  It might happen with
vector registers, for example, if the ABI requires vectors to be passed
in vector registers.  It will almost certainly require this if the vector
matches an argument in a function prototype; for varargs the story
can vary from ABI to ABI.  As a typical example, some ABIs mix
structs and vectors into the sequence of intptr_t values by converting
them into sub-sequences of one or more values, derived as if by
loading the composite value from memory as successive machine words.
Where an ABI does use vector registers, Java can use a trick similar to
the double[] array: it can model the outgoing values stored in vector
registers using a long[] array or (once we get value types) an array of
a suitably sized value type.  In the case of a long[] array, the length
is the number of vector registers times the size of a single vector in
64-bit units.  We don't need to cover these cases now, but we may want
to cover them later.)
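A sketch of the "as if by loading from memory as successive machine words" decomposition, for ABIs that mix composites into the intptr_t sequence.  The vec4f type stands in for any 16-byte struct or vector, and composite_to_words is a hypothetical helper.  The byte order inside each word follows the machine's endianness.

```c
#include <stdint.h>
#include <string.h>

/* A 16-byte composite standing in for a struct or a 128-bit vector. */
typedef struct { float x, y, z, w; } vec4f;

/* Copies the composite's bytes into successive 64-bit words and returns
   how many words it occupies; a partial last word is zero-padded. */
size_t composite_to_words(const void* obj, size_t size, uint64_t* out) {
  size_t n = (size + 7) / 8;
  memset(out, 0, n * sizeof *out);
  memcpy(out, obj, size);
  return n;
}
```

A 16-byte vec4f becomes two words, and copying those words back into a vec4f restores the original fields.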

Unlike the ABIs, the C standards may leave argument passing behavior
undefined for types other than floats, ints, and pointers (and pointer
decay products).  This affects C structs and platform-specific vector
types.  For those types the compiler writer may consult the ABIs for
direction, and ask what (if anything) the ABI prescribes for passing
those types when a function prototype is missing.

In the end, the C compiler will usually treat varargs-controlled arguments
as if they were passed to an unprototyped function, and the ABI will
usually assign the first few to registers and the rest to a contiguous
area at or near the top of the stack.  For varargs, we can call this
contiguous area the "argument dump area", since it amounts to
a spot where the va_arg macro can start picking up the arguments.

Thus, when the C compiler creates a call to a variable arity function
(or an unprototyped one) it marshals a bunch of intptr_t values into
a few registers and the top of the stack.  Then it performs a function
call, transferring control to the entry point of the function.
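A sketch of the caller's marshalling step for a simple hypothetical ABI: the first NUM_ARG_REGS words go to registers, the rest go in order to the contiguous dump area on the stack.  Six matches the integer argument registers of x86-64 System V, but the names and sizes here are assumptions, and real ABIs add alignment and type rules on top.

```c
#include <stdint.h>

#define NUM_ARG_REGS 6   /* hypothetical register count */
#define STACK_WORDS  32  /* arbitrary capacity for this sketch */

typedef struct {
  int64_t regs[NUM_ARG_REGS]; int nregs;
  int64_t stack[STACK_WORDS]; int nstack;  /* the argument dump area */
} call_frame;

int assign_locations(const int64_t* words, int n, call_frame* f) {
  f->nregs = f->nstack = 0;
  for (int i = 0; i < n; i++) {
    if (f->nregs < NUM_ARG_REGS)
      f->regs[f->nregs++] = words[i];
    else if (f->nstack < STACK_WORDS)
      f->stack[f->nstack++] = words[i];
    else
      return -1;  /* too many arguments for this sketch */
  }
  return 0;
}
```

With eight argument words, the first six land in registers and the last two in the stack area, in argument order.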

To access the varargs values, the callee first executes a va_start
macro.  This is a C compiler intrinsic which sets up one or more
pointers (in a va_list struct).  The main pointer is to the start of
the argument dump area, as defined by the ABI.  (Often that is
near the place on the stack where the return address is stored.)
The va_start macro also marshals any stray values
in argument registers into memory, so that the va_arg macro
can "see" them.  If there are several kinds of registers, there
may be extra register dump areas to hold the register values
for easy access.  It is up to the callee to finish the job of the
caller in setting up all dump areas as needed by the va_arg
macro.

(Some C compilers, especially early ones, use push instructions
to push the arguments in right-to-left order, and then the call
instruction pushes the return address.  This leads to the state
of affairs just described as typical.  On RISC machines, arguments
are stored directly to memory by dead reckoning.  The result
is the same.)

The placement of the argument dump is where ABIs often get clever.
A typical ABI may require extra uninitialized space in the main dump
area, so that the callee can spill register values into the uninitialized
space.  After doing this, the callee can conveniently find all the
arguments in one contiguous area of memory.  In the simplest
case, all of the arguments might end up stored in memory in
a single intptr_t array.  Some ABIs ensure that this is true for
all types of arguments.  The reward of such regularity is the
simplicity of the va_arg macro and the va_list type:

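   #include <stdint.h>
   typedef intptr_t* va_list;
   // fetch an argument of type T, then advance past the words it occupies
   #define va_arg(ap, T) (*(T*)_post_inc(ap, _va_sizeof(T)))
   // advance v by n, yielding the old value
   #define _post_inc(v, n) (((v) += (n)) - (n))
   // number of intptr_t words an argument of type T occupies
   #define _va_sizeof(T) ( sizeof(union { intptr_t w; T t; }) / sizeof(intptr_t) )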

The effect of this macrology is that va_arg(ap, T) turns into *(T*)ap++
when T is no larger than a single machine word (which is typical).
In this case, scanning over an argument list is just a thinly veiled walk
down an array of intptr_t values, stored in argument order.
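A runnable check of this macrology, with the names changed (word_list, WORD_ARG, WORDS_OF, POST_INC are made up) so it cannot clash with the real <stdarg.h>.  The hand-built array stands in for an argument dump area.  Reading a type smaller than a word through T* this way also assumes a little-endian layout, or else the shift mentioned earlier for high-bit packing.

```c
#include <stdint.h>

typedef intptr_t* word_list;
#define WORDS_OF(T)     ( sizeof(union { intptr_t w; T t; }) / sizeof(intptr_t) )
#define POST_INC(v, n)  (((v) += (n)) - (n))
#define WORD_ARG(ap, T) (*(T*)POST_INC(ap, WORDS_OF(T)))

/* Walks n intptr_t values in a simulated argument dump area. */
intptr_t sum_words(intptr_t* dump, int n) {
  word_list ap = dump;  /* plays the role of va_start */
  intptr_t total = 0;
  for (int i = 0; i < n; i++)
    total += WORD_ARG(ap, intptr_t);
  return total;
}
```

WORDS_OF reports one word for small types and rounds larger types up to whole words.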

In more complicated cases, where the ABI mandates extra dump
areas, the va_arg macro may expand to control flow, which pops
the argument out of the appropriate place.  Here's pseudo-code
for a hypothetical ABI that uses a separate dump area for a
limited number of double arguments passed in floating point registers:

    T va_arg(ap, T) {
      if (T == double && ap.fptr - ap.fbase < MAX_FP_ARG_REGS)
        return *ap.fptr++;  // return value from float dump area
      T* p = (T*) ap.sptr;
      ap.sptr += _va_sizeof(T);
      return *p;
    }
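The pseudo-code can be turned into a runnable simulation by splitting it into one accessor per kind.  Everything here is hypothetical: sim_va_list, the accessor names, and a deliberately small MAX_FP_ARG_REGS so the overflow path is easy to exercise.  It assumes 64-bit words, so a double spilled to the main dump area fits in one slot.

```c
#include <stdint.h>
#include <string.h>

#define MAX_FP_ARG_REGS 2  /* deliberately small for illustration */
_Static_assert(sizeof(intptr_t) >= sizeof(double), "sketch assumes 64-bit words");

typedef struct {
  double*   fbase;  /* start of the float register dump area */
  double*   fptr;   /* next unread float register value */
  intptr_t* sptr;   /* next unread word in the main dump area */
} sim_va_list;

double sim_va_arg_double(sim_va_list* ap) {
  if (ap->fptr - ap->fbase < MAX_FP_ARG_REGS)
    return *ap->fptr++;             /* from the float dump area */
  double d;
  memcpy(&d, ap->sptr++, sizeof d); /* FP registers exhausted: main dump area */
  return d;
}

intptr_t sim_va_arg_word(sim_va_list* ap) {
  return *ap->sptr++;               /* everything else comes from the main area */
}
```

Doubles and words are consumed independently, so interleaving the two kinds of calls walks both areas in argument order.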

For Java to perform a call to a varargs C function, it should
marshal the machine words (as longs) for the various arguments.
If there are extra argument register types, it should marshal those
also, and keep track of how many there are of each kind.  The
assembly code created by the binder needs to ensure that
each value is transmitted in the location (register or memory)
which is mandated by the ABI.  This procedure is about the
same as for executing a regular non-varargs call from Java.

One difference is that each different combination of arguments
needs different code.  If the JIT cannot choose this code statically,
an interpreter will need to dynamically marshal the arguments
into some staging area, and then run assembly code that will
copy the contents of the staging area into argument registers
and/or onto the top of the stack as needed.

Another smaller difference between varargs and non-varargs
calls is (as noted above) the fact that the ABI requires 32-bit
values to be widened to the machine word size.  This may fall
out by accident as Java models native values as longs.

In general, each Java carrier value must be examined to see if it can
be transmitted using one of the standard types of default argument
conversions.  If it can be converted to a pointer, the conversion
should be done.  (This covers the String type, if that is convertible
to char*.)  If it is a floating point value, it should be converted to
double and stored bitwise into the appropriate array element (either
long or double) which models the argument.  If it is an integral type
(non-floating primitive) it needs to be packed into a long.  If it is
a struct or vector type, it may need to be packed into several longs
(or several doubles, if the ABI demands packing into floating-point
registers).
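A sketch of the per-carrier examination for the scalar cases, in C.  The carrier tags and carrier_value type are hypothetical stand-ins for Java values; structs and vectors, which may need several words, are left out.

```c
#include <stdint.h>
#include <string.h>

/* J_INT also covers byte, short, char, and boolean, already widened by Java. */
typedef enum { J_INT, J_LONG, J_FLOAT, J_DOUBLE, J_ADDRESS } java_carrier;

/* The C-level type after the default argument conversions. */
typedef enum { C_INT, C_LONG_LONG, C_DOUBLE, C_POINTER } c_kind;

typedef struct {
  java_carrier tag;
  union { int32_t i; int64_t j; float f; double d; void* p; } u;
} carrier_value;

/* Returns the 64-bit word that models the argument; doubles come back as
   raw IEEE bits, to be stored in whichever array the ABI dictates. */
int64_t marshal(carrier_value v, c_kind* kind) {
  double d;
  switch (v.tag) {
  case J_INT:     *kind = C_INT;       return v.u.i;  /* widened to a word */
  case J_LONG:    *kind = C_LONG_LONG; return v.u.j;
  case J_ADDRESS: *kind = C_POINTER;   return (int64_t)(intptr_t)v.u.p;
  case J_FLOAT:   d = v.u.f; break;    /* float promotes to double */
  case J_DOUBLE:  d = v.u.d; break;
  default:        *kind = C_INT;       return 0;      /* unknown tag */
  }
  *kind = C_DOUBLE;
  int64_t bits;
  memcpy(&bits, &d, sizeof bits);
  return bits;
}
```

The returned kind tells the caller whether the word belongs in the long[] model or, on an ABI with separate float registers, in the double[] model.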

Java values carry their own type information as needed.  Passing a mix
of pointers and integers and floats through a varargs Java API can be
modeled as a builder API which appends values of various C-level types
to a growing marshalling area.  It can also be modeled as a scan over
an array of Object references, asking each value what its C-level type
should be, when viewed as an input for the default argument
conversions.  The latter approach can (and should) be built on top of
the former.

When Java originates a varargs call, it is a good idea to quietly add
one or two extra values to the argument list even though the callee
function won't (or shouldn't) see them.  The performance cost of this
is small.  Adding a NULL simplifies interoperation with certain
varargs APIs that use a NULL value as a sentinel for the end of the
argument array.

For debugging support, storing a second extra argument with an unusual
value (such as 0xCAFEBABE?) will help diagnose low-level bugs.  The
NULL will also help of course.  Usually argument dump locations are
tightly constrained by the ABI, but it is always safe to add an extra
argument at the end.
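Here is a typical NULL-terminated API, in the style of execl(), to show why the trailing NULL helps.  The function count_strings is hypothetical.  Note that callers must pass (char*)NULL, not a bare NULL, because NULL may be defined as a plain 0, which is not converted to a pointer when passed under the ellipsis.

```c
#include <stdarg.h>
#include <stddef.h>

/* Counts string arguments up to a null pointer sentinel.  Without the
   sentinel, this loop would read past the end of the arguments. */
int count_strings(const char* first, ...) {
  int n = 0;
  va_list ap;
  va_start(ap, first);
  for (const char* s = first; s != NULL; s = va_arg(ap, char*))
    n++;
  va_end(ap);
  return n;
}
```

Any extra value after the NULL, such as a debugging marker, is simply never read.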

For callbacks from C to Java, varargs is easier to handle, although it
is not trivial.  The native code which sets up the callback needs to
perform the same actions as the va_start macro, and then pass the
resulting va_list up to Java.  Since the size of va_list is
ABI-dependent, and some ABIs may object to taking a bitwise copy of
it, it is much safer to pass a pointer to it to Java.
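A sketch of that arrangement, with hypothetical names.  The stub performs va_start and passes &ap upward; the C standard explicitly permits passing a pointer to a va_list to another function, after which the original function may keep using the list.

```c
#include <stdarg.h>

/* Stand-in for the Java upcall: receives a pointer to the caller's
   va_list, never a bitwise copy, and consumes `count` int arguments. */
static int java_side_sum(int count, va_list* ap) {
  int total = 0;
  for (int i = 0; i < count; i++)
    total += va_arg(*ap, int);
  return total;
}

/* Native entry point with a varargs prototype: it runs va_start, hands
   &ap to the "Java" side, and runs va_end afterward. */
int native_callback_stub(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int result = java_side_sum(count, &ap);
  va_end(ap);
  return result;
}
```

Passing a pointer also sidesteps ABIs where va_list is an array type, which cannot be copied by plain assignment.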

To use this pointer, Java needs standard native functions which can
extract the various typed arguments, given a pointer to a va_list and
the desired type.  For full portability there needs to be one such
function for each fundamentally different type.  But the binder can
consult the ABI and use more generic accesses; in practical cases the
va_arg macro can be wrapped in a function like this:

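    #include <stdarg.h>
    #include <stdint.h>
    #include <string.h>

    enum { K_DOUBLE, K_PTR, K_INT32, K_INT64, K_VEC128 };
    typedef struct { uint64_t lo, hi; } vec128_t;  // stand-in for an ABI vector type

    int64_t va_arg_function(va_list* ap, int kind, void* buffer) {
      switch (kind) {
      case K_DOUBLE: {  // might be special fp reg dump
        double d = va_arg(*ap, double);
        int64_t bits;
        memcpy(&bits, &d, sizeof bits);  // raw bits, like Double.doubleToRawLongBits
        return bits;
      }
      case K_PTR:    // on simple ABIs, could fall through to K_INT64
        return (int64_t) (intptr_t) va_arg(*ap, void*);
      case K_INT32:  // likewise; the promoted int is sign-extended here
        return va_arg(*ap, int32_t);
      case K_INT64:
        return (int64_t) va_arg(*ap, intptr_t);
      case K_VEC128:
        *(vec128_t*)buffer = va_arg(*ap, vec128_t);
        return (int64_t) (intptr_t) buffer;
        // and so on for each vector type in the ABI
      default:
        return 0;  // unknown kind
      }
    }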

Rather than having a va_arg function for every possible type (which is
probably impossible), this one function takes an enumeration token
that distinguishes among the basic types that the ABI recognizes.  If
the ABI is simple enough, all cases can be handled by just one kind of
access, K_INT64 (for untyped intptr_t).

Note that this approach allows each set of arguments to be walked only
once, since the original copy of the va_list is mutated by every call
to va_arg_function.  This is easy to work around given a second
function which wraps the va_copy macro, to make a snapshot of the
original va_list.
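One caution: the C standard requires each va_copy to be matched by a va_end in the same function, so a wrapper that returns a live copy relies on ABI behavior.  A conforming variant does its walking inside the wrapper.  The hypothetical peek_int below reads the index-th remaining int argument from a snapshot and leaves the original list untouched.

```c
#include <stdarg.h>

/* Reads the index-th remaining int argument without consuming anything:
   walks a va_copy snapshot, with va_copy and va_end paired in this function. */
int peek_int(va_list* ap, int index) {
  va_list snapshot;
  va_copy(snapshot, *ap);
  int value = 0;
  for (int i = 0; i <= index; i++)
    value = va_arg(snapshot, int);
  va_end(snapshot);
  return value;
}

/* Example client: peeks at the last argument, then reads the first one
   from the original list, which has not moved. */
int last_minus_first(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int last = peek_int(&ap, count - 1);
  int first = va_arg(ap, int);
  va_end(ap);
  return last - first;
}
```

Random access this way costs a rescan per peek, which is usually acceptable for short argument lists.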

The above account shows how ABIs inform the passing of arguments,
including varargs arguments, but it doesn't define the specific rules for
any particular ABI.  A write-up for any particular ABI we support, specifying
the rules exactly, would probably take as much space as this note which
purports to summarize all of them.

There are a number of ways to get this information.  The most direct
is to compile test cases on your favorite platform and observe their
machine code and/or behavior.  Yes, that's more direct than the official
method, which is tracking down an ABI document and reading it to
find the necessary rules (both steps are usually difficult).  Reading
the code of a compiler backend is sometimes useful.  Even better,
reading the code of the excellent libffi library, which exists only to
perform calling sequences on many ABIs, would probably teach
us all we need to know quickly, for any given ABI.

Here's a final idea.  It would be good to create a small test suite
that infers the argument passing rules by compiling and running
C code.  Such suites are trivial to write if the only goal is to detect
sizes and alignments of ABI-defined types.  Probably with a bit
of work we could write C code that would discover more subtle
things, such as how many argument registers there are and which
types go with which kinds.  Of course, if the ABI allows the va_list
to be a single pointer to a seamless array of intptr_t values, then
there won't be much of anything to discover, but code which stores
longs and loads doubles through va_lists will be able to tell when
the array isn't seamless.
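The trivial first step of such a suite might look like this sketch, which records the size and alignment of some ABI-defined types using C11 _Alignof.  The PROBE macro and type_info table are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct type_info { const char* name; size_t size, align; };

/* Records a type's name, size, and alignment. */
#define PROBE(T) { #T, sizeof(T), _Alignof(T) }

static const struct type_info probes[] = {
  PROBE(char), PROBE(short), PROBE(int), PROBE(long), PROBE(long long),
  PROBE(float), PROBE(double), PROBE(long double), PROBE(void*), PROBE(intptr_t),
};

void print_probes(void) {
  for (size_t i = 0; i < sizeof probes / sizeof probes[0]; i++)
    printf("%-12s size %zu align %zu\n",
           probes[i].name, probes[i].size, probes[i].align);
}
```

Discovering register counts and register kinds would need more cunning probes than this, as noted above.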

I'd appreciate any other pointers to ABI information relevant to
arguments and especially va_list processing.  I know some people
on this list have looked into this in depth.

— John