notes on varargs

Thu May 3 11:16:36 UTC 2018

One of the goals of JavaCPP is to figure out which features of C/C++ are
actually important for using C/C++ libraries, and it turns out that varargs
is not something that matters a great deal. Besides, when required, we can
simply let the native C++ compiler do the hard work for us. Again, why are
we trying to avoid it?

On the other hand, the ability to call inline functions and function-like
macros is indispensable for many actual C/C++ libraries, but neither JNA
nor JNR offers any support whatsoever for them. I hope Panama will not make
the same mistake!

Samuel

On Thu, May 3, 2018 at 10:35, John Rose <john.r.rose at oracle.com> wrote:

> Varargs is a tricky feature of C, and if you look under the hood at the ABI
> and machine code you'll find lots of strange moves.  We have to emulate
> those moves adequately, both in callouts (JVM sending args through '…'
> prototypes) and callbacks (JVM receiving arguments via a va_list).
>
> The first thing that happens with a varargs call is that the C compiler
> treats each argument passed under the ellipsis '…' as if it were an
> argument to a function without a prototype.  This means integral values
> smaller than int are promoted to int, and floats are promoted to doubles.
> Pointers may be viewed as being converted to void* or intptr_t.  Array and
> function types decay to pointers, as usual.  The C standard calls these
> the "default argument promotions"; they are also sometimes called the
> "default argument conversions".
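
A minimal sketch of what the promotions mean for the callee: the caller passes a char, a short, and a float, but the callee has to fetch int, int, and double, because that is what actually arrives. The function name sum_promoted is made up for this illustration.

```c
#include <stdarg.h>

/* The caller is expected to pass (char c, short s, float f) after `tag`.
   Under the default argument promotions these arrive as (int, int, double),
   so that is how the callee must fetch them; va_arg(ap, char) or
   va_arg(ap, float) would be undefined behavior. */
double sum_promoted(int tag, ...) {
  va_list ap;
  va_start(ap, tag);
  int c = va_arg(ap, int);       /* was a char at the call site */
  int s = va_arg(ap, int);       /* was a short at the call site */
  double f = va_arg(ap, double); /* was a float at the call site */
  va_end(ap);
  return c + s + f;
}
```

For example, with char c = 'A', short s = -2 and float f = 0.5f, the call sum_promoted(0, c, s, f) returns 63.5 on an ASCII platform.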
>
> The machine code, as dictated by the ABI (if it is a typical one), further
> widens int values to intptr_t, which is to say machine words.  Since
> pointers are already machine words, at this point everything is either an
> intptr_t, a long long (if that is larger than an intptr_t, which is true
> only on 32-bit systems), or a double, unless there was something odd like
> a struct or vector in the argument list.
>
> This second widening by machine code does not necessarily preserve
> values.  Some ABIs specify zero and/or sign extension for small values
> packed in intptr_t.  But others allow garbage padding which leaves the
> choice to the compiler to sign extend, zero extend, or do something else.
> Still, in typical ABIs, if you were to cast the transmitted intptr_t value
> back to the original type, you'd get the original value back.
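
A small simulation of that round trip, assuming (as typical ABIs do) that a 32-bit int sits in the low-order bits of a 64-bit argument slot. The helper names pack_int_slot and unpack_int_slot are hypothetical; the point is that sign extension, zero extension, or garbage in the high bits all disappear after the cast back.

```c
#include <stdint.h>

/* Simulates one 64-bit argument slot carrying a 32-bit int.  The low 32
   bits hold the value; the high 32 bits are whatever the ABI allows
   (sign extension, zero extension, or garbage). */
uint64_t pack_int_slot(int32_t value, uint32_t high_bits) {
  return ((uint64_t)high_bits << 32) | (uint32_t)value;
}

/* Casting back to the original type discards the high bits.  The final
   uint32_t -> int32_t conversion is implementation-defined in ISO C for
   values above INT32_MAX, but mainstream compilers wrap it modulo 2^32. */
int32_t unpack_int_slot(uint64_t slot) {
  return (int32_t)(uint32_t)slot;
}
```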
>
> (Oddball ABIs might specify that sub-word values are packed in the
> arithmetically high bits, in which case the cast would have to be preceded
> by a shift; hopefully we don't have to deal with that any time soon.)
>
> All this means that, on 64-bit systems, a varargs list can be packed into
> a sequence of intptr_t values.  In fact, that is usually much of what the
> ABI requires, although ABIs usually add more details (read on!).
>
> Therefore, for Java to model an outgoing varargs list, a long[] array
> is usually adequate.  Equivalently, a signature-polymorphic method
> which takes a variable sequence of long arguments is also usually
> adequate, for arities supported by the JVM (a method descriptor is
> limited to 255 argument slots, and each long takes two, so roughly
> 127 longs).
>
> (Some atypical ABIs use different formats for function pointers and data
> pointers, so there's another portability hazard: function pointers might
> not be converted to intptr_t, but rather to something like a pair of such
> values, perhaps representing a library control block and an entry point
> therein.  But few or none of the OpenJDK platforms do this, I think.)
>
> Many ABIs transmit floating point values in distinct argument registers
> from other types (pointers and integers).  If there is a C function
> prototype, there is no ambiguity as to which kind of register each
> argument goes in,
> but in the case of varargs (and unprototyped functions) an ABI may elect
> either to place an outgoing double value into a non-float register, or
> else it may elect to place the value in a floating point register.  In the
> latter case, Java may need a short double[] array to model outgoing
> floating point register values, which is organized separately from the
> previously mentioned long[] array.
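
A sketch of such a staging area in C, for a hypothetical ABI that passes variadic doubles in floating-point registers. The names outgoing_args, push_word and push_double are illustrative only; words plays the role of the long[] and fp the role of the double[].

```c
#include <stdint.h>

#define MAX_STAGED 32  /* arbitrary capacity for this sketch */

/* Staging area for one outgoing varargs call: integers and pointers go
   into `words`, doubles into the separate `fp` sequence. */
typedef struct {
  int64_t words[MAX_STAGED];
  int nwords;
  double fp[MAX_STAGED];
  int nfp;
} outgoing_args;

void args_init(outgoing_args* a) { a->nwords = 0; a->nfp = 0; }

/* Integers and pointers: one machine word each. Returns 0 when full. */
int push_word(outgoing_args* a, int64_t w) {
  if (a->nwords >= MAX_STAGED) return 0;
  a->words[a->nwords++] = w;
  return 1;
}

/* Doubles go to the floating-point sequence. Returns 0 when full. */
int push_double(outgoing_args* a, double d) {
  if (a->nfp >= MAX_STAGED) return 0;
  a->fp[a->nfp++] = d;
  return 1;
}
```

This sketch keeps only the per-class order, which is what register assignment needs; arguments that overflow to the stack are usually laid out in the original interleaved order, which a real binder would have to record as well.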
>
> (The same thing may happen if the platform has other kinds of registers
> besides general-purpose and floating-point.  It might happen with
> vector registers, for example, if the ABI requires vectors to be passed
> in vector registers.  It will almost certainly require this if the vector
> matches an argument in a function prototype; for varargs the story
> can vary from ABI to ABI.  As a typical example, some ABIs mix
> structs and vectors into the sequence of intptr_t values by converting
> them into sub-sequences of one or more values, derived as if by
> loading the composite value from memory as successive machine words.
> In a case like this, Java can use the same trick as with the double[]
> array; it can model the outgoing values stored in vector registers
> using a long[] array or (once we get value types) an array of a
> suitably sized value type.  With a long[] array, the size is the
> number of vector registers times the size of a single vector in 64-bit
> units.  We don't need to cover these cases now, but we may want to
> cover them later.)
>
> Unlike the ABIs, the C standards may leave argument passing behavior
> undefined for types other than floats, ints, and pointers (and pointer
> decay products).  This affects C structs and platform-specific vector
> types.  For those types the compiler writer may consult the ABIs for
> direction, and ask what (if anything) the ABI prescribes for passing
> those types when a function prototype is missing.
>
> In the end, the C compiler will usually treat varargs-controlled arguments
> as if they were passed to an unprototyped function, and the ABI will
> usually assign the first few to registers and the rest to a contiguous
> area at or near the top of the stack.  For varargs, we can call this
> contiguous area the "argument dump area", since it amounts to
> a spot where the va_arg macro can start picking up the arguments.
>
> Thus, when the C compiler creates a call to a variable arity function
> (or an unprototyped one) it marshals a bunch of intptr_t values into
> a few registers and the top of the stack.  Then it performs a function
> call, transferring control to the entry point of the function.
>
> To access the varargs values, the callee first executes a va_start
> macro.  This is a C compiler intrinsic which sets up one or more
> pointers (in a va_list struct).  The main pointer is to the start of
> the argument dump area, as defined by the ABI.  (Often that is
> near the place on the stack where the return address is stored.)
> The va_start macro also marshals any stray values
> in argument registers into memory, so that the va_arg macro
> can "see" them.  If there are several kinds of registers, there
> may be extra register dump areas to hold the register values
> for easy access.  It is up to the callee to finish the job of the
> caller in setting up all dump areas as needed by the va_arg
> macro.
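
From the callee's side all of this is hidden behind the standard macros. A minimal example (the name sum_longs is made up): whether an argument arrived in a register or in the stack dump area, the loop fetches it the same way.

```c
#include <stdarg.h>

/* A typical variadic callee: va_start (together with the compiler-generated
   prologue) makes every argument reachable through `ap`, each va_arg
   fetches the next value of the requested type, and va_end releases `ap`. */
long sum_longs(int count, ...) {
  va_list ap;
  va_start(ap, count);
  long total = 0;
  for (int i = 0; i < count; i++)
    total += va_arg(ap, long);
  va_end(ap);
  return total;
}
```

Calling it with ten arguments on a typical 64-bit ABI forces some of them onto the stack, yet the callee code does not change.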
>
> (Some C compilers, especially early ones, use push instructions
> to push the arguments in right-to-left order, and then the call
> instruction pushes the return address.  This leads to the state
> of affairs just described as typical.  On RISC machines, arguments
> are stored directly to memory by dead reckoning.  The result
> is the same.)
>
> The placement of the argument dump is where ABIs often get clever.
> A typical ABI may require extra uninitialized space in the main dump
> area, so that the callee can spill register values into the uninitialized
> space.  After doing this, the callee can conveniently find all the
> arguments in one contiguous area of memory.  In the simplest
> case, all of the arguments might end up stored in memory in
> a single intptr_t array.  Some ABIs ensure that this is true for
> all types of arguments.  The reward of such regularity is the
> simplicity of the va_arg macro and the va_list type:
>
>    typedef intptr_t* va_list;
>    #define va_arg(ap, T) (*(T*)_post_inc(ap, _va_sizeof(T)))
>    #define _post_inc(v, n) (((v) += (n)) - (n))
>    #define _va_sizeof(T) (sizeof(union { intptr_t w; T t; }) / sizeof(intptr_t))
>
> The effect of this macrology is that va_arg(ap, T) turns into *(T*)ap++
> when T is no larger than a single machine word (which is typical).
> In this case, scanning over an argument list is just a thinly veiled walk
> down an array of intptr_t values, stored in argument order.
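
The same macros can be exercised directly against an ordinary array, as a stand-in for an ABI with a seamless dump area. The MY_ prefixes and sum_dump are hypothetical names chosen so the sketch does not clash with the real <stdarg.h>.

```c
#include <stdint.h>

/* Model of the "seamless array" ABI: the argument list is an array of
   machine words, and fetching an argument is a typed load plus a bump
   by the argument's size in words. */
typedef intptr_t* my_va_list;
#define MY_VA_SIZEOF(T) (sizeof(union { intptr_t w; T t; }) / sizeof(intptr_t))
#define MY_POST_INC(v, n) (((v) += (n)) - (n))
#define MY_VA_ARG(ap, T) (*(T*)MY_POST_INC(ap, MY_VA_SIZEOF(T)))

/* Sums `count` word-sized integers stored in `dump`, in argument order. */
intptr_t sum_dump(intptr_t* dump, int count) {
  my_va_list ap = dump;
  intptr_t total = 0;
  for (int i = 0; i < count; i++)
    total += MY_VA_ARG(ap, intptr_t);
  return total;
}
```

Note that MY_VA_SIZEOF(char) is 1: even a one-byte type consumes a whole word, which is exactly the widening described earlier.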
>
> In more complicated cases, where the ABI mandates extra dump
> areas, the va_arg macro may expand to control flow, which pops
> the argument out of the appropriate place.  Here's pseudo-code
> for a hypothetical ABI that uses a separate dump area for a
> limited number of floating-point arguments:
>
>     T va_arg(ap, T) {
>       if (T == double && ap.fptr - ap.fbase < MAX_FP_ARG_REGS)
>         return *ap.fptr++;  // return value from float dump area
>       T* p = (T*) ap.sptr;
>       ap.sptr += _va_sizeof(T);
>       return *p;
>     }
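
The pseudo-code can be turned into compilable C by splitting it into one accessor per kind. Everything here (two_area_va_list, va_arg_double, va_arg_word, and MAX_FP_ARG_REGS = 8) is a hypothetical model, not any real ABI's layout, and it assumes 64-bit words.

```c
#include <stdint.h>
#include <string.h>

#define MAX_FP_ARG_REGS 8  /* hypothetical number of FP argument registers */

/* Two-area va_list: doubles come from the float dump area until the FP
   registers are exhausted; everything else (and any overflow doubles)
   comes from the main word-sized dump area. */
typedef struct {
  const double* fptr;   /* next unread slot in the float dump area */
  const double* fbase;  /* start of the float dump area */
  const intptr_t* sptr; /* next unread word in the main dump area */
} two_area_va_list;

double va_arg_double(two_area_va_list* ap) {
  if (ap->fptr - ap->fbase < MAX_FP_ARG_REGS)
    return *ap->fptr++;              /* came from an FP register */
  double d;
  memcpy(&d, ap->sptr++, sizeof d);  /* overflowed into the main area */
  return d;
}

intptr_t va_arg_word(two_area_va_list* ap) {
  return *ap->sptr++;
}
```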
>
> For Java to perform a call to a varargs C function, it should
> marshal the machine words (as longs) for the various arguments.
> If there are extra argument register types, it should marshal those
> also, and keep track of how many there are of each kind.  The
> assembly code created by the binder needs to ensure that
> each value is transmitted in the location (register or memory)
> which is mandated by the ABI.  This procedure is about the
> same as for executing a regular non-varargs call from Java.
>
> One difference is that each different combination of arguments
> needs different code.  If the JIT cannot choose this code statically,
> an interpreter will need to dynamically marshal the arguments
> into some staging area, and then run assembly code that will
> copy the contents of the staging area onto the top of the
> stack and/or memory as needed.
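
A sketch of the interpreter's last step, simulated in plain C: the staged words are distributed the way a typical ABI does, first few into registers, the rest in order into a contiguous stack area. NUM_GP_ARG_REGS = 6 matches the integer argument registers of x86-64 System V but is only an assumption here; call_frame and place_args are hypothetical, and real code would do the final copy in assembly.

```c
#include <stdint.h>
#include <string.h>

#define NUM_GP_ARG_REGS 6   /* assumed register count */
#define MAX_STACK_WORDS 64  /* arbitrary capacity for this sketch */

/* Simulated machine state that the binder's assembly stub would load. */
typedef struct {
  int64_t regs[NUM_GP_ARG_REGS];  /* general-purpose argument registers */
  int64_t stack[MAX_STACK_WORDS]; /* outgoing area, lowest address first */
  int nstack;
} call_frame;

/* Copies staged words into registers first, then onto the stack area.
   Returns 0 if they do not fit. */
int place_args(call_frame* f, const int64_t* staged, int n) {
  int nregs = n < NUM_GP_ARG_REGS ? n : NUM_GP_ARG_REGS;
  int nstack = n - nregs;
  if (nstack > MAX_STACK_WORDS) return 0;
  memset(f, 0, sizeof *f);
  memcpy(f->regs, staged, (size_t)nregs * sizeof *staged);
  memcpy(f->stack, staged + nregs, (size_t)nstack * sizeof *staged);
  f->nstack = nstack;
  return 1;
}
```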
>
> Another smaller difference between varargs and non-varargs
> calls is (as noted above) the fact that the ABI requires 32-bit
> values to be widened to the machine word size.  This may fall
> out by accident as Java models native values as longs.
>
> In general, each Java carrier value must be examined to see if it can
> be transmitted using one of the standard types of default argument
> conversions.  If it can be converted to a pointer, the conversion
> should be done.  (This covers the String type, if that is convertible
> to char*.)  If it is a floating point value, it should be converted to
> double and stored bitwise into the appropriate array element (either
> long or double) which models the argument.  If it is an integral type
> (non-floating primitive) it needs to be packed into a long.  If it is
> a struct or vector type, it may need to be packed into several longs
> (or several doubles, if the ABI demands packing into floating-point
> registers).
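
Those conversions can be sketched as small helpers that each turn one carrier value into 64-bit words (the Java longs). The helper names are hypothetical, and the sketch assumes an LP64 platform where intptr_t and int64_t have the same width.

```c
#include <stdint.h>
#include <string.h>

/* Integral values are sign-extended to a full word. */
int64_t word_from_int(int32_t v) { return (int64_t)v; }

/* Pointers become their address as a word. */
int64_t word_from_ptr(const void* p) { return (int64_t)(intptr_t)p; }

/* Floats are first promoted to double, then stored bitwise. */
int64_t word_from_float(float f) {
  double d = f;                  /* the default argument promotion */
  int64_t bits;
  memcpy(&bits, &d, sizeof bits);
  return bits;
}

/* A struct is packed as successive machine words, as if loaded from
   memory; the tail of the last word is zero-filled.  Returns the number
   of words written to `out`. */
size_t words_from_struct(const void* s, size_t size, int64_t* out) {
  size_t n = (size + sizeof(int64_t) - 1) / sizeof(int64_t);
  memset(out, 0, n * sizeof(int64_t));
  memcpy(out, s, size);
  return n;
}
```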
>
> Java values carry their own type information as needed.  Passing a mix
> of pointers and integers and floats through a varargs Java API can be
> modeled as a builder API which appends values of various C-level types
> to a growing marshalling area.  It can also be modeled as a scan over
> an array of Object references, asking each value what its C-level type
> should be, when viewed as an input for the default argument
> conversions.  The latter approach can (and should) be built on top of
> the former.
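
In C terms, the layering might look like this: a builder with one append function per C-level type, and a tagged-array scan implemented on top of it. All names (marshal_area, append_*, tagged_arg, append_all) are hypothetical, and the tag stands in for asking a Java object what C type it should become.

```c
#include <stdint.h>
#include <string.h>

#define AREA_WORDS 64  /* arbitrary capacity for this sketch */

/* Builder: a growing marshalling area of 64-bit words. */
typedef struct { int64_t words[AREA_WORDS]; int n; } marshal_area;

/* Values past capacity are dropped in this sketch. */
void append_long(marshal_area* m, int64_t v) {
  if (m->n < AREA_WORDS) m->words[m->n++] = v;
}
void append_ptr(marshal_area* m, const void* p) {
  append_long(m, (int64_t)(intptr_t)p);
}
void append_double(marshal_area* m, double d) {
  int64_t bits;
  memcpy(&bits, &d, sizeof bits);
  append_long(m, bits);
}

/* The "array of Object" style, layered on the builder. */
typedef enum { TAG_INT, TAG_PTR, TAG_DOUBLE } arg_tag;
typedef struct {
  arg_tag tag;
  union { int64_t i; const void* p; double d; } u;
} tagged_arg;

void append_all(marshal_area* m, const tagged_arg* args, int count) {
  for (int i = 0; i < count; i++) {
    switch (args[i].tag) {
    case TAG_INT:    append_long(m, args[i].u.i); break;
    case TAG_PTR:    append_ptr(m, args[i].u.p); break;
    case TAG_DOUBLE: append_double(m, args[i].u.d); break;
    }
  }
}
```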
>
> When Java originates a varargs call, it is a good idea to quietly add
> one or two extra values to the argument list even though the callee
> function won't (or shouldn't) see them.  The performance cost of this
> is small.  Adding a NULL simplifies interoperation with certain
> varargs APIs that use a NULL value as a sentinel for the end of the
> argument array.
>
> For debugging support, storing a second extra argument with an unusual
> value (such as 0xCAFEBABE?) will help diagnose low-level bugs.  The
> NULL will also help of course.  Usually argument dump locations are
> tightly constrained by the ABI, but it is always safe to add an extra
> argument at the end.
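
A sketch of both halves: the caller quietly appends a NULL and a canary to its outgoing words, and a sentinel-terminated callee (in the style of execl) stops at the NULL. The names finish_args, count_until_null and VARARGS_CANARY are made up for this illustration.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

#define VARARGS_CANARY ((int64_t)0xCAFEBABE)  /* recognizable debug marker */

/* Appends the two trailing extras to an outgoing word list: a NULL for
   sentinel-terminated APIs, then a canary that stands out in dumps if a
   callee over-reads.  `words` needs room for n + 2 entries. */
int finish_args(int64_t* words, int n) {
  words[n++] = 0;               /* NULL sentinel */
  words[n++] = VARARGS_CANARY;
  return n;
}

/* A typical sentinel-terminated callee: it reads char* arguments until
   it sees NULL. */
int count_until_null(const char* first, ...) {
  int count = 0;
  va_list ap;
  va_start(ap, first);
  for (const char* s = first; s != NULL; s = va_arg(ap, char*))
    count++;
  va_end(ap);
  return count;
}
```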
>
> For callbacks from C to Java, varargs is easier to handle, although it
> is not trivial.  The native code which sets up the callback needs to
> perform the same actions as the va_start macro, and then pass the
> resulting va_list up to Java.  Since the size of va_list is
> ABI-dependent, and some ABIs may object to taking a bitwise copy of
> it, it is much safer to pass a pointer to it to Java.
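
A sketch of that shape in C, with an ordinary function standing in for the Java side. The stub and handler names are hypothetical. The C standard explicitly permits passing a pointer to a va_list to another function, which then advances the caller's list.

```c
#include <stdarg.h>

/* Stand-in for the Java side of an upcall: it only ever sees a pointer
   to the va_list, never a copy, so it works whatever va_list looks like. */
typedef long (*upcall_handler)(int count, va_list* ap);

long sum_handler(int count, va_list* ap) {
  long total = 0;
  for (int i = 0; i < count; i++)
    total += va_arg(*ap, long);  /* advances the stub's va_list */
  return total;
}

/* Native entry point for a varargs callback: it performs va_start,
   hands &ap to the handler, then cleans up with va_end. */
long varargs_callback_stub(upcall_handler handler, int count, ...) {
  va_list ap;
  va_start(ap, count);
  long result = handler(count, &ap);
  va_end(ap);
  return result;
}
```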
>
> To use this pointer, Java needs standard native functions which can
> extract the various typed arguments, given a pointer to a va_list and
> the desired type.  For full portability there needs to be one such
> function for each fundamentally different type.  But the binder can
> consult the ABI and use more generic accesses; in practical cases the
> va_arg macro can be wrapped in a function like this:
>
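>     #include <stdarg.h>
>     #include <stdint.h>
>     #include <string.h>
>
>     enum { K_DOUBLE, K_PTR, K_INT32, K_INT64, K_VEC128 };
>     typedef struct { int64_t w[2]; } vec128_t;  // stand-in for an ABI vector type
>
>     static long double_to_long_bits_raw(double d) {
>       long bits;
>       memcpy(&bits, &d, sizeof bits);  // assumes 64-bit long (LP64)
>       return bits;
>     }
>
>     long va_arg_function(va_list* ap, int kind, void* buffer) {
>       switch (kind) {
>       case K_DOUBLE:  // might be special fp reg dump
>         return double_to_long_bits_raw(va_arg(*ap, double));
>       case K_PTR:     // usually falls through: already a machine word
>         //return (long) (intptr_t) va_arg(*ap, void*);
>       case K_INT32:   // usually falls through: widened to a machine word
>         //return va_arg(*ap, int32_t);
>       case K_INT64:
>         return (long) va_arg(*ap, intptr_t);
>       case K_VEC128:
>         *(vec128_t*)buffer = va_arg(*ap, vec128_t);
>         return (long) (intptr_t) buffer;
>         // and so on for each vector type in the ABI
>       default:
>         return 0;  // unknown kind
>       }
>     }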
>
> Rather than having a va_arg function for every possible type (which is
> probably impossible), this one function takes an enumeration token
> that distinguishes among the basic types that the ABI recognizes.  If
> the ABI is simple enough, all cases can be handled by just one kind of
> access, K_INT64 (for untyped intptr_t).
>
> Note that this approach allows each set of arguments to be walked only
> once, since the original copy of the va_list is mutated by every call
> to va_arg_function.  This is easy to work around given a second
> function which wraps the va_copy macro, to make a snapshot of the
> original va_list.
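
One wrinkle: C requires each va_copy to be matched by a va_end in the same function, so a wrapper that merely calls va_copy and returns is not strictly portable. A sketch that respects this rule takes a callback instead; with_snapshot, sum_n and sum_twice are hypothetical names.

```c
#include <stdarg.h>

/* Runs `walk` over a snapshot of *ap, leaving *ap itself unconsumed.
   va_copy and its va_end stay in the same function, as C requires. */
long with_snapshot(va_list* ap, long (*walk)(va_list* copy, void* ctx),
                   void* ctx) {
  va_list copy;
  va_copy(copy, *ap);
  long r = walk(&copy, ctx);
  va_end(copy);
  return r;
}

/* Sums *(int*)ctx long arguments from the given list. */
static long sum_n(va_list* ap, void* ctx) {
  int n = *(int*)ctx;
  long total = 0;
  for (int i = 0; i < n; i++) total += va_arg(*ap, long);
  return total;
}

/* Walks the same arguments twice: once via a snapshot, once directly. */
long sum_twice(int count, ...) {
  va_list ap;
  va_start(ap, count);
  long first = with_snapshot(&ap, sum_n, &count);
  long second = sum_n(&ap, &count);
  va_end(ap);
  return first + second;
}
```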
>
> The above account shows how ABIs inform the passing of arguments,
> including varargs arguments, but it doesn't define the specific rules for
> any particular ABI.  A write-up for any particular ABI we support,
> specifying the rules exactly, would probably take as much space as
> this note, which purports to summarize all of them.
>
> There are a number of ways to get this information.  The most direct
> is to compile test cases on your favorite platform and observe their
> machine code and/or behavior.  Yes, that's more direct than the official
> method, which is tracking down an ABI document and reading it to
> find the necessary rules (both steps are usually difficult).  Reading
> the code of a compiler backend is sometimes useful.  Even better,
> reading the code of the excellent libffi library, which exists only to
> perform calling sequences on many ABIs, would probably teach
> us all we need to know quickly, for any given ABI.
>
> Here's a final idea.  It would be good to create a small test suite
> that infers the argument passing rules by compiling and running
> C code.  Such suites are trivial to write if the only goal is to detect
> sizes and alignments of ABI-defined types.  Probably with a bit
> of work we could write C code that would discover more subtle
> things, such as how many argument registers there are and which
> types go with which kinds.  Of course, if the ABI allows the va_list
> to be a single pointer to a seamless array of intptr_t values, then
> there won't be much of anything to discover, but code which stores
> longs and loads doubles through va_lists will be able to tell
> if the array isn't seamless.
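
The trivial first rung of such a suite might be a probe that reports sizes and alignments of the types that matter for argument passing. The function name probe_abi_types is made up; sizeof(va_list) alone already hints whether va_list is a bare pointer (one word) or a structure holding several dump-area pointers.

```c
#include <stdalign.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PROBE(T) \
  printf("%-12s size %2zu align %2zu\n", #T, sizeof(T), alignof(T))

/* Prints size and alignment of ABI-relevant types on this platform. */
void probe_abi_types(void) {
  PROBE(char);
  PROBE(short);
  PROBE(int);
  PROBE(long);
  PROBE(long long);
  PROBE(intptr_t);
  PROBE(void*);
  PROBE(float);
  PROBE(double);
  PROBE(long double);
  PROBE(va_list);
}
```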
>
> I'd appreciate any other pointers to ABI information relevant to
> arguments and especially va_list processing.  I know some people
> on this list have looked into this in depth.
>
> — John