analysis of popular FFI frameworks and their layout descriptions

Thu Jan 25 16:07:00 UTC 2018

On Thu, 25 Jan 2018 15:35:55 +0000, Maurizio Cimadamore wrote:
> Hi,
> over the last few days I've conducted some analysis and explorations
> of the existing frameworks which allow some kind of native interop in 
> different programming languages. This analysis is by no means 
> exhaustive, but I hope to have covered the most popular frameworks
> out there (and I apologize in advance for any unwanted omissions!).
> The goal was to find out whether there were other frameworks out
> there using some kind of layout description in the same way as we do
> in Panama (see example in [0]), and whether the description
> used in such frameworks has characteristics similar to the one we are
> discussing.

Forgive me for butting in -- just a lurker on this list and not a Panama
contributor at present.

This is a really useful summary, so thank you. (I have been wanting to
write my own survey of FFIs for a while now. I'll be sure to
acknowledge you if I do write something up later, since these notes are
sure to help me out!)

Since you ask about language-agnostic and semantics-neutral models... 
have you considered debugging information?  That means DWARF on Unix
platforms, PDB on Windows.

I don't know of any (*) FFI systems at present which work by reading
debugging information. However, it is "obvious" (;-) to me that this is a
better way than using a compiler on header files. Unless you duplicate
the build environment precisely -- including the right version of each
header file, the same values of any preprocessor flags used for
configuration, and, in some cases, the right compiler version --
there's no guarantee you're getting the same binary interface as the
code you're linking to. Whereas debugging information necessarily
documents that interface in detail.

(Or maybe you've solved these problems another way, since I'm not
up-to-speed on Panama in detail. Forgive me if so!)

(*) ... actually not quite true. I have done a very sketchy/partial
proof-of-concept of a JavaScript (V8) FFI using my liballocs runtime,
which takes debug info as input. There was a talk/demo at Strange Loop
2014 <https://www.youtube.com/watch?v=LwicN2u6Dro>
and code on GitHub <https://github.com/stephenrkell/liballocs>
and also a research paper at Onward! 2015.

I'd love to take this further and/or do the equivalent in the Java
world. So, just putting it out there for anyone interested... please
feel free to get in touch. If this might fit into the Panama project
somehow, so much the better. I don't have tons of resources to spend on
this *right* now, but it could be made to happen in the near-ish future.

Otherwise do forgive the interruption... cheers,

Stephen.

> I classified the frameworks in the following four categories:
> 
> 
> 1) pack/unpack
> 
> These frameworks are very 'informal': they simply allow one to view a
> set of values as a packed data structure. This is typically done by
> converting one or more values into a string containing the byte
> values of the resulting representation, where the representation is
> given by some kind of layout template. We can think of these as moral
> successors of C's printf format strings ;-)
> 
> * Python struct [1]
> * Ruby pack/unpack [2]
> * Lua struct [3]
> * Pack200 [4]
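
For readers unfamiliar with this style, here is a minimal sketch of a layout template using Python's standard struct module [1]. The record itself is a made-up example: a signed 16-bit id followed by an unsigned 32-bit count.

```python
import struct

# Layout template: '<' selects little-endian byte order with standard sizes
# and no alignment padding; 'h' is a signed 16-bit integer and 'I' an
# unsigned 32-bit integer.
RECORD = "<hI"

packed = struct.pack(RECORD, 1, 2)      # values -> bytes
size = struct.calcsize(RECORD)          # bytes described by the template
values = struct.unpack(RECORD, packed)  # bytes -> values

print(packed)  # b'\x01\x00\x02\x00\x00\x00'
print(size)    # 6
print(values)  # (1, 2)
```
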
> 
> I'd say that both Python struct and Lua struct are quite close to the
> type descriptors currently implemented in Panama. The types closely
> match the set of types available in C and there are also ways to
> denote padding, endianness, etc. Ruby's flavor is slightly different:
> since it's more string-oriented, many of its directives deal with
> string encodings (e.g. 'a', 'A', 'Z', 'm' for base64, 'u' for
> uuencode) rather than with C types, although integer and floating
> point directives are available too.
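
The padding and endianness controls mentioned above look like this in Python's struct module. The '<bxxxi' template is only an illustrative way of reproducing by hand the padding a C compiler would typically insert between a char and an int.

```python
import struct

# Endianness: the first character of the template selects the byte order.
big = struct.pack(">I", 1)     # big-endian
little = struct.pack("<I", 1)  # little-endian

# Padding: with '<' no alignment is applied, so a char followed by an int
# occupies 5 bytes; 'x' inserts explicit pad bytes to mimic C alignment.
unpadded = struct.calcsize("<bi")
padded = struct.calcsize("<bxxxi")

# '@' (the default) uses the platform's native sizes and alignment instead,
# so the padding is computed the way the local C compiler would do it.
native = struct.calcsize("@bi")

print(big, little)       # b'\x00\x00\x00\x01' b'\x01\x00\x00\x00'
print(unpadded, padded)  # 5 8
print(native)            # platform-dependent; typically 8
```
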
> 
> Pack200 is slightly different in that its layout language is used to 
> describe the layout of a VM attribute, so the layout language is 
> (obviously) biased by VM-centric notions. But that's another 
> interesting example of how a little language can be used to 'teach' a 
> framework to understand (and in this case, pack/unpack) a blob of
> bytes.
> 
> 2) FFI with layout description
> 
> In this category we find frameworks which allow interoperability with 
> native functions - as such, they have ways to bind to native
> functions, but they also have ways to model native data in a way that
> is friendly to the host language (which is crucial if one has to pass
> aggregates to a native function).
> 
> * Python ctypes [5]
> * Ruby FFI [6]
> * Python uctypes [7]
> 
> Python ctypes and Ruby FFI are very similar; they model native data
> structures by allowing the user to create an instance of a class
> whose members are dynamically generated given a layout (beauty of
> dynamic languages :-)). The layout description, however, is not a
> string, as in the above category, but rather an object that can be
> modeled naturally given the constructs available in the host
> language. For instance, Python's ctypes uses a list of
> (fieldName, fieldType) tuples assigned to the class's _fields_
> attribute, while Ruby FFI uses a 'layout' call taking alternating
> field names and types. In any case, the essence of both layout
> descriptions is to come up with a list of fieldName/fieldType pairs
> that can be used to generate the contents of the struct class
> dynamically. The fieldType part of each pair is typically something
> that closely resembles some C type.
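
As an illustration of the object-based style, here is a minimal Python ctypes sketch [5]. The layout is an ordered list of (name, type) pairs, and the struct class is generated from it at runtime. The Point struct and its fields are made up for the example.

```python
import ctypes

# Layout description as a host-language object: ordered (name, C type) pairs,
# here modeling struct { int x; double y; }.
fields = [("x", ctypes.c_int), ("y", ctypes.c_double)]

# Generate the struct class at runtime; ctypes' metaclass reads _fields_
# and creates one descriptor per field with its computed offset and size.
Point = type("Point", (ctypes.Structure,), {"_fields_": fields})

p = Point(1, 2.5)
print(p.x, p.y)  # 1 2.5

# Offsets and total size follow the native C ABI (0 8 16 on x86-64).
print(Point.x.offset, Point.y.offset, ctypes.sizeof(Point))
```
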
> 
> Python uctypes (part of the MicroPython effort) closely follows in the
> footsteps of the above frameworks, but with a notable distinction: its
> type descriptions are more abstract; uctypes can speak only about
> integers (signed/unsigned), floating point values and addresses. As a
> result, the layout description looks less language-specific. Interestingly, this
> is only true for the data modeling part - the FFI part of uctypes
> adopts a different layout description which is string-based, but is
> also C-specific (e.g. we have things such as 'i', 'l', 'f', 'c', ...).
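
uctypes descriptors are plain dicts that map each field name to an offset combined with a type constant (e.g. offset | uctypes.UINT32). Because uctypes only runs on MicroPython, here is a CPython sketch of the same idea. It uses a hypothetical descriptor format in which a field records only its offset, size, signedness and int-vs-float nature. The 'u32'/'f64' kind names and the read_field helper are made up for illustration and are not the uctypes API.

```python
import struct

# Hypothetical abstract field kinds: signedness, FP-ness and size only.
# They are mapped onto struct codes with '<' (little-endian, standard sizes).
KIND_TO_CODE = {
    "i8": "b", "u8": "B", "i16": "h", "u16": "H",
    "i32": "i", "u32": "I", "i64": "q", "u64": "Q",
    "f32": "f", "f64": "d",
}

# Hypothetical descriptor in the spirit of uctypes: name -> (offset, kind).
HEADER = {
    "magic": (0, "u32"),
    "version": (4, "u16"),
    "scale": (8, "f64"),
}

def read_field(buf, descriptor, name):
    """Decode one field of buf using only the abstract description."""
    offset, kind = descriptor[name]
    return struct.unpack_from("<" + KIND_TO_CODE[kind], buf, offset)[0]

# Build a sample 16-byte buffer: u32 at 0, u16 at 4, 2 pad bytes, f64 at 8.
buf = struct.pack("<IH2xd", 0xCAFEBABE, 3, 1.5)

print(hex(read_field(buf, HEADER, "magic")))  # 0xcafebabe
print(read_field(buf, HEADER, "version"))     # 3
print(read_field(buf, HEADER, "scale"))       # 1.5
```
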
> 
> 3) FFI without layout description
> 
> In this category we have, as before, frameworks which allow host
> languages to call into native functions. The main difference is that
> the frameworks in this category do not rely on any kind of standalone
> layout description (either object-based or string-based) and instead
> choose to specify layouts by augmenting class definitions.
> 
> * JNA [8]
> * JNR [9]
> * .NET P/Invoke [10]
> * Rust FFI [11]
> 
> JNA and JNR are actually quite close in terms of how native data
> structures are expressed in Java. The user has to define a subclass
> of some Struct abstract class, and the types of the fields of the
> class determine the layout. Of course there are issues with that
> approach, in that the order in which fields are declared is not
> reliably discoverable at runtime (Class.getDeclaredFields() makes no
> ordering guarantee), which is why JNA comes up with a way to specify
> the field order (a struct must implement the getFieldOrder() method).
> 
> JNR addresses the impedance mismatch between Java types and C
> types by allowing the Java types used in declarations to be
> annotated. So it's possible for a JNR declaration to say e.g.
> '@size_t int'.
> 
> Sidenote: both JNR and JNA (and also Python ctypes and Ruby FFI, from
> the previous category) use libffi [14] (in the case of JNR, through a
> nice little JNI wrapper called jffi [15]) to perform invocations of
> actual native functions. libffi is a small library which makes it
> possible to call a function whose signature is not statically known.
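
To make the libffi point concrete, here is a minimal Python ctypes sketch. It assumes a Unix-like system with a C library that ctypes can load. The signature of strlen is supplied only at runtime, through argtypes and restype, and ctypes relies on libffi to build the actual call.

```python
import ctypes
import ctypes.util

# Load the C library; if find_library returns None, CDLL(None) falls back
# to the symbols already loaded into the current process (Unix-only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Describe size_t strlen(const char *) at runtime; nothing about this
# signature is known statically.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

print(strlen(b"panama"))  # 6
```
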
> 
> P/Invoke takes a similar approach: to define a native struct, the
> user defines e.g. a C# struct and then decorates that declaration
> with attributes which specify things such as field offsets. More
> specifically, the attribute "[StructLayout(LayoutKind.Sequential)]"
> can be used to tell .NET that the struct has to be laid out in the
> same order in which its fields were declared. LayoutKind.Explicit is
> also available, in which case the user has to specify the offset of
> each field manually (via the [FieldOffset] attribute). Interestingly,
> .NET introduces the notion of 'blittable' types [16], i.e. types
> whose values do not require conversion when passed between managed
> and unmanaged code; not surprisingly, a user-defined struct is only
> treated as 'blittable' if it has sequential or explicit layout and
> all of its fields are themselves blittable.
> 
> In Rust FFI, no particular effort is required to model a struct -
> that's because Rust structs already closely follow C structs; with
> the #[repr(C)] attribute the programmer can specify that the layout
> of the struct should indeed be C-compliant. And that's it. So, in a
> way, the approach is not too dissimilar to what's done in the
> previous frameworks, but the fact that the language is already
> C-friendly helps reduce the semantic mismatch to a minimum. Rust
> also defines a package/crate (called 'libc') which contains several
> C type definitions (e.g. c_int, size_t), so that declarations can be
> made more explicit.
> 
> 4) Other FFI
> 
> Lastly, we cover non-standard kinds of FFI support which take a
> somewhat different and more ad-hoc approach than the ones discussed
> above.
> 
> * Go [12]
> * JavaCPP [13]
> 
> Go's approach (cgo) is radically different, since in Go C code can be
> embedded as a comment (the "preamble") immediately preceding an
> 'import "C"' statement in the Go source itself, and then referred to
> by the Go code through the pseudo-package C (e.g. C.strlen). In other
> words, what Go does is not too dissimilar from what Panama's jextract
> does - only this is implemented as a sort of static preprocessing
> step on a regular Go source file.
> 
> The main goal of JavaCPP is to provide bridging between Java and C++;
> as such, JavaCPP is very class-oriented, allowing programmers to
> model C++ classes as Java classes. Such classes might be sprinkled
> with some extra annotations (e.g. to say that a Java String should be
> translated as a C++ std::string). Once the source has been compiled
> and a class is obtained, that class has to be passed to the JavaCPP
> runtime for an intermediate build step, which generates (and compiles)
> the JNI code required to perform the bridging with the native library
> (by using the metadata included in the interface declaration - this
> step is similar in spirit to what jextract does).
> 
> Then, once all ingredients have been generated, the class can
> finally be executed, since all the JNI plumbing will be there.
> Therefore, JavaCPP allows programmers to call into native methods w/o
> worrying about writing JNI code themselves; for those who don't want
> to type a class declaration manually, there's also a tool similar to
> jextract which parses a header file and generates the source code
> for the annotated declaration, which can then be compiled and built
> as before.
>
> In conclusion, layout languages seem popular in 'informal'
> marshalling/unmarshalling frameworks (a la Python struct), but
> relatively unpopular in classic FFI frameworks. In dynamic
> programming languages, such frameworks allow creation of ad-hoc
> classes modeling native structures starting from a 'layout'
> description, but such a description is never a string, and it's
> always tied to the constructs expressible in the host language
> (tuples and such).
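
The contrast between the two styles can be seen within Python itself. In the sketch below, the same C struct { signed char tag; double value; } is described once as a struct format string and once as a ctypes class, and the bytes produced by one are read back through the other. It assumes both modules use the platform's native alignment, which is the default for both ('@' for struct).

```python
import ctypes
import struct

# String-based description: '@' = native sizes and alignment,
# 'b' = signed char, 'd' = double.
AS_STRING = "@bd"

# Object-based description of the same layout.
class AsObject(ctypes.Structure):
    _fields_ = [("tag", ctypes.c_byte), ("value", ctypes.c_double)]

# Both count the pad bytes inserted after tag, so the sizes agree here.
print(struct.calcsize(AS_STRING), ctypes.sizeof(AsObject))

# Bytes packed from the string template decode correctly via the class.
raw = struct.pack(AS_STRING, 7, 0.25)
obj = AsObject.from_buffer_copy(raw)
print(obj.tag, obj.value)  # 7 0.25
```

One caveat: struct's '@' mode adds no trailing padding, so the two descriptions can disagree on the total size when the last field is smaller than the struct's strictest alignment.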
> 
> Also, I could not find any framework that adopts a layout
> description that is totally language-agnostic and semantics-neutral.
> Most of the type descriptions occurring in the analyzed FFI
> frameworks in category (2) are deliberately C-specific. The most
> general is, without doubt, MicroPython's uctypes module, whose types
> simply reify the signedness and FP-ness of the values being
> described, but not much else; but interestingly, layouts there are
> only used for representing structures, while a different mechanism
> is used for invoking native functions (that is, native function
> signatures are described using yet another layout description which
> closely follows C types).
> 
> Maurizio
> 
> [0] - http://hg.openjdk.java.net/panama/dev/file/65ff6482c4d5/test/jdk/java/nicl/System/UnixSystem.java#l68
> [1] - https://docs.python.org/2/library/struct.html
> [2] - http://ruby-doc.org/core-2.5.0/Array.html#method-i-pack
> [3] - https://github.com/iryont/lua-struct
> [4] - https://docs.oracle.com/javase/8/docs/technotes/guides/pack200/pack-spec.html#tocAtLaDe
> [5] - https://docs.python.org/2/library/ctypes.html
> [6] - https://github.com/ffi/ffi
> [7] - http://docs.micropython.org/en/latest/wipy/library/uctypes.html
> [8] - https://github.com/java-native-access/jna
> [9] - https://github.com/jnr/jnr-ffi
> [10] - https://msdn.microsoft.com/en-us/library/ef4c3t39.aspx
> [11] - https://doc.rust-lang.org/book/first-edition/ffi.html
> [12] - https://golang.org/cmd/cgo/
> [13] - https://github.com/bytedeco/javacpp
> [14] - http://sourceware.org/libffi/
> [15] - https://github.com/jnr/jffi
> [16] - https://docs.microsoft.com/en-us/dotnet/framework/interop/blittable-and-non-blittable-types
>