analysis of popular FFI frameworks and their layout descriptions
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Jan 25 15:35:55 UTC 2018
Hi,
Over the last few days I've analyzed and explored the existing
frameworks which allow some form of native interop in different
programming languages. This analysis is by no means exhaustive, but I
hope to have covered the most popular frameworks out there (and I
apologize in advance for any unintended omissions!). The goal was to
find out whether other frameworks use some kind of layout description
in the same way as we do in Panama (see the example in [0]) and, if so,
whether the descriptions used in such frameworks have characteristics
similar to the one we are discussing.
I classified the frameworks into the following four categories:
1) pack/unpack
These frameworks are very 'informal': they simply allow a set of values
to be viewed as a packed data structure. This is typically done by
converting one or more values into a byte string containing the byte
values of the resulting representation, where the representation is
given by some kind of layout template. We can think of these as the
moral successors of C's printf format strings ;-)
* Python struct [1]
* Ruby pack/unpack [2]
* Lua struct [3]
* Pack200 [4]
I'd say that both Python struct and Lua struct are quite close to the
type descriptors currently implemented in Panama. The types closely
match the set of types available in C, and there are also ways to denote
padding, endianness, etc. Ruby's flavor is slightly different: it is
more string-oriented, with many directives devoted to string encodings
(e.g. hex, base64 and uuencoded strings) alongside the numeric ones.
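To make the comparison concrete, here is a small sketch using Python's
struct module [1]. The format string plays the role of the layout
template: a prefix character selects the byte order and whether native
alignment padding is applied, and 'x' denotes an explicit pad byte. The
values being packed are arbitrary.

```python
import struct

# '<' = little-endian, standard sizes, no alignment padding:
# h (2 bytes) + i (4 bytes) + d (8 bytes) = 14 bytes
packed = struct.pack("<hid", 1, 2, 3.0)
print(len(packed))                     # 14
print(struct.unpack("<hid", packed))   # (1, 2, 3.0)

# '@' (the default) = native byte order, sizes and alignment, like a C compiler:
# pad bytes follow h so that i is 4-byte aligned (16 bytes on typical platforms)
print(struct.calcsize("@hid"))

# '>' = big-endian; 'x' inserts an explicit pad byte between the two fields
print(struct.pack(">bxh", 1, 2))       # b'\x01\x00\x00\x02'
```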
Pack200 is slightly different in that its layout language is used to
describe the layout of class file attributes, so the layout language is
(obviously) biased towards VM-centric notions. But it's another
interesting example of how a little language can be used to 'teach' a
framework to understand (and, in this case, pack/unpack) a blob of
bytes.
2) FFI with layout description
In this category we find frameworks which allow interoperability with
native functions - as such, they have ways to bind to native functions,
but they also have ways to model native data in a way that is friendly
to the host language (which is crucial if one has to pass aggregates to
a native function).
* Python ctypes [5]
* Ruby FFI [6]
* Python uctypes [7]
Python ctypes and Ruby FFI are very similar; they model native data
structures by letting the user define a class whose members are
dynamically generated from a layout (the beauty of dynamic languages
:-)). The layout description, however, is not a string, as in the
previous category, but rather an object that can be modeled naturally
using the constructs available in the host language. For instance,
Python's ctypes models the layout as a list of (name, type) tuples (the
_fields_ attribute), while Ruby FFI uses a call to 'layout' with
alternating names and types. In both cases, the essence of the layout
description is a list of fieldName/fieldType pairs that is used to
generate the contents of the struct class dynamically. The fieldType
part of each pair is typically something that closely resembles a C
type.
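As an illustration of this style, here is a minimal Python ctypes [5]
sketch. The _fields_ list is the layout, and ctypes generates the x and
y members of the class from it, inserting the same alignment padding
the platform's C compiler would. (Ruby FFI expresses the same
information as 'layout :x, :int, :y, :double' inside an FFI::Struct
subclass.) The Point type is just an example.

```python
import ctypes

class Point(ctypes.Structure):
    # The layout: an ordered list of (fieldName, fieldType) pairs,
    # where each field type mirrors a C type.
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_double)]

p = Point(1, 2.5)            # members are generated from _fields_
print(p.x, p.y)              # 1 2.5
print(Point.x.offset)        # 0
# y is placed after whatever padding is needed to align a double,
# so its offset and the total size are platform-dependent
print(Point.y.offset, ctypes.sizeof(Point))
```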
Python uctypes (part of the MicroPython effort) closely follows in the
footsteps of the above frameworks, but with a notable distinction: its
type descriptions are more abstract, as uctypes can only speak about
integers (signed/unsigned), floating point values and addresses. As a
result, the layout description looks less language-specific.
Interestingly, this is only true for the data modeling part: the FFI
support in MicroPython adopts a different description, which is
string-based and C-specific (e.g. there are codes such as 'i', 'l',
'f', 'c', ...).
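To show what a more abstract description buys, here is a small
CPython sketch inspired by uctypes. It is not the uctypes API (in
uctypes itself, a field is written roughly as an offset OR-ed with a
type constant such as uctypes.UINT32; see [7]). In this hypothetical
descriptor, each field is only an (offset, kind, width) triple, where
kind is "int", "uint" or "float". No C type names are involved; the
mapping to Python struct codes happens inside the hypothetical
read_struct helper.

```python
import struct

# Hypothetical, uctypes-inspired descriptor kinds: only signedness,
# floating-point-ness and width; no C type names.
_CODES = {
    ("int", 1): "b", ("uint", 1): "B",
    ("int", 2): "h", ("uint", 2): "H",
    ("int", 4): "i", ("uint", 4): "I",
    ("int", 8): "q", ("uint", 8): "Q",
    ("float", 4): "f", ("float", 8): "d",
}

def read_struct(buf, desc, byteorder="<"):
    """Decode buf according to desc, a dict {name: (offset, kind, width)}."""
    values = {}
    for name, (offset, kind, width) in desc.items():
        # With standard sizes ('<' or '>'), each code is exactly `width` bytes.
        code = byteorder + _CODES[(kind, width)]
        (values[name],) = struct.unpack_from(code, buf, offset)
    return values

header = {"count": (0, "uint", 4), "ratio": (8, "float", 8)}
buf = struct.pack("<I4xd", 7, 1.5)   # 4 pad bytes place ratio at offset 8
print(read_struct(buf, header))      # {'count': 7, 'ratio': 1.5}
```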
3) FFI without layout description
In this category we have, as before, frameworks which allow host
languages to call native functions. The main difference is that the
frameworks in this category do not rely on any kind of standalone
layout description (either object-based or string-based) and instead
choose to specify layouts by augmenting class (or struct) definitions.
* JNA [8]
* JNR [9]
* .NET P/Invoke [10]
* Rust FFI [11]
JNA and JNR are actually quite close in terms of how native data
structures are expressed in Java. The user has to define a subclass of
some abstract Struct class, and the types of the fields of that class
determine the layout. Of course there are issues with this approach:
the order in which fields are declared cannot be reliably discovered at
runtime (reflection makes no guarantee on the order of the fields it
returns), which is why JNA requires the field order to be specified
explicitly (a struct must implement the getFieldOrder() method).
JNR addresses the impedance mismatch between Java types and C types by
allowing the Java types in its declarations to be annotated, so it's
possible for a JNR declaration to say e.g. '@size_t int'.
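For comparison, here is a rough Python analogue of the class-based
approach. It is not JNA or JNR code, and native_struct is a
hypothetical helper. The layout is derived from the declared field
types, and an annotation overrides the default type mapping, much like
JNR's '@size_t'. Unlike Java reflection, Python (3.7+) preserves
declaration order for class annotations, so no equivalent of
getFieldOrder() is needed here. The sketch assumes Python 3.9+ for
typing.Annotated.

```python
import ctypes
from typing import Annotated, get_type_hints

# Hypothetical default mapping from host-language types to C types.
_DEFAULTS = {int: ctypes.c_int, float: ctypes.c_double}

def native_struct(cls):
    """Turn a plain annotated class into a ctypes.Structure (illustrative only)."""
    fields = []
    # include_extras=True keeps Annotated[...] metadata; order follows declaration order.
    for name, hint in get_type_hints(cls, include_extras=True).items():
        metadata = getattr(hint, "__metadata__", ())
        # An explicit C type in Annotated[...] overrides the default mapping.
        ctype = metadata[0] if metadata else _DEFAULTS[hint]
        fields.append((name, ctype))

    class _Struct(ctypes.Structure):
        _fields_ = fields

    _Struct.__name__ = _Struct.__qualname__ = cls.__name__
    return _Struct

@native_struct
class Buffer:
    length: Annotated[int, ctypes.c_size_t]   # like JNR's '@size_t'
    scale: float                              # defaults to C double

b = Buffer(3, 0.5)
print(b.length, b.scale)   # 3 0.5
```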
Sidenote: both JNR and JNA (and also Python ctypes and Ruby FFI, from
the previous category) use libffi [14] to perform the actual native
invocations (in the case of JNR, through a nice little JNI wrapper
called jffi [15]); libffi is a small library which allows calling a
function whose signature is not statically known.
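Python ctypes, which is built on libffi, exposes this kind of dynamic
invocation directly. The signature is supplied as data at runtime, and
the call is made without any compiled stub. Here is a minimal sketch,
assuming a POSIX system where the C library can be loaded this way.

```python
import ctypes
import ctypes.util

# find_library may return None (e.g. on minimal systems); CDLL(None) then
# uses the symbols already loaded in the process, which include libc on POSIX.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]   # signature described at runtime...
strlen.restype = ctypes.c_size_t      # ...rather than known at compile time
print(strlen(b"panama"))              # 6
```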
P/Invoke takes a similar approach: to define a native struct, the user
defines e.g. a C# struct and then augments that declaration with
attributes which specify things such as field offsets. More
specifically, the attribute "[StructLayout(LayoutKind.Sequential)]"
tells .NET that the struct has to be laid out in the same order in
which its fields were declared. An explicit layout
(LayoutKind.Explicit) is also available, in which case the user has to
specify the offset of each field manually, via the [FieldOffset]
attribute.
Interestingly, .NET introduces the notion of 'blittable' types [16],
i.e. types whose values do not require conversion when passed between
managed and unmanaged code; not surprisingly, a user-defined struct is
only treated as blittable if it has a sequential or explicit layout and
contains only blittable fields.
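The explicit-layout idea can be illustrated with a small Python sketch.
Here, explicit_format is a hypothetical helper, not a .NET API. It
takes fields with manually specified offsets and derives an equivalent
Python struct format string by inserting pad bytes. Unlike
LayoutKind.Explicit, which allows overlapping fields (e.g. to model
unions), this sketch rejects overlaps.

```python
import struct

def explicit_format(fields, size=None):
    """Build a little-endian struct format from (offset, code) pairs.

    Gaps between fields become explicit pad bytes ('x'), mimicking
    per-field offsets; overlapping fields are rejected in this sketch.
    """
    fmt, pos = "<", 0
    for offset, code in sorted(fields):
        if offset < pos:
            raise ValueError(f"field at offset {offset} overlaps the previous one")
        if offset > pos:
            fmt += f"{offset - pos}x"       # padding up to the requested offset
        fmt += code
        pos = offset + struct.calcsize("<" + code)
    if size is not None and size > pos:
        fmt += f"{size - pos}x"             # trailing padding up to a total size
    return fmt

fmt = explicit_format([(0, "i"), (8, "d")])
print(fmt, struct.calcsize(fmt))                        # <i4xd 16
print(struct.unpack(fmt, struct.pack(fmt, 5, 0.25)))    # (5, 0.25)
```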
In Rust FFI, no particular effort is required to model a struct,
because Rust structs are already plain aggregates, much like C structs;
a single attribute, #[repr(C)], lets the programmer specify that the
struct should indeed be laid out in a C-compatible way (by default,
Rust is free to reorder fields). And that's it. So, in a way, the
approach is not too dissimilar to what's done in the previous
frameworks, but the fact that the language is already C-friendly helps
reduce the semantic mismatch to a minimum. Rust also has a crate
(called 'libc') which contains definitions of several C types (e.g.
c_int, size_t), so that declarations can be more explicit.
4) Other FFI
Lastly, we cover non-standard kinds of FFI support which take a somewhat
different and more ad-hoc approach than the ones discussed above.
* Go [12]
* JavaCPP [13]
Go's approach (cgo) is radically different: C code can be embedded in a
comment placed immediately before the 'import "C"' statement of a Go
source file, and then referred to (via the C pseudo-package) by the Go
code in that file. In other words, what Go does is not too dissimilar
from what Panama's jextract does, only it is implemented as a sort of
static preprocessing step on a regular Go source file.
The main goal of JavaCPP is to provide bridging between Java and C++;
as such, JavaCPP is very class-oriented, allowing programmers to model
C++ classes as Java classes. Such classes might be sprinkled with some
extra annotations (e.g. to say that a Java String should be translated
as a C++ std::string). Once the source has been compiled into a class,
that class has to be passed to the JavaCPP builder for an intermediate
build step, which generates (and compiles) the JNI code required to
bridge with the native library, using the metadata included in the
declaration (this step is similar in spirit to what jextract does).
Then, once all the ingredients have been generated, the class can
finally be executed, since all the JNI plumbing will be there.
Therefore, JavaCPP allows programmers to call native methods without
writing any JNI code themselves; for those who don't want to write a
class declaration manually, there's also a tool similar to jextract
which parses a header file and generates the source code for the
annotated declaration, which can then be compiled and built as before.
In conclusion, layout languages seem popular in 'informal'
marshalling/unmarshalling frameworks (a la Python struct), but
relatively unpopular in classic FFI frameworks. In dynamic programming
languages, such frameworks allow the creation of ad-hoc classes
modeling native structures from a 'layout' description, but such a
description is never a string, and it's always tied to the constructs
expressible in the host language (tuples and such).
Also, I could not find any framework that adopts a layout description
that is totally language-agnostic and semantics-neutral. Most of the
type descriptions occurring in the FFI frameworks analyzed in category
(2) are deliberately C-specific. The most general is, without doubt,
MicroPython's uctypes module, whose types simply capture the size,
signedness and floating-point nature of the values being described, and
not much else; but, interestingly, layouts there are only used for
representing structures, while a different mechanism is used for
invoking native functions (that is, native function signatures are
described using yet another description, which closely follows C
types).
Maurizio
[0] - http://hg.openjdk.java.net/panama/dev/file/65ff6482c4d5/test/jdk/java/nicl/System/UnixSystem.java#l68
[1] - https://docs.python.org/2/library/struct.html
[2] - http://ruby-doc.org/core-2.5.0/Array.html#method-i-pack
[3] - https://github.com/iryont/lua-struct
[4] - https://docs.oracle.com/javase/8/docs/technotes/guides/pack200/pack-spec.html#tocAtLaDe
[5] - https://docs.python.org/2/library/ctypes.html
[6] - https://github.com/ffi/ffi
[7] - http://docs.micropython.org/en/latest/wipy/library/uctypes.html
[8] - https://github.com/java-native-access/jna
[9] - https://github.com/jnr/jnr-ffi
[10] - https://msdn.microsoft.com/en-us/library/ef4c3t39.aspx
[11] - https://doc.rust-lang.org/book/first-edition/ffi.html
[12] - https://golang.org/cmd/cgo/
[13] - https://github.com/bytedeco/javacpp
[14] - http://sourceware.org/libffi/
[15] - https://github.com/jnr/jffi
[16] - https://docs.microsoft.com/en-us/dotnet/framework/interop/blittable-and-non-blittable-types