notes on binding C++
John Rose
john.r.rose at oracle.com
Mon Oct 31 02:27:24 UTC 2016
On Oct 28, 2016, at 10:49 PM, John Rose <john.r.rose at oracle.com> wrote:
>
> Mikael and I have had a few good conversations about binding C++ to Java interfaces.
>
> The following notes are FTR, TBD, NYI, and every other TLA which implies "tentative".
Here are a few more thoughts about C++ binding in Java.
These notes are also captured FTR in this file:
http://cr.openjdk.java.net/~jrose/panama/cppapi.cpp.txt
API point linkage stubs, as generated by jextract.
Any type has a number of _API points_ that may be applied to values
of that type. For example, C++ classes may supply API points for
field access, method call, implicit conversions, etc. Making a
subclass is a complex API point. Fortran arrays may be read,
written, sliced, and aliased with other arrays.
Some API points are defined in terms of an OS-specific ABI, which
means that on any given system there is a specific series of
machine instructions that operate the API point. For ANSI C, all
API points, except macros, are defined by an ABI. For C++, ABI
support may be partial and/or unstable.
ABI-defined API points are data access (structs and arrays) and
function calls (both named and via a function pointer). On some
systems the ABI may also specify the mechanics of name mangling,
virtual function calls, and subclass layout.
A C++ inline function consists of code that is replicated into
client uses of that function. Unix-like ABIs do not directly
represent the action of an inline function, and so API features
built from function inlining are not supported by those ABIs.
An ABI-defined API point can be operated by a metadata-driven
mechanism, such as libffi, or the JVM's native call generator.
Other API points require a real compiler to directly emit code, at
compile time, to operate a particular API point on a particular variable.
If an ABI could include enough AST or IR capabilities to represent
a function body, that function could be exported to applications
without direct inlining at compile time. The inlining would take
place during linking or JIT compilation. This in fact is what the
JVM does, since its ABI can encode most methods using bytecodes.
This more powerful representation allows more optimizations to
occur after link time.
On Unix-like systems, nearly all API points can be supported at
least indirectly by the system ABI. One simple way to do this is
by wrapping the essential action of each API point (for each type)
into a _machine code stub_ which contains the code that the
compiler would generate to operate that API point. The stub itself
must be callable using the ABI; typically it is a function with
arguments drawn from a limited set of types (pointers and other
scalars). If the type being operated on is complex, the stub
requires the caller to put the type's value in memory first, and
then pass a pointer to the stub. In this way, a wide variety of
non-ABI-capable operations can be expressed using little snippets
of binary code wrapped in ABI-capable entry points. Because these
snippets are invoked out of line, they may cost performance and
prevent some optimizations. But they are convenient and often good
enough.
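
To make the stub idea concrete, here is a rough sketch of what a suite of stubs for one small class might look like. The class Point and the Point_stub_* names are invented for illustration; they are not taken from jextract or from the notes file. The stubs take only pointers and scalars, and the caller supplies the memory that holds the complex value.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical header-only class: every member is inline, so the
// library is not guaranteed to export a linker symbol for any of them.
class Point {
    int x_, y_;
public:
    Point(int x, int y) : x_(x), y_(y) {}
    int dot(const Point& o) const { return x_ * o.x_ + y_ * o.y_; }
};

// Stubs (names are assumptions). extern "C" gives each one an
// unmangled name and the plain C calling convention.
extern "C" std::size_t Point_stub_sizeof() {
    // Lets the caller allocate enough memory without knowing the layout.
    return sizeof(Point);
}

extern "C" void Point_stub_construct(void* mem, int x, int y) {
    assert(mem != nullptr);
    // Build the value in caller-provided, suitably aligned memory.
    new (mem) Point(x, y);
}

extern "C" int Point_stub_dot(const void* self, const void* other) {
    // The compiler emits the inlined body of Point::dot right here.
    return static_cast<const Point*>(self)->dot(
        *static_cast<const Point*>(other));
}
```

A caller would ask Point_stub_sizeof for the size, allocate two aligned buffers, construct into each, and then call Point_stub_dot with the two pointers. Point is trivially destructible, so this sketch omits a destructor stub; a class with a nontrivial destructor would need one.
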
The jextract tool scans a header file (or other API specification)
and finds API points to make available to a Java programmer. It
emits metadata in Java native form, which is to say it emits a
bundle (JAR) of class-files. The classes are purely abstract
interfaces describing the shape of the APIs, not their contents.
Annotations are used to bind ABI parameters to particular names.
For example, a struct field might be annotated with its type, name,
and offset, and a function might be annotated with its type, name,
and linker symbol. Elements that can be easily computed from the
Java types and names need not be repeated in annotations.
When the Java application runs, it loads the extracted metadata and
runs a _binder_ on it, which gives implementations to all the
interfaces, implementations which are consistent with the ABI
requirements. For example, a struct field might be accessed with
a call to a "get" or "put" operation from the "Unsafe" facility,
computing the address using the offset associated with the field.
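
The offset-driven field access can be sketched in C++ terms as well. This is only an analogy for what a binder might do on the Java side with "Unsafe"; the struct Sample, the FieldInfo record, and the get_field/put_field helpers are all hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical C struct as it might appear in an extracted header.
struct Sample {
    std::int32_t id;
    double       value;
};

// Hypothetical metadata record for one field: its name and byte offset.
struct FieldInfo {
    const char* name;
    std::size_t offset;
};

const FieldInfo SAMPLE_ID    = {"id",    offsetof(Sample, id)};
const FieldInfo SAMPLE_VALUE = {"value", offsetof(Sample, value)};

// Metadata-driven "get": compute base + offset, then copy the raw bytes.
template <typename T>
T get_field(const void* base, const FieldInfo& f) {
    assert(base != nullptr);
    T out;
    std::memcpy(&out, static_cast<const unsigned char*>(base) + f.offset,
                sizeof(T));
    return out;
}

// Metadata-driven "put": same address computation, copy in the other direction.
template <typename T>
void put_field(void* base, const FieldInfo& f, T v) {
    assert(base != nullptr);
    std::memcpy(static_cast<unsigned char*>(base) + f.offset, &v, sizeof(T));
}
```

The accessors know nothing about Sample itself; everything they need comes from the FieldInfo record, which is the point of driving the access from metadata.
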
An inline function cannot (in the general case) be represented
fully using metadata, so the jextract tool must also emit a machine
code stub which wraps the function (as if it were out-of-line).
The jextract tool must also leave enough "clues" in the metadata to
enable the binder to associate each API point with the correct
stub. These stubs should be emitted in two forms: First, as C++
code, for purposes of debugging and porting. Second, as a DLL to
be loaded into the JVM with the associated library.
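
As a toy illustration of the "clues", the sketch below wraps one inline function in a stub and records a table entry that ties the API point's name to the stub's entry point. Everything here is an assumption for illustration: the function clamp_to_byte, the stub name, and the table format are invented, and a real binder would more likely resolve the stub by symbol name through the dynamic loader than through a static table.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical inline function from a header; no out-of-line symbol
// is guaranteed to exist for it in the library.
inline int clamp_to_byte(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

extern "C" {
    // Function-pointer type with C linkage, matching the stub below.
    typedef int (*IntToIntStub)(int);

    // Stub emitted alongside the metadata (name is an assumption).
    int stub_clamp_to_byte(int v) { return clamp_to_byte(v); }
}

// Hypothetical clue table: API point name -> stub entry point.
struct StubEntry {
    const char*  api_point;
    IntToIntStub stub;
};

static const StubEntry STUB_TABLE[] = {
    {"clamp_to_byte(int)", stub_clamp_to_byte},
};

// Toy binder lookup: returns the stub for an API point, or nullptr.
IntToIntStub find_stub(const char* api_point) {
    assert(api_point != nullptr);
    for (const StubEntry& e : STUB_TABLE) {
        if (std::strcmp(e.api_point, api_point) == 0) return e.stub;
    }
    return nullptr;
}
```
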
Here are some examples of C++ classes and associated suites of
machine code stubs.
http://cr.openjdk.java.net/~jrose/panama/cppapi.cpp.txt