Importing native APIs to Java

John Rose john.r.rose at oracle.com
Thu Jan 1 04:19:57 UTC 2015


[Resend; one more try with flat, correct text.  Please ignore previous 2 messages.  Happy New Year!]

I've posted some thoughts about importing C and C++ native APIs to Java, including some detailed ideas about what the jextract tool should emit.

To some extent, the paper bears on the vexed question of whether to emphasize references or values in APIs.  The default position is "both, explicitly".  But it calls out where different choices could be made.

The URL to the draft (which will be updated in place over time) is here:
  http://cr.openjdk.java.net/~jrose/panama/metadata.html
  http://cr.openjdk.java.net/~jrose/panama/metadata.md

For the record, I have also pasted the markdown, below.

— John

# Java-centric foreign metadata
#### John Rose, 12/2014, version 0.1

The Java ecosystem has a rich and flexible interchange format for
metadata: the class-file.  By adapting this format to represent
imported (foreign) APIs, we can make imported APIs more fully
interoperable with Java tools and VMs.

## Principles

  0. _Extraction_ The features of C/C++ APIs, as specified in the contents of
     C/C++ header files, are imported to Java by mechanically extracting metadata.
  1. _Interfaces_ An imported C/C++ API is represented using a nest of extracted Java interfaces,
     plus additional metadata as needed.
     Classes, fields, and non-abstract methods are avoided as much as possible.
     a. The whole import unit (one or more header files) is mapped to an extracted interface.
     b. Top-level definitions within the file are mapped to interface methods and nested types.
     c. Imported features are extracted as interface methods.
     d. Type nesting is used to model the nesting of imported features.  (This is especially important for C++.)
  2. _Automatic_ Translation or configuration parameters, if manually determined, are minimized.
     a. Extracted interfaces are suitable for programming at "C level" with the imported API.
     b. C preprocessor options (`-I`, `-D`) must be specified manually.
     c. The Java package of the extracted interfaces is specified manually,
        and does not directly model any imported feature.
        (Note that Java packages are useful for modular grouping and isolation of extracted interfaces.)
     d. There is no need to decide the direction (in, out, or in/out) of pointer parameters
        or whether an imported feature is supposed to be used by value or by reference.
  3. _Adaptation_ Extracted interfaces can also be composed with adapters.
     a. This happens after importation or (perhaps) as plug-in logic in the import tool.
     b. Adapters contribute behavior that improves safety, convenience,
        or integration with the Java platform.
     c. Adapters may have modified method signatures.  (Example: `char*` often maps to `String`.)
     d. Adapters may reimplement underlying extracted interfaces.  (Example:  Bits may be copied between C and Java heaps.)
  4. _Complete_  Every operation and type in the header file API is rendered (using one or more interface methods).
     a. Imported operations include: call a function, make a variable (scalar, struct, or array),
        read or write a variable.
     b. Some macros may be importable, including manifest constants and pseudo-functions.
     c. Manual configuration options (to be used sparingly) may allow filtering to
        specifically include or exclude imported features.
  5. _Accurate_ The possible uses of an imported API, from Java, are as close as possible to uses from C.
     a. Extracted interfaces will embody distinctions native to C.
        These include pointer vs. addressable l-value vs. plain value.
     b. C/C++ names are rendered in Java as accurately as possible.
        Simple mangling rules apply, when needed.  Alphabetic case never changes.
        Name components not imported from C are always distinct from imported name components.

### About abstractness

There are several important corollaries to the fact that we use only interfaces and their methods.

  * No computation is possible on an extracted interface until it is instantiated
    with respect to a particular runtime library or simulation of a library.
  * Extracted interfaces can be implemented directly for maximum performance.
  * For improved safety, the same interfaces can be wrapped with safety checks,
    or proxied to an external process or sandbox containing the native code.
  * The same interfaces can also be wrapped with data profilers or debuggers,
    or mocked within test frameworks.
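
As a rough illustration of these corollaries, the sketch below gives one extracted interface two interchangeable implementations. The interface name `unistd` and its `isatty` method follow an example given later in this draft; `MockUnistd` and `CheckedUnistd` are hypothetical names invented here, and neither one touches a native library.

```java
// Extracted interface (shape taken from the isatty example later in this draft).
interface unistd {
    int isatty(int $arg1);
}

// Hypothetical mock binding for test frameworks: no native code is involved.
final class MockUnistd implements unistd {
    @Override public int isatty(int fd) {
        // Pretend that only descriptors 0, 1 and 2 are terminals.
        return (fd >= 0 && fd <= 2) ? 1 : 0;
    }
}

// Hypothetical safety wrapper: validates arguments, then delegates to any binding.
final class CheckedUnistd implements unistd {
    private final unistd delegate;

    CheckedUnistd(unistd delegate) { this.delegate = delegate; }

    @Override public int isatty(int fd) {
        if (fd < 0) {
            throw new IllegalArgumentException("negative file descriptor: " + fd);
        }
        return delegate.isatty(fd);
    }
}
```

Client code written against `unistd` cannot tell which of these it was handed; a direct native binding would slot in the same way.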

To maintain abstractness, we will avoid or make only sparing use of
certain features of Java types.  Fields will _not_ be used to model
constant values.  Java interface subtyping will _not_ be used to model
relations of imported features.  Static methods (a feature new in Java
8) will _not_ be used to export fixed behaviors.

Classes or enums will _not_ be used in any directly extracted API.  In
some cases, such types may appear in manual adaptations of directly
extracted interfaces.

Even with these exclusions, the remaining language is rich,
consisting of non-static interface methods, nested interfaces,
nested annotation definitions, and annotation uses on types,
methods, and method parameters.

Default methods (i.e., non-static but concrete interface methods) may
be useful, as a way of extracting behavioral information from the
header file that is not available from bindable code in a DLL.  Such
methods can be overridden if necessary.  To maintain abstractness and
safety, they should only access extracted interfaces, and not make
"secret" calls to the runtime, especially to unsafe routines.  It is
probably safest to express non-trivial behavioral information via
annotations, rather than default methods.

One good use of default methods may be to derive getter and setter
behaviors from addressable l-values.  The address-of operator for a
structure field would be abstract, but the get and set methods could
be defaulted in the obvious way.

> <span class="smaller"> _Side note:_ **Translating manifest constants**

> In C or C++, a manifest constant is a value, usually integral, which
is known at compile-time, to both implementations and clients of an
API.  Manifest constants can be defined as macros or enumeration
values or (in C++) as initialized `const` values.

> It may sometimes seem desirable (as a compromise with abstractness) to
render most manifest constants as `static final` interface fields.
This would allow manifest constants to participate in constant Java
expressions, including case labels.

> But the basic approach of using interface methods (not fields) can be
valid even for manifest constants.  For example, a manifest constant
might have the same name but different values on various platforms; in
this case, an extracted Java API may succeed in being portable if it
avoids directly revealing the manifest constant value.

> In Java, a wrapper interface can convert a field constant to an
abstract method or vice-versa, without loss of function.  It therefore
appears that special treatment for imported manifest constants can be
delayed to the generation of post-import wrappers.  In the examples
below, the uniform method-based extraction is shown for constants.

</span>
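
The claim in the side note that a wrapper can convert between a field constant and an abstract method can be sketched as follows. All type names here are hypothetical. The value `0100000` for `S_IFREG` is the one quoted in the boneyard at the end of this draft, and an `int` carrier is assumed so that the field form can appear in a `case` label.

```java
// Uniform method-based extraction of a manifest constant.
interface stat_constants {
    int S_IFREG();
}

// One platform's binding supplies the value; other platforms may differ.
final class ExampleStatConstants implements stat_constants {
    @Override public int S_IFREG() { return 0100000; }
}

// Post-import wrapper, direction 1: expose the constant as a static final field.
final class StatFields {
    static final int S_IFREG = 0100000;
}

// Post-import wrapper, direction 2: expose the field through the method-based interface.
final class StatFieldsAsMethods implements stat_constants {
    @Override public int S_IFREG() { return StatFields.S_IFREG; }
}

// The field form can participate in constant expressions such as case labels.
final class FileKinds {
    static String describe(int kindBits) {
        switch (kindBits) {
            case StatFields.S_IFREG: return "regular file";
            default: return "other";
        }
    }
}
```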

### About safety

As a corollary of the principle of accuracy, any type safety problems
will be imported to the Java APIs.  Thus, extracted Java APIs will (in
general) be unsafe for general use.  Many use cases will require
restrictive wrappers to be created by hand or mechanically.

(Oddly enough, abstractness does not help with safety.  Once an
abstract interface is bound directly to a C API, it obtains all the
power and risk of that API.)

Type safety in C/C++ APIs is usually weaker than in comparable Java APIs.
This arises from what might be called "sharp edges" in the basic
design of C, including use of untyped pointers, casts, unions, and
"varargs" functions.  More subtly, since storage lifetimes are managed
by hand, C functions must trust callers not to pass "dangling"
pointers to out-of-scope memory.

Any features of a C/C++ API which compromise safety must be used with
care by a knowledgeable C/C++ programmer, and the same will be true of
an automatically extracted Java API, since the automated extraction
process will probably miss some of the knowledgeable care required of
the human programmer.

A more complete review of safety issues is provided [in a blog article
on Project Panama][isthmus].

[isthmus]: https://blogs.oracle.com/jrose/entry/the_isthmus_in_the_vm

### Integrating interfaces and metadata

As a corollary of the principle of automation, the output of the
import process must be machine-readable, so that it can be processed
as directly as possible by additional tools and by the JVM itself.
This is a major reason to start with Java interfaces as a structuring
principle, rather than a completely new IDL-like intermediate language.

Java interfaces are not expressive enough to describe (in natural
terms) all distinctions between C/C++ entities.  And the principle of
accuracy forbids us from freely discarding distinctions.  This means
that the import process will create not only Java interfaces, but also
additional _sideband_ metadata.

Sideband metadata can include:

  * original names of imported entities (when different from Java entities)
  * original types of imported entities (when not representable in Java)
  * structure layouts (offsets, sizes, alignments, byte order, bitfields)
  * linkage class specifiers (`extern "C"`, `__stdcall`, etc.)
  * anything else needed to bind the interfaces to native functions or data structures

Sideband metadata is not necessary or desirable when it can be
trivially deduced from interface or method names or types.

Sideband metadata can (in principle) be stored in a number of ways.
It can be carried as annotations or resource files.  It can be
expressed as a string-based "little language", or (if using
annotations) using type-safe references to classes and other
constants.  Metadata can also be placed "coarsely" as a large chunk of
information on an extracted interface or spread "finely" toward the
leaves of the extracted API structure.

The approach used by JNR spreads sideband metadata about method
parameters into parameter annotations.  This approach seems to be
easiest to work with, since it co-locates all relevant metadata as
closely as possible to any given API element.  In particular, it will
be easy for IDEs and post-processing tools to query annotations and
discover sidebands.

**ISSUE:** Annotation metadata can be spread as piecemeal annotations on
the individual methods, or else rolled up onto the enclosing header
file.  In the extreme, it can be rolled up into resource files
associated with imported header files.  Which organization gives the
best mix of compactness, ease of access, and expressiveness?
Provisional answer: Spread annotations are the most natural, and will
be used in the examples below.

### Carrier types vs. original types

C values can appear in a number of roles in an imported API, including
function arguments and returns, struct and union fields, and manifest
constants.  As in Java, each C value has a specific C type.

In many cases, a C type in an imported API will not be directly and
unambiguously representable as a Java type in the corresponding
extracted interface.  Even the seemingly simple keywords `int`,
`long`, and `char` have subtly different meanings in the two
languages.

An imported C value will appear as a parameter or return value for
one or more extracted interface methods.  The value's C type will
determine a corresponding Java type to appear in the extracted
interface.  This Java type is called the _carrier type_ for the
originally imported C type.  Every C type has a carrier type, and
perhaps more than one.

Carrier types are chosen to be _non-lossy_: They can represent all
values of the corresponding C types, either via the usual conversions
of Java, or with a special mapping (such as sign masking or string
expansion).  For example, either Java `byte` or Java `char` can be a
carrier for C `char` (with suitable sign adjustments).  But Java `int`
cannot be a carrier for C `int` on ILP64 systems.

Because the type system of C is richer (in some ways) than Java's
types, a carrier type does not uniquely determine a C type.  For
example, the Java type `long` serves as a carrier for both C `unsigned
int` and C `long long`.

The automatic import process specifies a particular carrier type for
each API element.  The automatically chosen carrier type can
efficiently represent all possible values of the imported C type.
If the carrier type can represent other values, values passed
into the extracted API are truncated down to the imported C type.
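
A minimal sketch of these carrier conversions, assuming a data model where C `unsigned int` is 32 bits wide. The class `Carriers` and its method names are hypothetical; `Integer.toUnsignedLong` and `Byte.toUnsignedInt` are standard Java 8 methods.

```java
final class Carriers {
    // C unsigned int (32 raw bits) -> Java long carrier; no value is lost.
    static long fromUnsignedInt(int rawBits) {
        return Integer.toUnsignedLong(rawBits);
    }

    // Java long carrier -> C unsigned int: values outside the C range are
    // truncated down to the low 32 bits.
    static int toUnsignedInt(long carrier) {
        return (int) carrier;
    }

    // C unsigned char held in a Java byte: a sign adjustment recovers 0..255.
    static int fromUnsignedChar(byte rawBits) {
        return Byte.toUnsignedInt(rawBits);
    }
}
```

For example, the raw bits `0xFFFFFFFF` come back from `fromUnsignedInt` as `4294967295L`, while a carrier value of `4294967296L + 5` is truncated to `5` on the way into the C API.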

Adaptation can modify these carrier types to something safer or more
convenient for Java programmers, such as `char*` to `String`.

The extracted interface also records (in a sideband) the original C
type, if it cannot be derived trivially and uniquely from the carrier
type.  Parameter and method annotations (or perhaps type annotations)
appear to be the simplest way to carry this sideband.

JNR uses a nice pattern for annotating method parameters with original
types.  For each named type (such as `nlink_t` or `int32_t`) there is
a [corresponding annotation][JNR nlink_t] of the same simple name,
which can be applied to extracted methods as needed.

[JNR nlink_t]: http://jnr.github.io/jnr-ffi/apidocs/jnr/ffi/types/nlink_t.html

This pattern can be used to simply and concisely represent imported
types of all sorts in extracted APIs.
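
A sketch of how such per-type annotations might be declared and then queried by a post-processing tool. The interface `unistd_types` and the helper class `Sidebands` are hypothetical, and the annotation is declared locally rather than taken from JNR. Only standard reflection calls are used.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Sideband annotation named after the C typedef it records.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.PARAMETER})
@interface pid_t {}

// Hypothetical extracted interface: the carrier type is int, the original type is pid_t.
interface unistd_types {
    @pid_t int getpid();
    @pid_t int getpgid(@pid_t int $arg1);
}

final class Sidebands {
    // True if the named no-argument method returns a value whose original type was pid_t.
    static boolean returnsPidT(Class<?> api, String methodName) {
        try {
            Method m = api.getMethod(methodName);
            return m.isAnnotationPresent(pid_t.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // True if the first parameter of the named one-int method was originally a pid_t.
    static boolean firstParameterIsPidT(Class<?> api, String methodName) {
        try {
            Method m = api.getMethod(methodName, int.class);
            return m.getParameters()[0].isAnnotationPresent(pid_t.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```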

All carrier types, without exception, will be [value-based][].  This
means they will always be wrappers for values, not full Java objects
in their own right.

[value-based]: http://download.java.net/jdk8/docs/api/java/lang/doc-files/ValueBased.html

### Carrier names vs. original names

Just as Java must use carrier types as approximate representations for
C types, the import process must sometimes distinguish between the
original name of an imported C API element and the Java name used to
represent it in an extracted interface.

The Java name can be called the _carrier name_ of the originally
imported name.

Carrier names can usually be made identical to the original name, but
if they must be changed, the change must be reversible and as
predictable as possible.

An annotation represents the original name if the carrier name differs.
If the C function’s DLL linkage requires additional information beyond
the name and header file, an annotation records this also.

Example:

    C: int synchronized();
    J: @C.Name("synchronized") int synchronized$();

Both Java and C++ support overloaded function names, but (of course)
with different rules.  Java overloadings for a single name must be
distinct in their (erased) type signatures, a condition which will not
always be true for imported C++ functions, since distinct C++
overloadings may use identical carrier type signatures.

If a carrier name is subject to conflicts between imported elements,
or if it conflicts in some other way with a distinct use of the same
Java name, the extraction process adds a suffix of the form `$` or
`$N`, where `N` is some positive decimal numeral.  The number should
be chosen in a stable manner, but is unspecified.

Example:

    C: void put(long long);
    J: @C.Name("put") void put(@C.long$long long $arg1);
    C: void put(unsigned int);
    J: @C.Name("put") void put$1(@C.unsigned$int long $arg1);
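
One possible way to assign carrier names under these rules is sketched below. The class `CarrierNames` is hypothetical, and the numbering policy (first come, first served, starting at 1) is only one stable choice among many, since the draft leaves the number unspecified. `SourceVersion.isKeyword` is the standard `javax.lang.model` check for reserved words.

```java
import java.util.HashSet;
import java.util.Set;
import javax.lang.model.SourceVersion;

final class CarrierNames {
    private final Set<String> used = new HashSet<>();

    // Returns a Java-legal, not-yet-used carrier name for the given C name.
    String assign(String cName) {
        // A C name that collides with a Java reserved word gets a bare `$` suffix.
        String base = SourceVersion.isKeyword(cName) ? cName + "$" : cName;
        String candidate = base;
        // A name already taken (for example, a second overloading) gets `$N`.
        for (int n = 1; !used.add(candidate); n++) {
            candidate = base + "$" + n;
        }
        return candidate;
    }
}
```

Feeding `put` twice yields `put` and then `put$1`, matching the example above, and `synchronized` yields `synchronized$`. Mapping back to the original name is the job of the `@C.Name` annotation, not of the suffix.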

## Detailed rules for importing names

It is time to embody the above design principles by proposing detailed
rules for translating imported C APIs to extracted Java interfaces.

### Import header files to top-level interfaces

An import unit is a header file.  (It could also be a group of headers.)
The header file gives its name to a top-level extracted interface.
Additional extracted interfaces are nested inside it as member types.
Elements which occur textually in the header file are rendered in the extracted interfaces.

Example:

    $ jextract sys/stat.h
    C: # line 1 "sys/stat.h" \n …
    J: interface sys_stat { … }

(*Note:* The beginning of each example line indicates whether the line
is imported C code, an extracted Java interface, or a tool invocation.)

Details:

  * The suffix (`.h`, etc.) of the header file name is removed.  `stdio.h` becomes `interface stdio {…}`.
  * Slashes are replaced by underscore.  `sys/stat.h` becomes `interface sys_stat {…}`.
  * The package into which the interfaces are defined is a manual translation parameter.
    See the Graal project for an example of platform-specific package naming conventions.
  * There are manual translation options for stripping or preserving directory components
    (such as `sys`, `X11`, etc.) which appear in header file paths.
  * Extraction parameters can be included as sideband data, e.g., `@ImportFile("sys/stat.h")`
  * DLL linkage information is attached to the header file interface using an annotation, such as `@NativeLinkage("…")`

Directory prefix stripping might be manually specified like this:

    $ jextract --strip-directory 1 sys/stat.h
    C: # line 1 "sys/stat.h" \n …
    J: interface stat { … }
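
The naming rules above can be sketched as a small function. `HeaderNames.interfaceName` is a hypothetical helper, not part of any existing tool; its second parameter plays the role of the `--strip-directory` count shown above.

```java
import java.util.Arrays;

final class HeaderNames {
    // Derives the top-level interface name from a header path such as "sys/stat.h".
    static String interfaceName(String headerPath, int stripDirectories) {
        String[] parts = headerPath.split("/");
        int last = parts.length - 1;
        // Remove the suffix (".h", etc.) from the file name component only.
        int dot = parts[last].lastIndexOf('.');
        if (dot > 0) {
            parts[last] = parts[last].substring(0, dot);
        }
        // Drop the requested number of leading directories, but never the file name.
        int first = Math.max(0, Math.min(stripDirectories, last));
        // Remaining slashes become underscores.
        return String.join("_", Arrays.copyOfRange(parts, first, parts.length));
    }
}
```

With a count of 0, `sys/stat.h` maps to `sys_stat`; with a count of 1 it maps to `stat`.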

**ISSUE:** By default, should directory prefixes (like `sys/` above) be
fully stripped, or not stripped?  Provisional answer: strip fully.

> <span class="smaller"> _Side note:_ **Header files are layered**

> The principle of accuracy can be applied to C `#include` statements
and to header files as a whole.  A C header file has dependencies on
other header files, usually as explicit `#include` statements.  These
dependencies can be (and usually should be) accurately modeled as
Java dependencies between independently extracted Java APIs.

> Any Java type mentioned in an extracted interface must be defined somewhere.
If the type definition is extracted in the same import unit, there is
no problem.  If the type is defined elsewhere, the reference in the
extracted interface must refer to a Java type extracted from a different
import unit, or somehow built into the system.

> Usually, the import process can predict the name of the extracted Java
type, even if it is not in the current import unit, because the import
tool can "see" the header file defining the type, even though it is
not being imported.  In some cases, manual configuration parameters
may be required to decide matters such as package names.

> In some cases, a C type will be implicitly defined in no particular
place.  An opaque pointer type like `MyPtr` in `typedef struct
MyNotDef* MyPtr` (where there is no structure definition for
`MyNotDef`) must be supplied "out of thin air".  An import tool
can usually detect when such tricks are being used, and provide
for a default dummy definition of the needed type.

> Class loader tricks may be useful for generating some implicitly
defined types on the fly.  Later Java releases are likely to provide
better (more "official") support for such tricks, as a side-effect of
providing hooks for types like `List<int>`.

### Import top-level functions as interface methods

A C/C++ function defined at the top level of a header file is
imported as an interface method in the header file's extracted
interface.

    C: int isatty(int); …
    J: interface unistd { int isatty(int $arg1); … }

The interface implementation will control the binding to the C library.
It assumes a meta-factory (at least one) which can implement the binding.
This pattern is similar to that used in JNR.
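
To make the meta-factory idea concrete, here is a sketch that binds an extracted interface to a simulated library using `java.lang.reflect.Proxy`. The class `SimulatedBinder` and the symbol-table representation are assumptions made for illustration. A real binder would resolve symbols in a DLL instead of a `Map`, and would also need to handle `Object` methods and default methods, which this sketch does not.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.function.Function;

// Extracted interface, as in the example above.
interface unistd {
    int isatty(int $arg1);
}

final class SimulatedBinder {
    // Creates an implementation of `api` whose methods dispatch by name
    // to the entries of a simulated symbol table.
    static <T> T bind(Class<T> api, Map<String, Function<Object[], Object>> symbols) {
        InvocationHandler handler = (proxy, method, args) -> {
            Function<Object[], Object> impl = symbols.get(method.getName());
            if (impl == null) {
                throw new UnsupportedOperationException("unbound symbol: " + method.getName());
            }
            return impl.apply(args);
        };
        Object instance = Proxy.newProxyInstance(
                api.getClassLoader(), new Class<?>[] { api }, handler);
        return api.cast(instance);
    }
}
```

A caller would populate the map with one entry per imported function (for example `"isatty"`) and then program against the returned `unistd` instance exactly as it would against a native binding.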

The top-level function is rendered as a method on the interface corresponding to the header file.

### Import typedefs as nested annotation definitions

A new type name introduced in a header file is imported as a member of
the enclosing extracted interface.  The member itself is an annotation.

    C: typedef int count_t;
    J: @C.Typedef(int$.class) @interface count_t { }

### Import variables as interface methods returning l-values

In C, variables support up to three operations: get, set, and take
address.  They are represented using a Java carrier type, a
_reference_ interface, which implements those three operations.

A named global or static variable is extracted as a Java method which
returns a reference interface.

Example:

    C: extern int errno;
    J: @C.Name("errno") C.int$ref errno$ref();

To avoid confusion between l-values and r-values, the carrier name of
the element is adjusted by appending the suffix `$ref`.

As we will see, this rule works inside of struct definitions also.

Optionally, convenience getters and setters can be supplied, as default methods.

    J: default int errno$get() { return errno$ref().get(); }
    J: default void errno$set(int $arg1) { errno$ref().set($arg1); }

The names of these convenience methods are uniformly formed by adding
suffixes.  This convention makes it easy to access all aspects of the
variable from an IDE completion dialog.
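
Putting the pieces of this section together, a self-contained sketch might look like the following. The interface name `errno_h` and the classes `HeapIntRef` and `SimulatedErrno` are hypothetical; the variable here lives on the Java heap, standing in for a real C global.

```java
// Stand-in for the bootstrapped `C` type and one of its reference interfaces.
interface C {
    interface int$ref {
        int get();
        void set(int x);
    }
}

// Hypothetical extracted interface for a header declaring `extern int errno;`.
interface errno_h {
    C.int$ref errno$ref();  // the only abstract operation

    // Convenience accessors derived from the reference.
    default int errno$get() { return errno$ref().get(); }
    default void errno$set(int $arg1) { errno$ref().set($arg1); }
}

// A simulated variable stored on the Java heap.
final class HeapIntRef implements C.int$ref {
    private int value;
    @Override public int get() { return value; }
    @Override public void set(int x) { value = x; }
}

// A binding only has to supply the reference; getters and setters come for free.
final class SimulatedErrno implements errno_h {
    private final C.int$ref cell = new HeapIntRef();
    @Override public C.int$ref errno$ref() { return cell; }
}
```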

**ISSUE:** Should the suffix `$ref` be elided, to privilege addressing over getting?  Provisional answer:  If in doubt, preserve explicit distinctions.  Manually tweaked adapters can clean things up.

**ISSUE:** Should the suffix `$get` be elided, to privilege getting over addressing?

**ISSUE:** Should the suffix `$set` be elided since getters and setters (or, addressers and setters) can be distinguished via their signatures alone?

### Import struct types as nested interfaces

A struct type is rendered as an interface nested inside the rendering of its defining scope.

In C the scope containing a struct definition is always a header file.
In C++, more complicated nesting relations are easily accommodated using nested Java interfaces.

These rules also apply to C unions and C++ classes.

Each struct field is rendered as a group of three interface methods, for getting, setting,
and forming an lvalue of the desired field.

Example:

    C: struct gauint { int re, im; }
    J: interface gauint { C.int$ref re$ref(); … }

In most cases, the getters and setters can be derived automatically from
the reference functions.  As with global variables, they could be supplied
as default methods, or else (with some loss of usability) omitted.

    J: interface gauint {
      abstract C.int$ref re$ref();
      default int re$get() { return re$ref().get(); }
      default void re$set(int x) { re$ref().set(x); }
    }

The set of field names can be derived from the set of method names on
the rendered interface.  This is not enough to determine a layout,
however, because Java methods are _not_ reliably ordered, as viewed by
reflection.  (And _that_ is _sad_.)  Therefore, we must have
additional metadata that supplies the order of fields, and/or assigns
their various offsets.

    @C.Struct.Fields({"re","im"})
    interface gauint { @C.Name("re") C.int$ref re$ref(); … }
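
The following sketch shows how a binder might turn such an ordered field list into offsets. It makes strong simplifying assumptions that are not part of this draft: every field is a 4-byte `int` and there is no padding. The annotation `StructFields` is a local stand-in for `@C.Struct.Fields`, and `Layouts` is a hypothetical helper.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-in for @C.Struct.Fields: records the declaration order of fields.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface StructFields {
    String[] value();
}

@StructFields({"re", "im"})
interface gauint_layout { }

final class Layouts {
    // Assigns consecutive offsets, assuming all fields are unpadded 4-byte ints.
    static Map<String, Integer> intFieldOffsets(Class<?> struct) {
        StructFields fields = struct.getAnnotation(StructFields.class);
        if (fields == null) {
            throw new IllegalArgumentException("no field order recorded on " + struct);
        }
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int offset = 0;
        for (String name : fields.value()) {
            offsets.put(name, offset);
            offset += Integer.BYTES;
        }
        return offsets;
    }
}
```

The order comes from the annotation value, which is preserved exactly as written, rather than from reflective method enumeration.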

Note that bitfields are not addressable and therefore require special
getters and setters.  They also require more detailed layout
information than normal primitives.

**ISSUE:** Should reference types (like `int$ref`) support access to
bitfields as well as normally addressable integers?  Provisional
answer: Yes, but be prepared to deal with inconvenient interface
polymorphism.

Note that C structure fields are not always value-based.  Array-typed
structure fields are treated (by C) as implicit pointers to their
first elements, and array references and array values are practically
meaningless types in C.

Structure-valued fields are also not (usually) value-like.  A typical
use by a C programmer of a struct-valued field is to immediately
select a subfield of the struct, in place, rather than somehow take
the whole value of the struct and then narrow it to a subfield.  Thus,
the most useful struct field accessor is its `$ref` version, not its
`$get` version.

Thus, it is reasonable to define all kinds of field access in terms of
references (C l-values).

**ISSUE:**  Maybe leave off `$ref` from field referencers.  (Admit that inside a C struct, everything is a reference.   Construct bitfield reference types, in fact, just to model bitfield l-values.)

**ISSUE:**  Or, maybe leave off `$get` from field getters.  (This doesn't "scale" well to struct-valued fields.)

**ISSUE:**  Maybe leave off `$set` from field setters, and just use the signature information as a hint.

**ISSUE:**  Maybe omit field getters and setters altogether, when a reference function exists.

### Import enums to manifest constants

    C: enum Fruit { … }
    J: @C.Enum(int$.class) @interface Fruit {}
    C: enum Fruit { apple, … }
    J: … @Fruit int apple(); …

### Import object-like macros to manifest constants (when possible)

_(TBW)_

### Import function-like macros to manifest constants (when possible)

_(TBW)_

### Import C++ statics to a companion interface

A C/C++ static field, constant, or member of a class `Foo` will be
imported into an extracted interface which is different from the
extracted interface that represents the type `Foo` and its instances.

    C: class Foo { static int get_errno(); }
    J: interface Foo$static { int get_errno(); }

### Import C++ inline definitions using an extra DLL

In some cases, an inline function definition may be unavailable to a DLL-based binder.
Such a function can be emitted to an extra DLL at import time and made available to the binder.

(Some macros will also benefit from this treatment.)

### Import C++ constructor definitions to factory methods

A C++ constructor will be imported as a "_new" operation in the enclosing interface scope.

_(Use of constructors for supertype initialization is TBD, pending discussion of subclassing.)_

### Import C++ operator definitions to named interface methods

The carrier name for a C++ operator will be something regular, like `$operator$plus` for `operator+`.

Specialized operators like `new` and `delete` will be rolled into the wrapper behavior.

### Import sub-classable C++ types as meta-interfaces

_(TBW)_

## Detailed rules for importing type uses

Named entities are only half of the import story.  The type of a value
(named or not) also determines how it is rendered in the extracted
API.

### Typedef preservation

Occurrences of typedefs in C are mapped to their base types during
import, but the original typedef is (if possible) preserved.

    C: pid_t getpid();
    J: @pid_t int getpid();

> <span class="smaller"> *Road not taken:* It is possible that we could nominalize C typedefs,
so that they show up as first class Java types.

> `   C: typedef int pid_t;`\
`   J: interface pid_t { int $baseValue(); }`\
`   C: pid_t getpid();`\
`   J: pid_t getpid();`

> In that case, the nominalized carrier type would be a wrapper for its base type.

### Primitives to primitives

C has a wide variety of integral types, including signed and unsigned
variants of `char`, `short`, `int`, `long`, and beyond.

Wherever an integral type is used by value (not by reference) it is
imported as the smallest Java type which can represent its full
range of values.  This means, for example, that the C type `long` may
import as either Java `int` or `long`, depending on the data model.

The widest integral type in Java is `long`, the signed 64-bit type.
Integral types such as `unsigned long long`, which have values outside
this range, are represented by boxed `Number` objects.

    C: unsigned long long giant();
    J: C.uint64_t giant();

The type `C` is a specially bootstrapped Java type (an interface)
which contains nested types (again, Java interfaces).

The names of these nested types are standardized type names as defined
in `stdint.h` and similar places (TBD).
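
As one possible shape for such a boxed carrier, the sketch below wraps the raw 64 bits in a `Number` subclass. The class name `UInt64` is hypothetical, and the draft itself envisions `C.uint64_t` as an interface, so this is only an illustration of the value semantics. `Long.toUnsignedString` is a standard Java 8 method.

```java
import java.math.BigInteger;

// Hypothetical boxed carrier for C `unsigned long long` (assumed 64 bits wide).
final class UInt64 extends Number {
    private final long bits;  // raw two's-complement bit pattern

    UInt64(long bits) { this.bits = bits; }

    // Exact unsigned value, which may exceed Long.MAX_VALUE.
    BigInteger toBigInteger() {
        return new BigInteger(Long.toUnsignedString(bits));
    }

    @Override public int intValue() { return (int) bits; }
    @Override public long longValue() { return bits; }  // same bits, signed view
    @Override public float floatValue() { return toBigInteger().floatValue(); }
    @Override public double doubleValue() { return toBigInteger().doubleValue(); }
    @Override public String toString() { return Long.toUnsignedString(bits); }
}
```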

### C references to library interfaces

The Java runtime library supplies interfaces which represent
addressable C variables.

    interface int$ref extends any$ref {
      int get();
      void set(int x);
      int$ptr ptr();
    }

A reference type is always defined along with its base type `T`,
and is spelled `T.ref` whenever `T` is rendered as a Java interface.
Otherwise it is `T$ref`.

    C: struct stat { … }
    J: interface stat { interface ref extends obj$ref<stat> { … }; … }

**ISSUE:** We should use Project Valhalla mechanisms to define a global
type `ref<T>`, parameterized to all `T` types.  Then there would be no
reason to embed pointer types inside their base type definitions.
Provisional answer: In the short term, do the nesting trick.  It is
possible that this can be reconciled with the long-term answer, by
having `ref<stat>` expand to `stat.ref`, if the latter nested type
is available.

For each reference type there is also a pointer type.

    interface int$ptr extends any$ptr {
      int$ref ref();  // dereference this pointer
      int$ptr plus(long x);
      long minus(int$ptr x);
      <T> T cast(Class<T> t);
    }

These interfaces will have standard implementations which boil
down to unsafe addresses, with a little Java static typing and/or
metadata for protection.

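    class RawIntRef implements int$ref {
      // `unsafe` stands for a sun.misc.Unsafe instance obtained elsewhere
      private final Object base;  // in case int is in Java heap
      private final long address; // or offset if in Java heap
      RawIntRef(Object base, long address) { this.base = base; this.address = address; }
      public int get() { return unsafe.getInt(base, address); }
      public void set(int x) { unsafe.putInt(base, address, x); }
      public int$ptr ptr() { return new RawIntPtr(base, address); }
    }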

These interfaces may also have more "virtual" implementations.
This is practical and possible because they boil down to get/set
pairs.

(The dangerous pointer generating operation can be tamed in a number of
ways, such as throwing `UnsupportedOperationException` or allowing
arithmetic-free pseudo-pointers to scalars not actually backed by C
variables.)
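
A "virtual" implementation of this kind can be sketched with nothing more than a getter and a setter. The interfaces below are cut-down local copies of `int$ref` and `int$ptr` (without the `any$ref` and `any$ptr` supertypes), and `VirtualIntRef` is a hypothetical name. Its pointer operation is tamed by throwing `UnsupportedOperationException`.

```java
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;

// Cut-down local copies of the library interfaces described above.
interface int$ptr {
    int$ref ref();
}

interface int$ref {
    int get();
    void set(int x);
    int$ptr ptr();
}

// A reference backed by an arbitrary get/set pair instead of a C variable.
final class VirtualIntRef implements int$ref {
    private final IntSupplier getter;
    private final IntConsumer setter;

    VirtualIntRef(IntSupplier getter, IntConsumer setter) {
        this.getter = getter;
        this.setter = setter;
    }

    @Override public int get() { return getter.getAsInt(); }
    @Override public void set(int x) { setter.accept(x); }

    // No C address exists behind this reference, so pointer generation is refused.
    @Override public int$ptr ptr() {
        throw new UnsupportedOperationException("reference is not backed by a C variable");
    }
}
```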

There are various reasons for distinguishing pointers from references,
such as the distinctions made in the C and C++ languages themselves.

But the biggest reason to make a distinction is the fact that C
pointers, in their basic semantics, are much less safe than references,
because they allow casting and pointer arithmetic (`plus`, `minus`)
without reliability checks.
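
To show what `plus` and `minus` mean in element units, and what a reliability check could look like, here is a sketch of a pointer over a `java.nio.ByteBuffer`. The class `BufferIntPtr` is hypothetical and is not one of the standard implementations this draft has in mind. Unlike a raw C pointer, a stray access fails with `IndexOutOfBoundsException` because the buffer checks its bounds.

```java
import java.nio.ByteBuffer;

// Hypothetical bounds-checked pointer to int, backed by a ByteBuffer.
final class BufferIntPtr {
    private final ByteBuffer memory;
    private final int byteOffset;

    BufferIntPtr(ByteBuffer memory, int byteOffset) {
        this.memory = memory;
        this.byteOffset = byteOffset;
    }

    // Pointer arithmetic is scaled by the element size, as in C.
    BufferIntPtr plus(long elements) {
        return new BufferIntPtr(memory, Math.toIntExact(byteOffset + elements * Integer.BYTES));
    }

    // Distance between two pointers, in elements.
    long minus(BufferIntPtr other) {
        return (long) (byteOffset - other.byteOffset) / Integer.BYTES;
    }

    // Dereference for reading; the buffer rejects out-of-range offsets.
    int get() { return memory.getInt(byteOffset); }

    // Dereference for writing.
    void set(int x) { memory.putInt(byteOffset, x); }
}
```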

Likewise it is useful to distinguish pure
values (r-values) from their references (l-values).  It is mostly
possible to force the types to coincide, but the semantics are
disjoint.  In particular, a pure value cannot change due to
race conditions, while the value of a reference can.  The conversion
from l-value to r-value is therefore semantically significant.
Disregarding it leads to programs with race conditions.

So there is a three-way distinction between value, reference, and pointer.
One might say, as a rule of thumb, "When in doubt, make a distinction."

**ISSUE:** What about multi-level pointers like `int***`?  The notation `int$ptr.ptr.ptr` doesn’t scale well.  Provisional answer:  Pick a number between 1 and 2, and always pre-define that many levels of pointer and reference.

In any case, there must be a pointer type factory to cover multi-level pointer types.

There is no need for multi-level reference types, so they can be comfortably
defined along with their value types.

### Argument type conversions

In the C language, the type-checking of function call arguments
includes a number of type conversions.  A few of these
are directly available to Java programmers, but most must be
explicitly requested.

## Appendix of undigested issues

**ISSUE:** In a class with both static and non-static members, we might split the static interface from the object interface.

**ISSUE:** Can we use Java shapes to represent other languages?  Or always use DSL in a metafile?

**ISSUE:** For unsafe APIs (i.e., most C APIs) we could add `unsafe$` to the front of symbols to emphasize risks.

**ISSUE:** There is no return-type overloading in Java, but there is in the class file format.  This gives some translation options.

**ISSUE:** How deeply to mangle the names?  (Can remove most mangling but then Java access is problematic.)
    A. Java-centrism suggests having names always be Java valid.  Can preserve original names in annotations.
    B. Isomorphism suggests keeping names as-is (modulo JVM restrictions).  Derive Java-usable APIs when needed.

**ISSUE:** What are the reasonable uses (if any) of interface statics?  Let’s avoid them, to get the most control over encapsulation.

**ISSUE:** Represent constant values by no-argument interface methods, or boil them down to plain static constants?

**ISSUE:** Represent macros (when actually representable) by default interface methods or static interface methods?


**ISSUE:** How near of an isomorphism?  Could we derive the text of the header from metadata?

**ISSUE:** Multiple #ifdef combinations at once (multiple platforms)?
Provisional answer: Not automatically.  Can be handled manually, if
we request adaptation to a "LCM" API.

**ISSUE:** How much does all this help us with the next language, such as C#, Lisp, etc.?  Some, hopefully.

### Boneyard of possible import transforms

    J: import java.ni.c.*;

    C: const int EOF = -1;
    J: default long EOF() { return -1; }
    J: static final long EOF = -1; /*manual treatment?*/

    C: typedef __uint16_t nlink_t;
    J: @interface nlink_t { }

    C: #define S_IFREG 0100000  /*C*/
    static final long S_IFREG = (0100000);

    #define S_ISREG(x) (((x) & S_IFMT) == S_IFREG) /*C*/
    default boolean S_ISREG(long x) { return (((x) & S_IFMT) == S_IFREG); }

    int fchmod(int, mode_t); /*C*/
    int fchmod(int $arg1, mode_t $arg2);

    int chmod(const char* path, mode_t mode); /*C*/
    int chmod(prim_char.ptr path, mode_t mode);

    int fstat(int fd, struct stat* buf); /*C*/
    int fstat(int fd, stat.ptr buf);

    struct stat { /*C*/
      dev_t st_dev;
      time_t st_mtime;
      …
    }
    @CStruct interface stat {
      dev_t.ref st_dev();
      time_t.ref st_mtime();
      …
      @pointer interface ptr { … }
    }

## Acknowledgements

David Chase, Henry Jen, Michel Trudeau, and other members of Project Panama have
contributed valuable insights to this design.  Wayne Meissner is the
author of JNR (the Java Native Runtime) and its underlying FFI
implementation.


