FTR: Interface-centric persistent class declaration

Wed Aug 2 22:31:40 UTC 2017

http://cr.openjdk.java.net/~jrose/panama/using-interfaces.md

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><!--*-markdown-*-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Using Interfaces to Represent Rich APIs</title>
<style type="text/css">
  body          { font-family: Times New Roman; font-size: 12px;
                  line-height: 125%; width: 36em; margin: 2em; }
  code, pre     { font-family: Courier New; background: #eee }
  blockquote    { font-size: 10px; line-height: 130%; }
  pre           { font-size: 10px; line-height: 120%; }
  h1            { font-size: 14px; }
  h3            { font: inherit; font-weight:bold; }
  pre           { padding: 1ex; background: #eee; width: 40em; }
  h3            { margin: 1.5em 0 0; }
  ol, ul, pre   { margin: 1em 1ex; }
  ul ul, ol ol  { margin: 0; }
  blockquote    { margin: 1em 4ex; }
  p             { margin: .5em 0 .5em 0; }
  h4            { margin: 0; }
  a             { text-decoration: none; }
</style>
</head>
<body>

<!-- This document is in Markdown format:
     http://daringfireball.net/projects/markdown/
     $ pandoc --smart [ --standalone | --self-contained ] $f.md -o [ $f.html | $f.pdf ]
     H/T practicaltypography.com
 -->

# Using Interfaces to Represent Rich APIs
  > John Rose, 2017-0519 _(0.1)_

Classes are a powerful way to organize APIs, which is why you find
them in languages like Java, C++, and C#.  You also find them
simulating non-class APIs, which is not really a surprise, as classes
(from which Java interfaces are derived) were originally designed
for simulation.

Meanwhile, Java interfaces are classes but with less power:

  - No implementation (except fully generic default methods)
  - No fields (except static ones)
  - Only public members (with small exceptions)
  - No constructors (although static factory methods are trending)
  - No finality (free overrides and subtyping)

Yet interfaces have an extra power beyond classes:  They can hide
their representations completely.  If you define a non-public class
that implements one or more interfaces, clients can call any of the
interface methods, and they can reflectively query the shape of your class,
but they cannot do anything to the class except through the interfaces.
A related power is that there can be any number of these implementation
classes, and the user can't tell the difference except by squinting at
the class reflections, or unless the classes implement side-channel APIs.
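
As a minimal sketch of this hiding power, the following plain-Java example puts two different representations behind one interface. The cut-down `Employee` interface, the `Employees` factory class, and the names `FieldBacked` and `ArrayBacked` are all hypothetical, chosen only for illustration.

```java
// Hypothetical, cut-down version of the Employee API used later in this note.
interface Employee {
    long id();
    String name();
}

final class Employees {
    private Employees() {}

    // Factories are the only way for clients to obtain instances.
    static Employee onHeap(long id, String name) { return new FieldBacked(id, name); }
    static Employee packed(long id, String name) { return new ArrayBacked(id, name); }

    // Representation 1: ordinary fields.
    private static final class FieldBacked implements Employee {
        private final long id;
        private final String name;
        FieldBacked(long id, String name) { this.id = id; this.name = name; }
        @Override public long id() { return id; }
        @Override public String name() { return name; }
    }

    // Representation 2: both values packed into a single array.
    private static final class ArrayBacked implements Employee {
        private final Object[] slots;
        ArrayBacked(long id, String name) { this.slots = new Object[] { id, name }; }
        @Override public long id() { return (Long) slots[0]; }
        @Override public String name() { return (String) slots[1]; }
    }
}

class HiddenRepresentationDemo {
    public static void main(String[] args) {
        Employee a = Employees.onHeap(42, "Ix");
        Employee b = Employees.packed(42, "Ix");
        // Both values behave identically through the interface.
        System.out.println(a.name() + "(" + a.id() + ") / " + b.name() + "(" + b.id() + ")");
    }
}
```

Client code can tell the two apart only by inspecting `getClass()`, which is the "squinting at the class reflections" mentioned above.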

This power can be amplified by a design pattern we use in Panama
called the "binder", which is an automatic tool that supplies implementation
to a marked-up interface, in much the same way as the JVM supplies
implementation to a class defined in a class-file.

Except for some sharp edges, a binder can model a fair number of the
capabilities of any Java (or C++) class, with relatively little
annotation overhead and boilerplate overhead, and complete
representational abstraction, including the ability to handle multiple
representations simultaneously.

## Employee: A simple data class

Here's a class that has "stuff" in it that doesn't fit into an interface:

~~~~
final public class Employee {
  public Employee(long id, String name)
    { assert(id != 0); this.id = id; this.name = name; }
  public long id;
  public String name;
  public static String[] reportFieldNames()
    { return new String[] { "id", "name" }; }
  public Object[] reportFields()
    { return new Object[] { id, name }; }
  public static int getAPIPriority() { return 99; }
  public String toString() { return name+"("+id+")"; }
  public boolean equals(Object that)
    { return that instanceof Employee && equals((Employee)that); }
  public boolean equals(Employee that)
    { return this.name.equals(that.name) && this.id == that.id; }
  public int hashCode() { return Objects.hash(name, id); }
}
~~~~

But here is an interface that can emulate that class, with a low notational overhead:

~~~~
interface Employee {
  public interface Statics {
    @Constructor Employee make(long id, String name);
    @Constructor private static
    void init(Employee self, long id, String name)
      { assert(id != 0); self.id(id); self.name(name); }
    default String[] reportFieldNames()
      { return new String[] { "id", "name" }; } // random logic
    default int getAPIPriority() { return 99; }
  }
  @Static Statics statics();  // hook for recovering statics
  // fields (don't need @Getter/@Setter distinction, probably)
  @Field long id();
  @Field void id(long id);
  @Field String name();
  @Field void name(String name);
  // random logic methods are just default methods
  default Object[] reportFields()
    { return new Object[]{ id(), name() }; }
  // object method overrides require a different name
  @ForObject default String toString_()
    { return name()+"("+id()+")"; }
  @ForObject default boolean equals_(Employee that)
    { return this.name().equals(that.name())
                        && this.id() == that.id(); }
  @ForObject default int hashCode_()
    { return Objects.hash(name(), id()); }
}
~~~~

The interface is split into two parts, to represent the static "slice"
of the class and the non-static "slice".  The constructor is split in
two to factor apart its external interface (as a factory) and the
internal code, to which access must be controlled (happily, as a
static private method).  Constructors are placed in the static slice
because, when viewed as API points, they act like static factories.
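
The annotations used in the interface are not declared anywhere in this note. One plausible way to declare them, so that a binder can read them reflectively, is sketched here. The retention policy and targets are assumptions, and `MarkedEmployee` and `AnnotationScan` are hypothetical names for a cut-down interface and a small scanning helper.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Marker annotations named as in the text; RUNTIME retention is an assumption
// that lets a reflective binder see them.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Field {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Constructor {}
@Retention(RetentionPolicy.RUNTIME) @Target({ElementType.METHOD, ElementType.TYPE}) @interface Static {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface ForObject {}

// Cut-down, hypothetical version of the Employee interface.
interface MarkedEmployee {
    interface Statics {
        @Constructor MarkedEmployee make(long id, String name);
    }
    @Static Statics statics();
    @Field long id();
    @Field void id(long id);
}

class AnnotationScan {
    // Counts the public methods of an interface that carry the given marker.
    static long countMarked(Class<?> iface, Class<? extends Annotation> marker) {
        long n = 0;
        for (Method m : iface.getMethods()) {
            if (m.isAnnotationPresent(marker)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("@Field methods: " + countMarked(MarkedEmployee.class, Field.class));
        System.out.println("@Static methods: " + countMarked(MarkedEmployee.class, Static.class));
    }
}
```

A binder would use a scan of this kind to decide, per method, whether to generate a field accessor, a factory, or a hook to the static slice.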

Even with the extra noise from the annotations and from converting
fields into getter and setter methods, the interface version of the
type has only about 25% more characters than the "native" version.
That's an irritating notational overhead, but not a really bad one.

Here is some sample code using this interface:

~~~~
final Employee.Statics EMP = MyBinder.bind(Employee.Statics.class);
Employee e = EMP.make(42, "Ix");  // cf. e = new Employee(42, "Ix")
e.name("Zaphod");  // cf. e.name = "Zaphod"
int prio = EMP.getAPIPriority();  // cf. Employee.getAPIPriority()
int prio2 = e.statics().getAPIPriority(); // cf. e.getAPIPriority()
Employee e2 = e.statics().make(007, "Bond");
    // => "make another one just like e"
~~~~

## One interface per slice

A big issue with this setup is that one interface cannot easily
represent both the static and non-static "slices" of a class (either
Java or C++).  Note that constructors are really more like static
methods, from the outside (though they look non-static from the
inside).

One degree of freedom always comes up here: how to
"stack" the static and non-static stuff.

1. Non-static first:

~~~~
interface Employee { @Field long id(); …
  @Static Statics statics();
  @Static interface Statics
    { @Constructor Employee make(long id, String name); … } }
final Employee.Statics es = MyBinder.bind(Employee.Statics.class);
Employee e = es.make(42, "Ix");
e.name("Zaphod");
~~~~

Then also:

~~~~
es.staticMethod();  // same as...
e.statics().staticMethod(); // … this call
e.statics().make(007, "Bond");  // "make another one like e"
~~~~

2. Static first:

~~~~
interface Employee {
  @NonStatic interface Instance { @Field long id(); … }
  @Constructor Instance make(long id, String name); … }
final Employee es = MyBinder.bind(Employee.class);
Employee.Instance e = es.make(42, "Ix");
e.name("Zaphod");
~~~~

3. All in one:

~~~~
interface Employee { @Field long id(); …
  @Constructor Employee make(long id, String name); …
  @NullTest boolean isNull(); }
final Employee e0 = MyBinder.bind(Employee.class);
Employee e = e0.make(42, "Ix");
e.name("Zaphod");
Employee b = e.make(007, "Bond");
    // => statics are uniformly mixed with instance methods
assert(!e.isNull() && e0.isNull());
    // => but there is a dynamic difference
e0.name("Robbie");  // => throws an exception; because
    // this is the null value of Employee good only for statics
~~~~

4. None at top level:

~~~~
interface Employee {
 @NonStatic interface Instance { @Field long id(); …
   @Static Statics statics(); }
 @Static interface Statics {
   @Constructor Instance make(long id, String name); …
   @NonStatic Instance instance(); } }
final Employee.Statics es = MyBinder.bind(Employee.Statics.class);
Employee.Instance e = es.make(42, "Ix");
e.name("Zaphod");
~~~~

> (For Panama there is a special temptation to do #3 because it gives
typed null pointers "for free".  But we mostly gravitate towards #1.
I think #2 is the most principled, but use cases read funny.  I threw
in #4 for the sake of brainstorming.)
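
To make the dynamic difference in option 3 concrete, here is a minimal sketch built on `java.lang.reflect.Proxy`. The `Emp` interface and `NullableBinder` class are hypothetical, annotations are left out, and the choice of `IllegalStateException` for the failing field access is an assumption.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical "all in one" interface: statics and instance methods mixed.
interface Emp {
    String name();
    void name(String name);
    Emp make(String name);   // plays the role of the @Constructor
    boolean isNull();        // plays the role of the @NullTest
}

final class NullableBinder {
    private NullableBinder() {}

    // Returns the typed null of Emp, good only for static-like calls.
    static Emp bind() { return newEmp(true, null); }

    private static Emp newEmp(boolean isNull, String initialName) {
        String[] nameCell = { initialName };   // the single "field" of this instance
        InvocationHandler h = (proxy, method, args) -> {
            switch (method.getName()) {
                case "isNull": return isNull;
                case "make":   return newEmp(false, (String) args[0]);
                case "name":
                    if (isNull) throw new IllegalStateException("null Emp has no fields");
                    if (args == null) return nameCell[0];   // getter: no arguments
                    nameCell[0] = (String) args[0];         // setter: one argument
                    return null;
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (Emp) Proxy.newProxyInstance(
                Emp.class.getClassLoader(), new Class<?>[] { Emp.class }, h);
    }

    public static void main(String[] args) {
        Emp e0 = bind();
        Emp e = e0.make("Ix");
        e.name("Zaphod");
        System.out.println(e.name() + " isNull=" + e.isNull() + ", e0 isNull=" + e0.isNull());
    }
}
```

The null value and the real instances share one static type, so the distinction can only be checked at run time.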

## What's that binder?

The binder is conceptually simple: It just returns an implementation
of the indicated interface.  But it is much more complex than it
looks.  For starters, it probably has to spin a class file, although
simple binders can use the `java.lang.reflect.Proxy` API.  Often a
binder will have its own configuration parameters.  You can have
different binders for different classes of storage, such as
persistent, off-heap, on-heap, etc., and for different levels of
invariant checking, such as unsafe-but-fast or safe-and-slow.  A
binder can mix in additional interfaces under the covers, such as ones
for doing shallow and deep copies and/or freezing (to make
immutables).
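
A very simple binder of the `Proxy` kind might look like the following sketch. It handles only `@Field` accessors and keeps the values in an on-heap map. The `MapBinder` class, its nested `Person` interface, and the storage scheme are assumptions made for illustration; a real binder would do much more.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Array;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

final class MapBinder {
    private MapBinder() {}

    // Hypothetical declaration of the @Field marker used in this note.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface Field {}

    // Hypothetical interface to bind.
    interface Person {
        @Field long id();
        @Field void id(long id);
        @Field String name();
        @Field void name(String name);
    }

    // Returns an implementation of iface whose @Field methods read and write a map.
    static <T> T bind(Class<T> iface) {
        Map<String, Object> storage = new HashMap<>();
        InvocationHandler h = (proxy, method, args) -> {
            if (!method.isAnnotationPresent(Field.class)) {
                throw new UnsupportedOperationException(method.toString());
            }
            String field = method.getName();
            if (args != null) {                 // setter: one argument, void return
                storage.put(field, args[0]);
                return null;
            }
            Object value = storage.get(field);  // getter: no arguments
            if (value == null && method.getReturnType().isPrimitive()) {
                // Unset primitive field: use the zero value of its type.
                value = Array.get(Array.newInstance(method.getReturnType(), 1), 0);
            }
            return value;
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, h));
    }

    public static void main(String[] args) {
        Person p = bind(Person.class);
        p.id(42);
        p.name("Ix");
        System.out.println(p.name() + "(" + p.id() + ")");
    }
}
```

Swapping the `HashMap` for an off-heap or persistent store would change the representation without touching any client of `Person`.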

In Panama a binder can extract statically stored information (from
annotations) about the library that implements an API, and be sure to
load that library dynamically, then bind its entry points to the
methods of the type it is implementing.

## Access control vs. public methods

The biggest problem with these transforms is that all interface
methods are fully public, so you can't directly declare non-public
members in the interface you want to model.

In some cases, making the interface itself non-public will help, but
that only works for data structures which are completely internal.
Types at the edge of a trust boundary, which interact with untrusted
parties, cannot use this simple trick.

For dealing with private fields and methods, there are a few choices,
none of them great.  First, wait for a future version of Java to add
non-public members to interfaces.  (That's a can of worms.)  Second,
add a Lookup parameter to all non-public methods at the same time as
raising them to public status.  Have the binder insert appropriate
authentication logic into the methods.  Third, support temporary
private (and package-private) *views* of your classes, separate
interfaces (nested like Statics) that contain a public method or field
accessor for each private method or field.  Then, use a Lookup object
(as before) to "unlock" access to the private view of a public object.
Be careful not to pass around the view objects, since that's
equivalent to opening up the internals of your object, by delegation
via the view, to the delegation recipient.  As a variation of the
third tactic, the interfaces could be made non-public.  (All this
requires the binder to break access control at times, to hook up the
private bits.)

~~~~
public class Employee {
 public long id;
 private String password;
}
~~~~

This class has a private field, which must somehow be protected from
users of the transformed interface, even though all the methods are
public.  Here is an API that accomplishes this:

~~~~
public interface Employee {
 @Static interface Statics
    { @Constructor Employee make(long id, String password); … }
 @Static Statics statics();
 @Field long id();
 // and to view the password there's a separate view:
 private @Private interface Privates {
   @Field String password();
 }
 // deriving the view goes like this:
 @Private Privates privates(Lookup token);
}
~~~~

The user must supply a Lookup object, which will be validated:

~~~~
Employee.Statics es = MyBinder.bind(Employee.Statics.class);
Employee e = es.make(1, "shazam");
String s = e.privates(lookup()).password();
~~~~
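
How the token gets validated is left open above. One possible check, sketched here without a binder, is to require that the `Lookup` was created by a designated owner class and still carries private access. The names `GuardedEmployee`, `GuardedEmployeeImpl`, and `TrustedClient` are hypothetical, and throwing `SecurityException` is an assumption.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;

// Hypothetical interface with a guarded private view.
interface GuardedEmployee {
    long id();
    Privates privates(Lookup token);   // public by necessity, guarded by the token

    interface Privates { String password(); }
}

// Hand-written stand-in for what a binder would generate.
final class GuardedEmployeeImpl implements GuardedEmployee {
    private final long id;
    private final String password;
    private final Class<?> owner;      // the class trusted to see private state

    GuardedEmployeeImpl(long id, String password, Class<?> owner) {
        this.id = id; this.password = password; this.owner = owner;
    }

    @Override public long id() { return id; }

    @Override public Privates privates(Lookup token) {
        // Accept only a full-power lookup created inside the owner class.
        boolean trusted = token.lookupClass() == owner
                && (token.lookupModes() & Lookup.PRIVATE) != 0;
        if (!trusted) {
            throw new SecurityException("no private access to " + owner.getName());
        }
        return () -> password;
    }
}

final class TrustedClient {
    private TrustedClient() {}

    static String readPassword(GuardedEmployee e) {
        // MethodHandles.lookup() captures TrustedClient as the lookup class.
        return e.privates(MethodHandles.lookup()).password();
    }

    public static void main(String[] args) {
        GuardedEmployee e = new GuardedEmployeeImpl(1, "shazam", TrustedClient.class);
        System.out.println(readPassword(e));
        try {
            e.privates(MethodHandles.publicLookup());
        } catch (SecurityException expected) {
            System.out.println("public lookup rejected");
        }
    }
}
```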

A possible problem related to accessibility is the fact that any
public interface can be implemented by anybody, including untrusted
code.  This means that you have fewer guarantees than you think when
you are holding an Employee in your hand; it may have been implemented
by an attacker.  This can be partially addressed by a future feature
we call "sealed interfaces", but there is an uneasy truce between open
polymorphism and opening the door to hostile spoofing.  We see it all
the time with the collection API.
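
Sealed interfaces have since become a standard language feature, in Java 17. A minimal sketch of how sealing closes off spoofing follows; it needs Java 17 or later, and the `Badge` and `TrustedBadge` names are hypothetical.

```java
// Only the types named in the permits clause may implement Badge.
sealed interface Badge permits TrustedBadge {
    long id();
}

final class TrustedBadge implements Badge {
    private final long id;
    TrustedBadge(long id) { this.id = id; }
    @Override public long id() { return id; }
}

// A class outside the permits list is rejected at compile time:
// final class Spoof implements Badge { public long id() { return 7; } }

class SealedDemo {
    // Reports whether Badge is sealed with exactly one permitted implementation.
    static boolean isClosed() {
        return Badge.class.isSealed()
                && Badge.class.getPermittedSubclasses().length == 1;
    }

    public static void main(String[] args) {
        Badge b = new TrustedBadge(42);
        System.out.println(b.id() + " closed=" + isClosed());
    }
}
```

The trade-off described above remains: sealing buys the guarantee by giving up open polymorphism, so every implementation, including any a binder generates, has to be known to the interface in advance.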

## Modeling type hierarchies

Another challenge, ultimately of the same sort, is mapping a whole
class hierarchy into an equivalent hierarchy of interfaces.  Clearly,
if two original classes are in a subclass/superclass relation, then
the derived interfaces should also have a similar relation.  Luckily,
interfaces can extend multiple supertypes, so there is no conflict
between (say) some common supertype mandated by the binder and a
supertype brought in by the transform.

But the easy, no-holds-barred supertyping supplied by interfaces comes
with a surprising cost: Sometimes you need to keep names *separate*,
but interface subtyping mandates that every distinct method descriptor
(name plus erased parameter and return types) can have only *one*
method implementation, for any particular instance of that interface.
Contrast this with classes, which can refer to both identically
described fields and methods that are simultaneously present in both
the subclass and the superclass.

~~~~
static class Super {
  String m() { return "Super::m"; }
  String f = "Super::f";
}
static class Sub extends Super {
  String m() { return "Sub::m"; }
  String f = "Sub::f";
  String superm() { return super.m(); }
  String superf() { return super.f; }
}
~~~~
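
A small driver makes the difference between the two kinds of member visible. It repeats the two classes as top-level types so that it compiles on its own; the `ShadowingDemo` class is a hypothetical addition.

```java
class Super {
    String m() { return "Super::m"; }
    String f = "Super::f";
}

class Sub extends Super {
    @Override String m() { return "Sub::m"; }
    String f = "Sub::f";               // shadows Super::f, does not override it
    String superm() { return super.m(); }
    String superf() { return super.f; }
}

class ShadowingDemo {
    // Reads the same Sub object through both static types.
    static String[] observe() {
        Sub sub = new Sub();
        Super asSuper = sub;
        return new String[] {
            asSuper.m(),    // virtual dispatch picks the Sub method
            asSuper.f,      // field access is resolved by static type
            sub.f,
            sub.superm(),
            sub.superf(),
        };
    }

    public static void main(String[] args) {
        // prints: Sub::m, Super::f, Sub::f, Super::m, Super::f
        System.out.println(String.join(", ", observe()));
    }
}
```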

Identically described methods are accessed under the "super" keyword,
which enforces some access control rules, but must still enable the
subclass to communicate freely with its superclass, even if they were
separately compiled.  In the world of binders and interfaces, this
means a subclass might be created by a different binder than the one
that created the superclass, in which case the privileged communication
between them must be both access-controlled and mediated via an
interface.  The options for access control are limited, but workable,
as described above.

(It's also reasonable for a binder to refuse to subclass the product
of another binder.  In fact, cross-binder instantiation requires extra
API points for transformed constructors, not described here.  A
special case of this would be creating a Java class which subclasses
a C++ class, for the purpose of overriding methods in a callback
or event processing design pattern.)

When transforming a subclass/superclass relation into interfaces, the
statically named parts (including the reference to things like
"super.m()" and "super.f" above) must be partitioned away from the
parts subject to virtual dispatch and override (like "this.m()").
Roughly speaking, in the interface version of a class hierarchy,
the API surface associated with the "super" keyword must be segregated
in its own interface slice, much like "Statics" or "Privates" above.
The segregation serves two purposes: First, it may enable access
control tactics (like private views).  Second, and more importantly,
it prevents subclass overrides from changing the meaning of the
statically named API points (like "Super::f").

The transformation of the above classes could just add the statically
linked non-virtual entry points into the same "slice" as the other
statics.  It would look something like this:

~~~~
interface Super {
  public interface Statics {  // includes non-virtual API points
    @Constructor Super make();
    @NonVirtual String m(Lookup token, Super self);
    @NonVirtual @Field String f(Super self);
  }
  @Static <T> T statics(Class<T> statics);
  String m();
  // fields should be accessed via non-virtual API, in a non-final class
  //@Field String f();
  // algorithms for easily referring to non-virtuals:
  default String Super_f()
    { return statics(Statics.class).f(this); }
  default String Super_m(Lookup token)
    { return statics(Statics.class).m(token, this); }
}

interface Sub extends Super {
  public interface Statics {  // includes non-virtual API points
    @Constructor Sub make();
    @NonVirtual String m(Lookup token, Sub self);
    @NonVirtual @Field String f(Sub self);
  }
  @Override @Static <T> T statics(Class<T> statics);  // supply another view
  @Override String m();
  @Override @Field String f();
  // non-virtual entry points may be omitted for a final class.
  //default String Sub_f()
  //  { return statics(Statics.class).f(this); }
  //default String Sub_m(Lookup token)
  //  { return statics(Statics.class).m(token, this); }
  default Statics statics() { return statics(Statics.class); }
}
~~~~

An oddity here is that, because Java fields are never virtual,
it is not really valid to translate a field getter or setter into an
interface method, if that interface method might be overridden
accidentally by a transformed subclass.  Here's an example
of what goes wrong:

~~~~
Super sup = MyBinder.bind(Super.Statics.class).make();
println(sup.f());  // OK, prints whatever was in Super::f
Sub sub = MyBinder.bind(Sub.Statics.class).make();
println(sub.f());  // still OK, prints contents of Sub::f not Super::f
Super dude = sub;  // uh-oh…
println(dude.f());  // pervasive overrides => must print Sub::f
~~~~
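
The same hazard can be reproduced without any binder. In this sketch the implementations are hand-written lambdas standing in for binder output, and the wrapper class `OverrideHazard` is a hypothetical name.

```java
final class OverrideHazard {
    private OverrideHazard() {}

    // Field getters placed directly on the main-line interfaces.
    interface Super { String f(); }
    interface Sub extends Super { @Override String f(); }

    // Hand-written stand-ins for what a binder would generate.
    static Super makeSuper() { return () -> "Super::f"; }
    static Sub makeSub()     { return () -> "Sub::f"; }

    // Reads f through the Super static type, like "dude.f()" above.
    static String readThroughSuper(Super s) { return s.f(); }

    public static void main(String[] args) {
        System.out.println(readThroughSuper(makeSuper()));
        // With real classes the next read would be the Super field; here it is not.
        System.out.println(readThroughSuper(makeSub()));
    }
}
```

Because interface methods always dispatch virtually, the static type of the reference no longer selects the field, which is exactly what the class version guarantees.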

(Note that inheritance also requires that the user of a static API point
specify *which level* of the class hierarchy is being statically used.
That is why the "statics()" view transform requires an argument.  The
argument can be avoided in the case of a final class.)

## Modeling foreign type systems (C and C++)

The C type system is very different from Java's but interfaces
can emulate it, as discussed here:

<http://cr.openjdk.java.net/~jrose/panama/metadata.html>

Just as the distinction between fields, methods, and constructors can
be made via annotations, similar (and more complex) distinctions
between C API elements can be annotated onto an interface extracted
from a header file.  Additional annotations can help the binder do
correct code generation, by supplying source-level type and layout
information, as well as configuration parameters, notably the name of
the shared library that goes with the API.

The C++ class system is even more complex than Java's, but again
interfaces can cope with it.

At the implementation level, a C++ class-based API can be mostly
transformed into a plain C API that (with some overhead) emulates
the C++ API.  A sketch of that transformation is here:

<http://cr.openjdk.java.net/~jrose/panama/cppapi.cpp.txt>

Further transforming that API into a Java interface is straightforward,
except (of course, as above) for the delicate choices that make the
Java interface look more like the original C++ API, or (on the other
extreme) more like assembly code with mangled names.

Since ease of use *does* depend on clean notations (or at least
notation not septic enough to induce toxic shock), an imported
C++ API should be transformed into as few interfaces as possible,
with names and types rendered as exactly as possible in their
non-mangled forms.

Furthermore, C++ APIs often have deep class hierarchies, and/or
contain many non-virtual members.  Sometimes non-virtual members
intentionally shadow each other, so that naive translation to
interfaces, with pervasive overriding, would spoil the shadowing
semantics.  But these problems can be solved as with Java class
hierarchies.  The essential idea is to keep only truly virtual members
in the "main line" of the Java interfaces that model the API.  This
"main line" also models the C++ subtype/supertype relations in the
API, but does not attempt to present non-overridden features (like
fields or non-virtual methods) except through "side line" interfaces
like the various "Statics" types.

The main-line types can also include methods which model "final",
non-virtual, non-overridden API points, but they must be mangled, so
that there is no accidental override.  The binder must process these
non-virtual API points with special care, never accidentally
overriding them even in the case of an accidental name clash.  In
essence, these non-virtual API points must be agreed to be "final"
even if interfaces do not support finality of methods.  (Perhaps they
should, but that would be a different, very long discussion.)
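
A minimal sketch of the mangling idea, assuming a `Type_member` naming scheme like the `Super_f()` helper shown earlier. The implementation is a hand-written stand-in for binder output, and the wrapper class `MangledNames` is hypothetical.

```java
final class MangledNames {
    private MangledNames() {}

    interface Super {
        String m();          // truly virtual: overriding is intended
        String Super_f();    // mangled accessor for the non-virtual field Super::f
    }

    interface Sub extends Super {
        String Sub_f();      // distinct mangled name, so no clash with Super_f
    }

    // Stand-in for a bound Sub instance; by agreement Super_f is treated as final.
    static Sub makeSub() {
        return new Sub() {
            @Override public String m()       { return "Sub::m"; }
            @Override public String Super_f() { return "Super::f"; }
            @Override public String Sub_f()   { return "Sub::f"; }
        };
    }

    public static void main(String[] args) {
        Sub sub = makeSub();
        Super dude = sub;
        // The virtual method follows the subclass; the mangled field accessor does not.
        System.out.println(dude.m() + ", " + dude.Super_f() + ", " + sub.Sub_f());
    }
}
```

Nothing in the language stops another implementation from returning something else from `Super_f()`, which is why the binder has to enforce the convention.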

A third category of interface would be a second side line called
"NonVirtuals" for non-virtual views of specific instances.  These
interfaces would contain non-mangled field names and non-mangled names
of non-virtual methods (such as Java finals and C++ non-virtuals).
Just as "statics(T)" generates a view of the static features of
a type, "nonVirtuals(T)" would generate a view of the non-virtual
features, in which those features would not need to be mangled.

These two side-lines are distinct, because for each binder action,
there is only one "Statics" value, but there must be an associated
"NonVirtuals" value for each individual instance of the main-line type.

The relative nesting of these three (or more) interfaces determines
the names that the user will see.  For a C++ class rich in virtual
functions and/or with a full class hierarchy, the main line type
should be the type the user interacts with the most.  But for a
C++ class with little or no inheritance or virtuals, all the interesting
names are non-virtual, and so the main-line type should contain
those names, in an unmangled form.  This main-line type is then
effectively final.

Thus, an imported C++ API, rendered as a bundle of Java interfaces,
will consist of a mix of types for the various slices of the API,
including (at least) the "main-line" which models virtuality and type
hierarchy, and two side-lines which do *not* model either of those
features, one for class-specific values and operations, and one for
instance-specific values and operations.  For types which are final,
the second side-line can be merged into the main line.

## Closing the loop: Modeling Java classes

Perhaps these transforms provide insight into creating better mocking
frameworks.  If there were a robust mapping from the public API of a
class `C` to an associated interface `C.I`, then it would be easy to
make arbitrary behavioral mockups of that class, by spinning a
suitable implementation of `C.I`.  If this mapping were truly robust,
then bytecodes which use the original class `C`, as a client through
the public API, could be rewritten uniformly and automatically to use
`C.I` instead.  The system could be validated using a standard
implementation of `C.I` which just binds directly to `C`, unchanged.
And then it could be perturbed and stress-tested by using other
implementations of `C.I`, perhaps enabling fault injection or tracing.
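
The tracing case can be sketched with a delegating `Proxy`. Here `Greeter` stands in for the derived interface `C.I` and `PlainGreeter` for the original class `C`; both names, and the `TracingBinder` helper, are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class TracingBinder {
    private TracingBinder() {}

    // Stand-in for the derived interface C.I.
    interface Greeter { String greet(String who); }

    // Stand-in for the original class C, bound unchanged.
    static final class PlainGreeter implements Greeter {
        @Override public String greet(String who) { return "hello, " + who; }
    }

    // Wraps target in an implementation of iface that logs each call, then delegates.
    static <T> T trace(Class<T> iface, T target, List<String> log) {
        InvocationHandler h = (proxy, method, args) -> {
            log.add(method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();   // rethrow what the target itself threw
            }
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, h));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Greeter g = trace(Greeter.class, new PlainGreeter(), log);
        System.out.println(g.greet("Ix") + " " + log);
    }
}
```

Fault injection would follow the same shape, with the handler throwing or altering results instead of only logging.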

A particularly slick version of such a facility would be one which
would not require bytecode rewriting, but simply ask the JVM to accept
`C.I` values wherever the bytecodes require the `C` type.  In effect,
every bytecode operation which is currently sensitive to class types
would be doused with extra virtual sauce, and made to work equally
on the correspondingly named interface points.  Hyper-virtualizing
all classes in this way would bring the JVM to its knees, but it
would be reasonable to hyper-virtualize a selected unit, such as
a package or module.

Or perhaps a single class could opt into hyper-virtualization.  In
that case the class, as coded, would no longer be the unique
implementation of its API, but would be the principal implementation,
subject to re-binding as needed.  This would provide a way to define
Java classes which interoperate with their transformed cousins in
other semantic domains, such as persistent memory or off-heap data.

For now, it is best to experiment on the interfaces, without waiting
for automatic hyper-virtualization.  Just as the Panama header file
scraper automatically derives API interfaces from C APIs, we could
have a class scraper which derives interfaces from Java classes.  That
might be a reasonable tool for working with Java types which need
to be ported into strange places.