Bridge methods in the VM

Tue Jan 22 13:51:35 UTC 2019

We’ve been thinking for a long time about the possibilities of pushing bridging down into the VM.  The reasons we have had until now have not been strong enough, but generic specialization, and compatible migration of libraries, give us reason to take another swing.

VM Bridging
Historically, bridges have been generated by the static compiler. Bridges are generated today when there is a covariant override (a String-returning method overrides an Object-returning method), or when there is a generic instantiation (class Foo implements List<String>). (Historically, we also generated access bridges when accessing private fields of classes in the same nest, but nestmates did away with those!)

Intuitively, a bridge method is generated when a single method implementation wants to respond to two distinct descriptors. At the language level, these two methods really are the same method (the compiler enforces that subclasses cannot override bridges), but at the VM level, they are two completely unrelated methods. This asymmetry is the source of the problems with bridges. One of the main values of making the JVM more aware of bridges is that we no longer need to throw away the useful information that two seemingly different methods are related in this way.
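
To see the "two unrelated methods" view from the VM side, here is a minimal sketch using core reflection. The classes Animal, Dog, and Greeting are made up for illustration; the only API used is java.lang.reflect.Method. It lists the methods javac emits for a covariant override and for a generic instantiation, and shows which ones carry the bridge flag.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class BridgeDemo {
    // Hypothetical classes, used only to provoke bridge generation.
    static class Animal {
        Object self() { return this; }
    }

    static class Dog extends Animal {
        // Covariant override: javac also emits a synthetic self()Object bridge.
        @Override
        Dog self() { return this; }
    }

    static class Greeting implements Supplier<String> {
        // Generic instantiation: javac also emits a synthetic get()Object bridge.
        @Override
        public String get() { return "hello"; }
    }

    // Describes every declared method called `name`, sorted for stable output.
    static List<String> describe(Class<?> c, String name) {
        List<String> out = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                out.add(m.getReturnType().getSimpleName() + " " + name
                        + "() bridge=" + m.isBridge());
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        describe(Dog.class, "self").forEach(System.out::println);
        describe(Greeting.class, "get").forEach(System.out::println);
    }
}
```

Each class ends up with two methods of the same name that differ only in return type; nothing in the classfile beyond the bridge flag and the bridge's bytecode records that one exists solely to reach the other.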

We took a running leap at this problem back in Java 8, when we were doing default methods; this document constitutes a second run at this problem.

Bridge anomalies

Compiler-generated bridge methods are brittle; separate compilation can easily generate situations where bridges are missing or inconsistent, which in turn can result in AbstractMethodError (AME), in invoking a superclass method when an override exists in a subclass, or in everyone's favorite anomaly, the bridge loop. Start with:

class Parent implements Cloneable {
   protected Object clone() { return (Parent)null; }
}

class Child extends Parent {
   protected Parent clone() { return (Parent)super.clone(); }
}
Then, change Parent as follows, and recompile only that:

class Parent implements Cloneable {
   protected Parent clone() { return (Parent)null; }
}
If you call clone() on Child you get a StackOverflowError (try it!). What's going on is that when we make this change, the place in the hierarchy where the bridge is introduced changes, but we don't recompile the entire hierarchy. As a result, we have a vestigial bridge, and when we invoke clone() with invokevirtual from the new bridge, we hit the old bridge, and loop.
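
Reproducing this for real requires two separate compilations, so here is a single-file simulation instead. It is a sketch under an assumption: since Java source cannot declare two methods that differ only in return type, the two descriptors are modeled as two differently named methods (cloneAsObject stands in for clone()Object, cloneAsParent for clone()Parent), and the bridge bodies javac would have emitted are written out by hand. All names here are hypothetical.

```java
public class BridgeLoopSketch {
    // Parent as it looks AFTER being recompiled.
    static class Parent {
        // Stands in for clone()Parent, now the real method.
        Parent cloneAsParent() { return new Parent(); }

        // Stands in for clone()Object, now a bridge that re-dispatches virtually.
        Object cloneAsObject() { return cloneAsParent(); }
    }

    // Child as compiled against the OLD Parent and never recompiled.
    static class StaleChild extends Parent {
        // Stands in for clone()Parent; the body still performs
        // invokespecial Parent.clone()Object, i.e. a super call.
        @Override
        Parent cloneAsParent() { return (Parent) super.cloneAsObject(); }

        // Stands in for clone()Object: the vestigial bridge left in Child.
        @Override
        Object cloneAsObject() { return cloneAsParent(); }
    }

    // Returns true if calling clone on the stale child never terminates normally.
    static boolean loops() {
        try {
            new StaleChild().cloneAsParent();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("bridge loop reproduced: " + loops());
    }
}
```

The super call lands in Parent's bridge, which dispatches virtually back into the child, which makes the super call again; neither method ever reaches Parent's real implementation.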

The fundamental problem here is that we are rendering bridges into concrete code "too early", based on a compile-time view of the type hierarchy. We want to make bridge dispatch more dynamic; we can accomplish this by making bridges more declarative than imperative, by recording the notion "A is a bridge for B" in the classfile -- and using that in dispatch -- without having to decide ahead of time exactly what bytecodes to use for bridging.

Generic specialization

Generics gave us a few situations where we want to be able to access a class member through more than one signature; specialized generics will give us more. For example, in a specialized class:

class Foo<T> {
    T t;

    T get() { return t; }
}
In the instantiation Foo<int>, the type of the field t, and the return type of get(), are int. In the wildcard type Foo<?>, the type of both of these is Object. But because a Foo<int> is a Foo<?>, we want a Foo<int> to respond to invocations of get()Object, and to accesses of the field t as if it were of type Object.

We could handle the method with yet more bridge methods, but bridge methods don't do anything to help us with the field access. (In the M2 prototype we lifted field access on wildcards to method invocations, which was a useful prototyping move, but this does nothing to help existing erased binaries.)

So while bridge methods as a mechanism run out of gas here, the concept of bridging -- recording that one member is merely an adaptation for another -- is still applicable.

Summary of problems

We can divide the problems with bridges into two groups -- old and new. The old problems (brittleness, separate compilation anomalies) are not immediately urgent to fix, but are a persistent source of technical debt, bug tails, and constraints on translation evolution.

The new problems are that specialized generics give us more places we want bridges, making the old problems worse, as well as some places where we want the effects of bridging, but for which traditional bridge methods won't do the trick -- adaptation of fields.

Looking ahead, there are also some other issues on the horizon, with related characteristics, that we will surely encounter as we migrate the libraries to use specialized generics.

Proposed solution: forwarded members
I'll lay out one way to express bridges for both fields and methods in the classfile, but there are others. In this model, for a member B that is a bridge for member M, we include a declaration for B in the class file, but we attach a Forwarding attribute to it, identifying the underlying member M (by descriptor, since its name will be the same) and indicating that attempts to link to B should be forwarded to M:

Forwarding {
    u2 name;
    u4 length;
    u2 forwardeeType;
}
A method with a Forwarding attribute has no Code attribute. We would then replace existing bridges with forwarding members, and for specializable classes, we would generate a forwarding member for every method and field whose signature contains type variables (and which therefore would change under erasure); the descriptor of each such forwarding member is the erasure of the forwardee descriptor.

Adaptation

In all the cases so far, the descriptor of the bridge and of the forwardee differ in relatively narrow ways -- the bridge descriptor can be adapted to the forwardee descriptor with a subset of the adaptations performed by MethodHandle::asType. This is adequate for generic and specialization bridges, but as we'll see below, we may want to extend this set.

Conflicts

If a class contains a bridge whose forwardee descriptor matches the bridge descriptor exactly, the bridge is simply discarded. This decision can be made by looking only at the forwarding member, since we'll immediately see that the member descriptor and the forwarding descriptor are identical. (Such situations can arise when a class is specialized with the erasure of its type variables.)
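
The discard rule is local enough to sketch in a few lines. The model below is purely illustrative: ForwardingMember is a hypothetical in-memory view of a member plus its Forwarding attribute, not a real classfile API, and the descriptor strings (in particular the one for an int specialization) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ForwardingModel {
    // Hypothetical view of a class member that carries a Forwarding attribute.
    static final class ForwardingMember {
        final String name;
        final String descriptor;          // the bridge's own descriptor
        final String forwardeeDescriptor; // taken from the Forwarding attribute

        ForwardingMember(String name, String descriptor, String forwardeeDescriptor) {
            this.name = name;
            this.descriptor = descriptor;
            this.forwardeeDescriptor = forwardeeDescriptor;
        }

        // A bridge that forwards to its own descriptor bridges nothing.
        boolean isRedundant() {
            return descriptor.equals(forwardeeDescriptor);
        }
    }

    // Keeps only the forwarding members that actually bridge two descriptors.
    static List<ForwardingMember> discardRedundant(List<ForwardingMember> members) {
        List<ForwardingMember> kept = new ArrayList<>();
        for (ForwardingMember m : members) {
            if (!m.isRedundant()) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ForwardingMember> members = new ArrayList<>();
        // get() in a specialization at int: the bridge differs from the forwardee.
        members.add(new ForwardingMember("get", "()Ljava/lang/Object;", "()I"));
        // get() in the erased specialization: bridge and forwardee coincide.
        members.add(new ForwardingMember("get", "()Ljava/lang/Object;", "()Ljava/lang/Object;"));
        System.out.println(discardRedundant(members).size() + " forwarding member(s) kept");
    }
}
```

Note that the check never consults any other member of the class, which is the point of the paragraph above.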

Semantics

The linkage semantics of forwarding members are different from that of ordinary members. When linking a field or method access, if the resolved target is a forwarding member, we want to make some adjustments at the invocation site.

For a getfield that links to a forwarding member, we link the access such that it reads the forwardee field, adapts the resulting value to the bridge field type, and leaves the adapted value on the stack. (This is as if the getfield were linked to the result of taking a field getter method handle, and adapting it with asType() to the bridge type.) For a putfield, we do the reverse; we adapt the new field value to the forwardee type, and write to that field.
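
The "as if" description can be tried out with today's method handle API. The sketch below is not how a JVM would implement linkage; it only mimics the described behavior for a made-up class Box whose real field has type String, accessed through a bridge field of type Object. Box and the helper names are hypothetical; findGetter, findSetter, asType, and invokeExact are the standard java.lang.invoke calls.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ForwardedFieldSketch {
    // Hypothetical class: `value` is the forwardee field, declared as String.
    public static class Box {
        public String value = "hello";
    }

    // getfield Box.value:Object, linked to a read of Box.value:String.
    static MethodHandle linkGetter() throws ReflectiveOperationException {
        MethodHandle forwardee =
                MethodHandles.lookup().findGetter(Box.class, "value", String.class);
        // (Box)String adapted to the bridge type (Box)Object.
        return forwardee.asType(MethodType.methodType(Object.class, Box.class));
    }

    // putfield Box.value:Object, linked to a write of Box.value:String.
    static MethodHandle linkSetter() throws ReflectiveOperationException {
        MethodHandle forwardee =
                MethodHandles.lookup().findSetter(Box.class, "value", String.class);
        // (Box,String)void adapted to (Box,Object)void; asType inserts the cast.
        return forwardee.asType(MethodType.methodType(void.class, Box.class, Object.class));
    }

    static Object read(Box b) {
        try {
            return (Object) linkGetter().invokeExact(b);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    static void write(Box b, Object v) {
        try {
            linkSetter().invokeExact(b, v);
        } catch (RuntimeException | Error e) {
            throw e; // includes ClassCastException when v is not a String
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        Box b = new Box();
        write(b, "bye");
        System.out.println(read(b));
    }
}
```

Writing a non-String through the adapted setter fails with a ClassCastException at the access, which is the kind of adaptation failure one would expect when the bridge type is wider than the forwardee type.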

If the forwarding member is a method, we re-resolve the method using the forwardee signature, adapt its parameters as we would for putfield and its return value as we would for getfield, and invoke the forwardee with the invocation mode present at the call site. Again, the semantics here are as if we took a method handle for the forwardee method, using the invocation mode present at the call site, and adapted it with asType to the bridge descriptor.

The natural interpretation here is that rather than materializing a real field or method body in the class, we manage the forwarding as part of the linkage process, and include any necessary adaptations at the access site. The bridge "body" is never actually invoked; we use the Forwarding metadata to adapt and re-link the access sites.

Bridge loops

The linkage strategy outlined above -- where we truly treat bridges as forwarding to another member -- is the key to breaking the bridge loops. Specifying forwarded members means that the JVM can be aware that two methods are, at some level, the same method; the more complex linkage procedure allows us to invoke the forwardee with the correct invocation mode all the time, even under separate compilation.

In our Parent/Child example, Child::clone will do an invokespecial to invoke Parent::clone()Object, which after recompilation is a bridge to Parent::clone()Parent. We'll see that this is a bridge, and will forward to Parent::clone()Parent, with an invokespecial, and we'll land in the right place.

The elimination of bridge loops here stems from having raised the level of abstraction in which we render the classfile; we record that Parent::clone()Object is merely a bridge for Parent::clone()Parent, and so any invocation of the former is redirected -- with the same invocation mode -- to the latter. It is as if the client knew to invoke the right method.
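
The role of the invocation mode can be illustrated with method handles, again only as an analogy for the proposed linkage. In this sketch Parent, Child, and copy() are hypothetical stand-ins for the clone example; findSpecial plays the part of a call site using invokespecial, findVirtual the part of one using invokevirtual, and both are adapted with asType to an Object-returning bridge descriptor.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class InvocationModeSketch {
    public static class Parent {
        // Stands in for the forwardee, clone()Parent.
        public Parent copy() { return new Parent(); }
    }

    public static class Child extends Parent {
        @Override
        public Parent copy() { return new Child(); }

        // A super call from Child through the bridge descriptor:
        // forwarded to Parent.copy()Parent, still as invokespecial.
        static MethodHandle linkSpecial() throws ReflectiveOperationException {
            MethodHandle forwardee = MethodHandles.lookup().findSpecial(
                    Parent.class, "copy", MethodType.methodType(Parent.class), Child.class);
            return forwardee.asType(MethodType.methodType(Object.class, Child.class));
        }

        // An ordinary virtual call through the bridge descriptor:
        // forwarded to copy()Parent, still as invokevirtual.
        static MethodHandle linkVirtual() throws ReflectiveOperationException {
            MethodHandle forwardee = MethodHandles.lookup().findVirtual(
                    Parent.class, "copy", MethodType.methodType(Parent.class));
            return forwardee.asType(MethodType.methodType(Object.class, Parent.class));
        }
    }

    // Reports which class's copy() ran, judged by the class of the result.
    static Class<?> landsIn(boolean special) {
        try {
            Child receiver = new Child();
            Object result;
            if (special) {
                result = (Object) Child.linkSpecial().invokeExact(receiver);
            } else {
                result = (Object) Child.linkVirtual().invokeExact((Parent) receiver);
            }
            return result.getClass();
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println("special -> " + landsIn(true).getSimpleName());
        System.out.println("virtual -> " + landsIn(false).getSimpleName());
    }
}
```

With the same receiver, the special-mode handle runs Parent's implementation and the virtual-mode handle runs Child's. Because the forwarded super call never re-dispatches virtually, it cannot bounce back into the subclass.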

User-controlled bridges

The compiler will generate bridges where the language requires it, but we also have the opportunity to enable users to ask for bridges by providing a bridging annotation on the declaration:

@GenerateBridge(returnType=Object.class)
public static String foo() { ... }
This will instruct the compiler to generate an Object-returning method that is a bridge for foo(). This could be done for either fields or methods. (People have written frameworks to do this; see for example http://kohsuke.org/2010/08/07/potd-bridge-method-injector/).

Near-future problem: type migration
This mechanism may also be able to help us deal with the case when we want to migrate signatures in an otherwise-incompatible manner, such as changing a method that returns int to return long, or an OptionalInt to Optional<int>, or an old-style Date to the newer LocalDate. Numerous library modernizations (such as migrating from the old date-time libraries to the JSR-310 versions) are blocked on the ability to make such migrations; specializing the core libraries (especially Stream) will also generate such migrations.

Such migrations are a generalization of the sort of bridges we've been discussing here; they involve two additional features:

- Additional adaptations, including user-defined adaptations (such as between Date and LocalDate).
- Interaction with overriding, so that subclasses that override the old signature can still work properly.

Projection-embedding pairs

Given two types T and U, a projection-embedding pair is a pair of functions p : T → U and e : U → T such that ∀u ∈ U, p(e(u)) = u, and, if t is in the range of e, then e(p(t)) = t. Examples of useful projection-embedding pairs are the value sets of LV and QV for any value class V (we can embed the entirety of QV in LV, but LV contains one value -- null -- that can't be mapped back), any types T and U where T <: U, int and long (we can embed int in long), and Date and LocalDate. Intuitively, a p-e pair means we can freely map back and forth for the embeddable subset, and we get some sort of failure (e.g., NPE, or range truncation) otherwise.
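
For a concrete instance, take T = long and U = int. The sketch below writes out the two functions and checks the round-trip law on a few values. One choice here is ours rather than something the definition dictates: the projection signals a value outside the embeddable subset by throwing (via Math.toIntExact) instead of silently truncating.

```java
public class PePairSketch {
    // Embedding e : int -> long. Total and lossless (widening).
    static long embed(int u) {
        return u;
    }

    // Projection p : long -> int. Throws ArithmeticException for values
    // that are not the image of any int.
    static int project(long t) {
        return Math.toIntExact(t);
    }

    // The law p(e(u)) == u, checked for a single u.
    static boolean roundTrips(int u) {
        return project(embed(u)) == u;
    }

    // True if t lies outside the embeddable subset.
    static boolean projectionFails(long t) {
        try {
            project(t);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(Integer.MIN_VALUE) && roundTrips(0)
                && roundTrips(Integer.MAX_VALUE));
        System.out.println(projectionFails(Integer.MAX_VALUE + 1L));
    }
}
```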

User-provided adaptations

Many of the adaptations we want to do are handled by MethodHandle::asType: casting, widening, boxing. But sometimes, a migration involves types that require user-provided adaptation behavior, such as converting Date to LocalDate. (Bridges need to do these in both directions; we use the embedding for reads and projection for writes.) Here, we can extend the format of the Forwarding attribute to capture this additional behavior as pairs of method handles, such as:

Forwarding {
    u2 name;
    u4 length;
    u2 forwardeeType;
    // adaptation metadata
    u1 pePairs;
    { u2 projection; u2 embedding; }[pePairs];
}
When linking an access site for a forwarding member, if an adaptation is not supported by MethodHandle::asType, we use the user-provided embedding function for adapting return types and field reads, and the projection function for adapting parameter types and field writes.
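
Here is a sketch of what such a linkage could look like for a field migrated from Date to LocalDate, using the existing method handle combinators as a stand-in for the proposed mechanism. Everything specific is an assumption made for illustration: the Event class, the embed and project functions, and in particular the decision to interpret dates in UTC. MethodHandles.filterReturnValue and filterArguments are the standard combinators.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

public class MigrationSketch {
    // Hypothetical class whose field has already been migrated to LocalDate.
    public static class Event {
        public LocalDate when = LocalDate.of(2019, 1, 22);
    }

    // Embedding LocalDate -> Date (assumption: midnight UTC of that day).
    public static Date embed(LocalDate d) {
        return Date.from(d.atStartOfDay(ZoneOffset.UTC).toInstant());
    }

    // Projection Date -> LocalDate (assumption: UTC; the time of day is dropped).
    public static LocalDate project(Date d) {
        return d.toInstant().atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Legacy read of when:Date, linked to when:LocalDate plus the embedding.
    static MethodHandle bridgeGetter() throws ReflectiveOperationException {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodHandle getter = l.findGetter(Event.class, "when", LocalDate.class);
        MethodHandle e = l.findStatic(MigrationSketch.class, "embed",
                MethodType.methodType(Date.class, LocalDate.class));
        return MethodHandles.filterReturnValue(getter, e); // (Event)Date
    }

    // Legacy write of when:Date, linked to when:LocalDate plus the projection.
    static MethodHandle bridgeSetter() throws ReflectiveOperationException {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodHandle setter = l.findSetter(Event.class, "when", LocalDate.class);
        MethodHandle p = l.findStatic(MigrationSketch.class, "project",
                MethodType.methodType(LocalDate.class, Date.class));
        return MethodHandles.filterArguments(setter, 1, p); // (Event,Date)void
    }

    static Date read(Event ev) {
        try {
            return (Date) bridgeGetter().invokeExact(ev);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    static void write(Event ev, Date d) {
        try {
            bridgeSetter().invokeExact(ev, d);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        Event ev = new Event();
        write(ev, new Date());
        System.out.println(ev.when + " -> " + read(ev));
    }
}
```

A legacy write of a Date with a time component followed by a legacy read returns midnight of the same day: the value survives only up to the embeddable subset, as the projection-embedding framing predicts.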

Overriding

A more complicated problem is when we want to migrate the signature of an instance member in a non-final class, because the class may have existing subclasses that override the member, and may not yet have been recompiled. For example, we might start with:

interface Collection {
    int size();
}

class ArrayList implements Collection {
    Object[] elements = new Object[0];  // backing array, declared so the example compiles

    public int size() { return elements.length; }
}
Now, we recompile Collection but not ArrayList:

interface Collection {
    @TypeMigration(returnType=int.class)
    long size();
}
When we go to load ArrayList, we'll find that it overrides the bridge (size()int), and does not override the real method. We'll want to adjust ArrayList as we load it to make up for this.

Half the problem of this migration is addressed by having a forwarding method from size()int to size()long; any legacy clients that call the old signature will be bridged to the new one. To further indicate that overrides of such a method should be adjusted, suppose we mark this forwarding bridge with ACC_MIGRATED (in reality, we can probably use ACC_FINAL for this). Now, when we go to load ArrayList, we'll see that size()int is trying to override a migrated method (this is much like the existing override-a-final check). Instead of rejecting the subtype, we use ACC_MIGRATED bridges as a signal to fix up overrides.

We already have all the information in the Forwarding attribute that we need to fix ArrayList::size; we rewrite the descriptor to the forwardee descriptor, use the projection function for adapting argument types, and the embedding function for adapting the return type, and install the result in ArrayList. It is as if we adapted the subclass method with asType to the forwardee descriptor, and installed that in the subclass instead.
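
For the size() example the fix-up needs no user-provided functions, since int to long is a widening that asType already performs. The sketch below shows the "as if" behavior with method handles: LegacyList is a hypothetical stand-in for the unrecompiled ArrayList, and the adapted handle is what would conceptually be installed under the size()long descriptor.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class OverrideFixupSketch {
    // Hypothetical legacy subclass, still overriding the old size()int.
    public static class LegacyList {
        private final Object[] elements = new Object[3];

        public int size() { return elements.length; }
    }

    // Adapts the legacy override to the forwardee descriptor size()long.
    static MethodHandle adaptOverride() throws ReflectiveOperationException {
        MethodHandle legacy = MethodHandles.lookup().findVirtual(
                LegacyList.class, "size", MethodType.methodType(int.class));
        // (LegacyList)int adapted to (LegacyList)long; asType widens the result.
        return legacy.asType(MethodType.methodType(long.class, LegacyList.class));
    }

    // What a caller of the new descriptor would observe.
    static long sizeAsLong(LegacyList list) {
        try {
            return (long) adaptOverride().invokeExact(list);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeAsLong(new LegacyList()));
    }
}
```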

The effect is that in the presence of a migrated bridge, the bridge descriptor is a toxic waste zone; callers are redirected to the new descriptor by bridging, and overriders are redirected to the new descriptor by adaptation.