Bridge methods in the VM

Sat Jan 26 13:05:56 UTC 2019

> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Tuesday, 22 January 2019 14:51:35
> Subject: Bridge methods in the VM

> We’ve been thinking for a long time about the possibilities of pushing bridging
> down into the VM. The reasons we have had until now have not been strong
> enough, but generic specialization, and compatible migration of libraries, give
> us reason to take another swing. HTML inline (list willing); MD attached.

I'm worried that we are missing the big picture here. Bridging by the VM is one way to patch the vtable, but there is another feature we need that is also equivalent to patching the vtable: the where condition, i.e. method specialization, where a generic method is replaced by a specific version depending on the value of the type arguments.

If we have a general mechanism for vtable patching, the Forwarding attribute may still exist, but instead of being known by the VM and treated in an ad hoc way, it would be read by the JDK-side code that interacts with the vtable patching (think of something like a bootstrap method).

That said, having a forwarding attribute can be fun by itself.

> VM Bridging

> Historically, bridges have been generated by the static compiler. Bridges are
> generated today when there is a covariant override (a String-returning method
> overrides an Object-returning method), or when there is a generic
> instantiation (class Foo implements List<String>). (Historically, we also
> generated access bridges when accessing private fields of classes in the same
> nest, but nestmates did away with those!)

> Intuitively, a bridge method is generated when a single method implementation
> wants to respond to two distinct descriptors. At the language level, these two
> methods really are the same method (the compiler enforces that subclasses
> cannot override bridges), but at the VM level, they are two completely
> unrelated methods. This asymmetry is the source of the problems with bridges.
> One of the main values of making the JVM more aware of bridges is that we no
> longer need to throw away the useful information that two seemingly different
> methods are related in this way.
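
For reference, the bridges javac generates today can be observed through reflection. Here is a minimal sketch; the classes Parent and Child and the helper describeGetMethods are hypothetical names chosen only for illustration. A covariant override leaves two methods named get in the subclass, and only the bridge flag hints that they are related:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BridgeDemo {
    // Hypothetical hierarchy with a covariant override.
    static class Parent {
        Object get() { return "parent"; }
    }

    static class Child extends Parent {
        @Override
        String get() { return "child"; } // javac also emits a bridge get()Object here
    }

    // Lists every method named "get" declared in Child, tagged as bridge or real.
    static List<String> describeGetMethods() {
        List<String> out = new ArrayList<>();
        for (Method m : Child.class.getDeclaredMethods()) {
            if (m.getName().equals("get")) {
                String kind = m.isBridge() ? "[bridge]" : "[real]";
                out.add(m.getReturnType().getSimpleName() + " get() " + kind);
            }
        }
        Collections.sort(out); // reflection order is unspecified, so sort for stable output
        return out;
    }

    public static void main(String[] args) {
        describeGetMethods().forEach(System.out::println);
    }
}
```

At the VM level these are two unrelated entries; nothing beyond the flag records which method the bridge is a bridge for.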

> We took a running leap at this problem back in Java 8, when we were doing
> default methods; this document constitutes a second run at this problem.

> Bridge anomalies

> Compiler-generated bridge methods are brittle; separate compilation can easily
> generate situations where bridges are missing or inconsistent, which in turn
> can result in AME, invoking a superclass method when an override exists in a
> subclass, or everyone's favorite anomaly, the bridge loop. Start with:
> class Parent implements Cloneable {
>    protected Object clone() { return (Parent)null; }
> }

> class Child extends Parent {
>    protected Parent clone() { return (Parent)super.clone(); }
> }

> Then, change Parent as follows, and recompile only that:
> class Parent implements Cloneable {
>    protected Parent clone() { return (Parent)null; }
> }

> If you call clone() on Child you get a StackOverflowError (try it!). What's going
> on is that when we make this change, the place in the hierarchy where the
> bridge is introduced changes, but we don't recompile the entire hierarchy. As a
> result, we have a vestigial bridge, and when we invoke clone() with
> invokevirtual from the new bridge, we hit the old bridge, and loop.

> The fundamental problem here is that we are rendering bridges into concrete code
> "too early", based on a compile-time view of the type hierarchy. We want to
> make bridge dispatch more dynamic; we can accomplish this by making bridges
> more declarative than imperative, by recording the notion "A is a bridge for B"
> in the classfile -- and using that in dispatch -- without having to decide
> ahead of time exactly what bytecodes to use for bridging.

> Generic specialization

> Generics gave us a few situations where we want to be able to access a class
> member through more than one signature; specialized generics will give us more.
> For example, in a specialized class:
> class Foo<T> {
>     T t;

>     T get();
> }

> In the instantiation Foo<int>, the type of the field t, and the return type of
> get(), are int. In the wildcard type Foo<?>, the type of both of these is
> Object. But because a Foo<int> is a Foo<?>, we want that a Foo<int> responds
> to invocations of get()Object, and to accesses of the field t as if it were of
> type Object.

> We could handle the method with yet more bridge methods, but bridge methods
> don't do anything to help us with the field access. (In the M2 prototype we
> lifted field access on wildcards to method invocations, which was a useful
> prototyping move, but this does nothing to help existing erased binaries.)

> So while bridge methods as a mechanism run out of gas here, the concept of
> bridging -- recording that one member is merely an adaptation for another -- is
> still applicable.

> Summary of problems

> We can divide the problems with bridges into two groups -- old and new. The old
> problems are not immediately urgent to fix (brittleness, separate compilation
> anomalies), but are a persistent source of technical debt, bug tails, and
> constraints on translation evolution.

> The new problems are that specialized generics give us more places we want
> bridges, making the old problems worse, as well as some places where we want
> the effects of bridging, but for which traditional bridge methods won't do the
> trick -- adaptation of fields.

> Looking ahead, there are also some other issues on the horizon that we will
> surely encounter as we migrate the libraries to use specialized generics --
> that have related characteristics.

> Proposed solution: forwarded members

> I'll lay out one way to express bridges for both fields and methods in the
> classfile, but there are others. In this model, for a member B that is a bridge
> for member M, we include a declaration for B in the class file, but we attach a
> Forwarding attribute to it, identifying the underlying member M (by descriptor,
> since its name will be the same) and indicating that attempts to link to B
> should be forwarded to M:
> Forwarding {
>     u2 name;
>     u4 length;
>     u2 forwardeeType;
> }

> A method with a Forwarding attribute has no Code attribute. We would then
> replace existing bridges with forwarding members, and for specializable
> classes, we would generate a forwarding member for every method and field whose
> signature contains type variables (and which therefore would change under
> erasure), whose descriptor is the erasure of the forwardee descriptor.

Note that if, instead of forwardeeType being a descriptor, we use a constant method handle, we have the semantics of the lambda metafactory when there are no captured values.

> Adaptation

> In all the cases so far, the descriptor of the bridge and of the forwardee
> differ in relatively narrow ways -- the bridge descriptor can be adapted to the
> forwardee descriptor with a subset of the adaptations performed by
> MethodHandle::asType. This is adequate for generic and specialization bridges,
> but as we'll see below, we may want to extend this set.

> Conflicts

> If a class contains a bridge whose forwardee descriptor matches the bridge
> descriptor exactly, the bridge is simply discarded. This decision can be made
> only looking at the forwarding member, since we'll immediately see that the
> member descriptor and the forwarding descriptor are identical. (Such situations
> can arise when a class is specialized with the erasure of its type variables.)

You can also have two forwarding descriptors with no real implementation, for example if you implement the same interface twice with two different type arguments.
I believe the semantics are exactly the same as the default method semantics.

> Semantics

> The linkage semantics of forwarding members are different from that of ordinary
> members. When linking a field or method access, if the resolved target is a
> forwarding member, we want to make some adjustments at the invocation site.

> For a getfield that links to a forwarding member, we link the access such that
> it reads the forwardee field, and then adapts the resulting value to the bridge
> field type, and leaves the adapted value on the stack. (This is as if the
> getfield is linked to the result of taking a field getter method handle, and
> adapting it with asType() to the bridge type.) For a putfield, we do the
> reverse; we adapt the new field value to the forwardee type, and write to that
> field.

> If the forwarding member is a method, we re-resolve the method using the
> forwardee signature, adapt its parameters as we would for putfield and its
> return value as we would for getfield, and invoke the forwardee with the
> invocation mode present at the call site. Again, the semantics here are as if
> we took a method handle for the forwardee method, using the invocation mode
> present at the call site, and adapted it with asType to the bridge descriptor.

> The natural interpretation here is that rather than materializing a real field
> or method body in the class, we manage the forwarding as part of the linkage
> process, and include any necessary adaptations at the access site. The bridge
> "body" is never actually invoked; we use the Forwarding metadata to adapt and
> re-link the access sites.
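
To make the asType analogy concrete, here is a minimal sketch using java.lang.invoke. It only mimics the described linkage at the library level; it is not what a VM would actually do. The class Foo and the three helper names are hypothetical, and findVirtual stands in for one particular invocation mode (invokevirtual) where the proposal would use whatever mode the call site has:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ForwardingLinkSketch {
    // Hypothetical class: the real members are typed String,
    // the "bridge" view of the same members is typed Object.
    static class Foo {
        String t = "hello";
        String get() { return t; }
    }

    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();

    // getfield on the bridge: read the forwardee field, adapt the value to the bridge type.
    static Object getfieldViaBridge(Foo foo) {
        try {
            MethodHandle forwardee = LOOKUP.findGetter(Foo.class, "t", String.class); // (Foo)String
            MethodHandle bridge = forwardee.asType(MethodType.methodType(Object.class, Foo.class));
            return (Object) bridge.invokeExact(foo);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    // putfield on the bridge: adapt the new value to the forwardee type (a cast), then write.
    static void putfieldViaBridge(Foo foo, Object value) {
        try {
            MethodHandle forwardee = LOOKUP.findSetter(Foo.class, "t", String.class); // (Foo,String)void
            MethodHandle bridge =
                forwardee.asType(MethodType.methodType(void.class, Foo.class, Object.class));
            bridge.invokeExact(foo, value);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    // Method call on the bridge descriptor get()Object, forwarded to get()String.
    static Object invokeViaBridge(Foo foo) {
        try {
            MethodHandle forwardee =
                LOOKUP.findVirtual(Foo.class, "get", MethodType.methodType(String.class)); // (Foo)String
            MethodHandle bridge = forwardee.asType(MethodType.methodType(Object.class, Foo.class));
            return (Object) bridge.invokeExact(foo);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        System.out.println(getfieldViaBridge(foo));
        putfieldViaBridge(foo, "world");
        System.out.println(invokeViaBridge(foo));
    }
}
```

In this sketch, writing a non-String value through the bridge fails with a ClassCastException at the adapted access, since the adaptation in that direction is a cast.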

Note that asType() implies that a method with varargs can be forwarded to a method with no varargs (which is a nice way to implement java.lang.reflect method invocation).

> Bridge loops

> The linkage strategy outlined above -- where we truly treat bridges as
> forwarding to another member -- is the key to breaking the bridge loops.
> Specifying forwarded members means that the JVM can be aware that two methods
> are, at some level, the same method; the more complex linkage procedure allows
> us to invoke the bridgee with the correct invocation mode all the time, even
> under separate compilation.

> In our Parent/Child example, Child::clone will do an invokespecial to invoke
> Parent::clone()Object, which after recompilation is a bridge to
> Parent::clone()Parent. We'll see that this is a bridge, and will forward to
> Parent::clone()Parent, with an invokespecial, and we'll land in the right
> place.

> The elimination of bridge loops here stems from having raised the level of
> abstraction in which we render the classfile; we record that
> Parent::clone()Object is merely a bridge for Parent::clone()Parent, and so any
> invocation of the former is redirected -- with the same invocation mode -- to
> the latter. It is as if the client knew to invoke the right method.

> User-controlled bridges

> The compiler will generate bridges where the language requires it, but we also
> have the opportunity to enable users to ask for bridges by providing a bridging
> annotation on the declaration:
> @GenerateBridge(returnType=Object.class)
> public static String foo() { ... }

> This will instruct the compiler to generate an Object-returning method that is
> a bridge for foo(). This could be done for either fields or methods. (People
> have written frameworks to do this; see for example
> http://kohsuke.org/2010/08/07/potd-bridge-method-injector/ ).

> Near-future problem: type migration

> This mechanism may also be able to help us deal with the case when we want to
> migrate signatures in an otherwise-incompatible manner, such as changing a
> method that returns int to return long, or an OptionalInt to Optional<int>,
> or an old-style Date to the newer LocalDate. Numerous library modernizations
> (such as migrating from the old date-time libraries to the JSR-310 versions)
> are blocked on the ability to make such migrations; specializing the core
> libraries (especially Stream) will also generate such migrations.

> Such migrations are a generalization of the sort of bridges we've been
> discussing here; they involve adding an additional two features:

>    * Additional adaptations, including user-defined adaptations (such as between
>     Date and LocalDate )
>    * Interaction with overriding, so that subclasses that override the old
>     signature can still work properly.

> Projection-embedding pairs

> Given two types T and U, a projection-embedding pair is a pair of functions
> p : T → U and e : U → T such that ∀ u ∈ U, p(e(u)) = u, and, if t is in the
> range of p, then e(p(t)) = t. Examples of useful projection-embedding
> pairs are the value sets of LV and QV for any value class V (we can embed the
> entirety of QV in LV, but LV contains one value -- null -- that can't be
> mapped back), any types T and U where T <: U, int and long (we can embed int
> in long), and Date and LocalDate. Intuitively, a p-e pair means we can freely
> map back and forth for the embeddable subset, and we get some sort of failure
> (e.g., NPE, or range truncation) otherwise.

> User-provided adaptations

> Many of the adaptations we want to do are handled by MethodHandle::asType :
> casting, widening, boxing. But sometimes, a migration involves types that
> require user-provided adaptation behavior, such as converting Date to LocalDate.
> (Bridges need to do these in both directions; we use the embedding for reads
> and projection for writes.) Here, we can extend the format of the Forwarding
> attribute to capture this additional behavior as pairs of method handles, such
> as:
> Forwarding {
>     u2 name;
>     u4 length;
>     u2 forwardeeType;
>     // adaptation metadata
>     u1 pePairs;
>     { u2 projection; u2 embedding; }[pePairs];
> }

> When linking an access site for a forwarding member, when an adaptation is not
> supported by MethodHandle::asType, we use the user-provided embedding function
> for adapting return types and field reads, and the projection function for
> adapting parameter types and field writes.
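
As an illustration of such a user-provided pair, here is a minimal sketch for the Date / LocalDate case, again expressed with java.lang.invoke rather than any real classfile attribute. Everything named here is an assumption for the example: the Event class, the embed and project functions, and the choice of UTC midnight as the embedding of a LocalDate into a Date. The embedding adapts the return value, the projection adapts the parameter:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

final class PePairSketch {
    // Hypothetical migrated class: the real members now use LocalDate.
    static class Event {
        private LocalDate day;
        Event(LocalDate day) { this.day = day; }
        LocalDate getDay() { return day; }
        void setDay(LocalDate day) { this.day = day; }
    }

    // Hypothetical user-provided pair; UTC midnight is an arbitrary choice.
    static Date embed(LocalDate d) {
        return Date.from(d.atStartOfDay(ZoneOffset.UTC).toInstant());
    }

    static LocalDate project(Date d) {
        return d.toInstant().atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Legacy descriptor getDay()Date forwarded to getDay()LocalDate.
    static Date legacyGetDay(Event e) {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodHandle forwardee =
                l.findVirtual(Event.class, "getDay", MethodType.methodType(LocalDate.class));
            MethodHandle embed = l.findStatic(PePairSketch.class, "embed",
                MethodType.methodType(Date.class, LocalDate.class));
            MethodHandle bridge = MethodHandles.filterReturnValue(forwardee, embed); // (Event)Date
            return (Date) bridge.invokeExact(e);
        } catch (RuntimeException | Error ex) {
            throw ex;
        } catch (Throwable ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Legacy descriptor setDay(Date)void forwarded to setDay(LocalDate)void.
    static void legacySetDay(Event e, Date d) {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodHandle forwardee = l.findVirtual(Event.class, "setDay",
                MethodType.methodType(void.class, LocalDate.class));
            MethodHandle project = l.findStatic(PePairSketch.class, "project",
                MethodType.methodType(LocalDate.class, Date.class));
            // Argument 0 is the receiver, argument 1 is the date parameter.
            MethodHandle bridge = MethodHandles.filterArguments(forwardee, 1, project); // (Event,Date)void
            bridge.invokeExact(e, d);
        } catch (RuntimeException | Error ex) {
            throw ex;
        } catch (Throwable ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

With these definitions, project(embed(d)) gives back d for any LocalDate d, while a Date carrying a time of day loses it when projected.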

If instead of using filter methods you use a bootstrap method, you can do all the adaptations you want (at least the ones provided by the j.l.i package). And if instead of a forwarding descriptor you use a constant method handle (or if you send it as a bootstrap constant argument), you have the semantics of what I've called Mjolnir, which has the same expressive power as intrinsics described in Java.

And combined with your user-controlled bridges, you have a typesafe macro system.

> Overriding

> A more complicated problem is when we want to migrate the signature of an
> instance member in a non-final class, because the class may have existing
> subclasses that override the member, and may not yet have been recompiled. For
> example, we might start with:
> interface Collection {
>     int size();
> }

> class ArrayList implements Collection {
>     int size() { return elements.length; }
> }

> Now, we recompile Collection but not ArrayList:
> interface Collection {
>     @TypeMigration(returnType=int.class)
>     long size();
> }

> When we go to load ArrayList, we'll find that it overrides the bridge
> (size()int), and does not override the real method. We'll want to adjust
> ArrayList as we load it to make up for this.

> Half the problem of this migration is addressed by having a forwarding method
> from size()int to size()long; any legacy clients that call the old signature
> will be bridged to the new one. To further indicate that overrides of such a
> method should be adjusted, suppose we mark this forwarding bridge with
> ACC_MIGRATED (in reality, we can probably use ACC_FINAL for this). Now, when we
> go to load ArrayList, we'll see that size()int is trying to override a
> migrated method (this is much like the existing override-a-final check).
> Instead of rejecting the subtype, we use ACC_MIGRATED bridges as a
> signal to fix up overrides.

> We already have all the information in the Forwarding attribute that we need to
> fix ArrayList::size; we rewrite the descriptor to the forwardee descriptor,
> use the projection function for adapting argument types, and the embedding
> function for adapting the return type, and install the result in ArrayList. It
> is as if we adapted the subclass method with asType to the forwardee
> descriptor, and installed that in the subclass instead.
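
For the size() case, the "as if adapted with asType" part can be sketched as follows. This is only a library-level illustration under my own assumptions: LegacyList is a hypothetical stand-in for the not-yet-recompiled ArrayList, and sizeAsLong plays the role of the fixed-up method the VM would install:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class OverrideFixupSketch {
    // Hypothetical legacy subclass, still written against the old size()int signature.
    static class LegacyList {
        private final Object[] elements;
        LegacyList(int n) { elements = new Object[n]; }
        int size() { return elements.length; }
    }

    // Calls the legacy size()int through the migrated descriptor size()long.
    static long sizeAsLong(LegacyList list) {
        try {
            MethodHandle legacy = MethodHandles.lookup()
                .findVirtual(LegacyList.class, "size", MethodType.methodType(int.class)); // (LegacyList)int
            // int to long is a primitive widening, which asType supports directly.
            MethodHandle fixedUp =
                legacy.asType(MethodType.methodType(long.class, LegacyList.class));
            return (long) fixedUp.invokeExact(list);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeAsLong(new LegacyList(3)));
    }
}
```

Here no user-provided function is needed; a migration such as Date to LocalDate would need the projection and embedding functions instead of plain asType.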

> The effect is that in the presence of a migrated bridge, the bridge descriptor
> is a toxic waste zone; callers are redirected to the new descriptor by
> bridging, and overriders are redirected to the new descriptor by adaptation.

I think this kind of adaptation is better done when the vtable is constructed.
I mean conceptually constructed, because in terms of implementation I think the VM should insert stubs in the vtable that are resolved lazily, the first time a stub is called, by invoking the bootstrap method. This avoids the initialization issues caused by the vtable usually being created very early in the process.

Rémi