RI update: division of bridging responsibility between VM and compiler

Brian Goetz brian.goetz at oracle.com
Mon Apr 15 09:52:33 PDT 2013


As you may recall, adding default methods requires that the VM get 
involved in default method inheritance, because it is an explicit goal 
for the addition of an interface method with a default to be a 
binary-compatible change.  We've had an implementation of default 
inheritance in the VM for quite a while.  The basic inheritance 
algorithm was really easy to implement; it built on top of existing 
vtable building in a straightforward and well-defined way.

Some time back, we identified some cases where pushing default 
inheritance into the VM seemed to necessitate pushing bridge method 
generation into the VM as well.  We have had an implementation of 
this in the VM for a while, too.  But this is a much bigger change and 
we're not as comfortable with it -- it pushes the details of the generic 
type system into the VM, and risks exposing Java-language-specific type 
system details to classes generated by other language compilers.

At one point, we were convinced we had no choice.  But since then, there 
were some simplifications in the definition of overriding with respect 
to defaults (specifically, outlawing abstract-default conflicts rather 
than silently merging them), and it turns out that this eliminates a 
number of the examples that led us to believe we had no choice in this 
matter.  (Specifically, landing in a corner case now requires a 
bridge-requiring merge between a class and an interface; it can no longer 
happen with two interfaces.)  After having spent some time trying to 
specify what the invoke{virtual,interface,special} semantics might be in 
a VM-bridged world -- with the hopes that this would be step 1 along the 
path of eventually moving all bridging out of the static compiler (where 
it clearly does not belong, and is basically pure technical debt left 
over from generics) -- we're getting more comfortable with the corner 
cases that we'd have without VM bridging. Indeed, most of them are 
analogous to corner cases we already have today and would continue to 
have tomorrow under separate compilation with ordinary classes.

Instead, we're now pursuing a path where we generate bridges into 
interfaces (since we can do that now) using an algorithm very similar to 
what we do with class bridges.  We may need to extend compiler-generated 
bridges by also generating additional classfile attributes that the VM 
could act on to avoid these anomalies; this is currently being explored.

This offers a significant reduction in complexity.  We can rip out all 
existing bridge-related code from the VM, and do default inheritance using 
the simple "same erased signature" overriding the VM has always done. 
We can also rip out all generic analysis, including verification of 
generic signatures, though we might have to add back processing of 
additional classfile attributes and potentially use those to modify the 
behavior of inheritance (details TBD).  And this keeps the generic type 
system in javac, eliminating risks of interference with other language 
inheritance semantics.


BRIEF NOTATION BREAK
--------------------

When we were discussing how to specify default inheritance, we invented 
a notation where we wrote things like:

   Cc(Id(Ja))

and wrote separate compilation examples as:

   Cc(Id(Ja)) -> Cc(Id(Jd))

Which was much easier to reason about, and less ambiguity-prone, than 
writing the classes out longhand.  Decoder chart:

A, B: concrete or abstract classes
C: concrete class to be instantiated
I, J, K: interfaces

In this world, like in FD, there's one method, named "m", with no 
arguments.  Classes or interfaces have some extra letters after them to 
describe how m is declared:

C -- no declaration of m
Cc -- m() declared in C as concrete
Ca, Ia -- m() declared in C or I as abstract
Id -- m() declared in I as default
Cm -- m() is declared in C as either abstract or concrete

We now extend this notation with indicators describing covariant 
overrides, imagining a linear hierarchy of types T2 <: T1 <: T0:

Cc0 -- m() declared in C as returning T0
Cc1 -- m() declared in C as returning T1

Supertypes are written in parentheses:

Cc(Id(Jd))

means that C extends I and I extends J.

Separate compilation is written as:

Cc(Id(Ja)) -> Cc(Id(Jd))

Since only J is changed, only J is assumed to be recompiled.
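
For readers who prefer longhand, here is a minimal sketch of what the 
initial state Cc(Id(Ja)) might look like in Java.  The String return type 
and the returned values are assumptions made purely so the result can be 
observed; the notation itself says nothing about them.

```java
class NotationSketch {
    // Ja: m() declared in J as abstract
    interface J { String m(); }

    // Id(Ja): I extends J and declares m() as a default
    interface I extends J {
        @Override default String m() { return "default from I"; }
    }

    // Cc(Id(Ja)): C implements I and declares m() as concrete
    static class C implements I {
        @Override public String m() { return "declared in C"; }
    }

    /** Invokes m() on a fresh C through the J view. */
    static String invoke() {
        J receiver = new C();
        return receiver.m();
    }

    public static void main(String[] args) {
        System.out.println(invoke());
    }
}
```

The final state Cc(Id(Jd)) would differ only in J, where m() gains a 
default body; C and I stay as compiled.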


MOTIVATING EXAMPLE
------------------

Here's a problem we have today (and which the path we'd been pursuing 
would not have fixed for 8):

Cc1(A) -> Cc1(Ac0)

(This is a "contravariant underride.")  This means we go from:

abstract class A { }
class C <: A { T1 m() { } }

to

abstract class A { T0 m() { } }
class C <: A { T1 m() { } }

without recompiling C.

What will happen at runtime is:

m()T1 -> C
m()T0 -> A

whereas with a global recompile, we would get:

m()T1 -> C
m()T0 -> C
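
As a rough illustration of the globally recompiled outcome, the sketch 
below compiles A and C together and uses reflection to look for the 
bridge that javac places in C.  The class names T0 and T1 and the helper 
names are assumptions for the example; the separately compiled outcome 
cannot be reproduced from a single source file, so only the consistent 
case is shown.

```java
import java.lang.reflect.Method;

class CovariantBridgeSketch {
    static class T0 {}
    static class T1 extends T0 {}

    // Ac0: m() declared in A as returning T0
    static abstract class A { T0 m() { return new T0(); } }

    // Cc1: m() declared in C as returning T1 (covariant override)
    static class C extends A { @Override T1 m() { return new T1(); } }

    /** Counts compiler-generated bridge methods named m declared directly in C. */
    static int bridgesInC() {
        int count = 0;
        for (Method method : C.class.getDeclaredMethods()) {
            if (method.getName().equals("m") && method.isBridge()) count++;
        }
        return count;
    }

    /** Invokes m() through the A view, that is, via descriptor m()T0. */
    static T0 callViaA(A a) { return a.m(); }

    public static void main(String[] args) {
        System.out.println("bridges in C: " + bridgesInC());
        System.out.println("m()T0 reaches C: " + (callViaA(new C()) instanceof T1));
    }
}
```

When both classes are compiled together, the m()T0 slot is filled by the 
bridge in C, which is the "m()T0 -> C" row above.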

Note that:
  - This problem exists today and has existed since Java 5
  - Would get no better under the "default VM bridging" plan
  - No one seems particularly bothered by this long-standing issue.


Now consider the defender analogue of this example:

Cc1(I) -> Cc1(Id0)

m()T1 -> C
m()T0 -> I

Is this any worse than the previous version?  For default methods, we 
say "classes that don't override this method will get the default, which 
by definition meets the contract of I."  A moldy class file that had no 
idea that its m()T1 declaration was overriding an as-yet-unborn m()T0 
in a supertype could well be described as "not overriding the method." 
In which case they get the default.  This does not seem so bad, or any 
worse than many other similar separate compilation scenarios today.

Turning it around, if we handled this case but not the class-based 
version of the same issue, might that not even be weirder?

Note also that with the decision to rule out abstract-default conflicts 
(i.e., outlawing K(Ia,Jd)), the set of possible bad cases is reduced a 
lot; many of the scary examples came from that space.


INTERFACE BRIDGES
-----------------

We anticipate that (consistently compiled) interface hierarchies like

   Id1(Jd0)

will be common.  (Consider a method like Collection.immutable(), which 
might be covariantly overridden by List.immutable()).  So, to support 
consistently compiled hierarchies like this (that is, I and J updated 
together) without forcing a recompile of concrete classes implementing 
I, the compiler could generate a bridge in I redirecting m()T0 to m()T1, 
with suitable cast, which is the highest point in the hierarchy where we 
can determine a bridge is needed.  In a consistently compiled world, 
this is all that is needed.
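
A minimal sketch of the consistently compiled Id1(Jd0) case follows.  The 
type names T0 and T1 and the helper methods are assumptions for the 
example.  What the language guarantees is the observable behavior (a call 
through J lands in I's default); whether a given javac realizes that with 
a bridge in I is something the sketch reports via reflection rather than 
assumes.

```java
import java.lang.reflect.Method;

class InterfaceBridgeSketch {
    static class T0 {}
    static class T1 extends T0 {}

    // Jd0: m() declared in J as a default returning T0
    interface J { default T0 m() { return new T0(); } }

    // Id1: m() declared in I as a default returning T1
    interface I extends J { @Override default T1 m() { return new T1(); } }

    // C: no declaration of m; it is not touched when I and J change together
    static class C implements I {}

    /** Invokes m() through the J view, that is, via descriptor m()T0. */
    static T0 callViaJ(J j) { return j.m(); }

    /** Reports whether the compiler emitted a bridge named m directly in I. */
    static boolean hasBridgeInI() {
        for (Method method : I.class.getDeclaredMethods()) {
            if (method.getName().equals("m") && method.isBridge()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("m()T0 reaches I's default: " + (callViaJ(new C()) instanceof T1));
        System.out.println("bridge found in I: " + hasBridgeInI());
    }
}
```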

But we don't live in a consistently compiled world.  So we must make 
some allowance for what might happen in a separately compiled world. The 
current scheme of only compiling bridges into the class where the 
bridgee lives helps reduce certain separate compilation artifacts.  I 
think we should probably continue doing this, so that class bridges 
will, at times, override interface bridges.  There does not seem to be 
harm in this, and it changes fewer things, and eliminates some risk vectors.

(Ultimately the problem is that compiler bridges suffer from "premature 
bytecode".  When the compiler generates a bridge, it is trying to reify 
the notion of "method m()T1 was known to override method m()T0 at 
compile time", but this is opaque to the VM, which can only slavishly 
propagate the bridge through subclass vtables as if it were code written 
by the user.  If, instead of bridges (or in addition to), the compiler 
instead generated a class attribute of the form "I believe that m()T1 
overrides m()T0", the VM could act on that information directly, and 
this might buy us out of some of the worst possible problems.)


WORST CASE SCENARIO
-------------------

The cases above are not terrible because the program continues to link 
after separate compilation and even does something vaguely 
justifiable.  Here's a worse scenario (relevant humor break: 
http://www.youtube.com/watch?v=_W-qxpN2oEI).

   Cc1(Bc0(Ac0)) -> Cc1(Bc1(Ac0))

If the implementation in C does:

   super.m()

one gets a StackOverflowError.  This happens because when we invoke 
C.m(), we are really invoking C.m()T1.  C.m()T1 invokes B.m()T0 via 
invokespecial, thinking that it is invoking the parent implementation. 
But really B.m()T0 is a bridge for B.m()T1, that invokes B.m()T1 with 
invokevirtual.  But B.m()T1 is overridden by C.m()T1, and so the 
invokevirtual is dispatched there.  Which is where we started, so we 
ping-pong between C.m()T1 and the bridge B.m()T0 until we fall off the 
stack.
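
The loop can be traced with a toy model of method selection.  This is a 
sketch under stated assumptions, not VM code: the Klass type, the string 
encoding of method bodies ("return", "virtual:<desc>", "special:<desc>") 
and the frame limit are all invented for illustration, and the model 
ignores everything about real linkage except which class supplies the 
method for a given descriptor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BridgeLoopSketch {
    /** Toy class file: methods keyed by erased descriptor, e.g. "m()T1". */
    static final class Klass {
        final String name;
        final Klass parent;
        final Map<String, String> methods = new HashMap<>();
        Klass(String name, Klass parent) { this.name = name; this.parent = parent; }
        Klass def(String desc, String body) { methods.put(desc, body); return this; }
    }

    /** Finds the nearest class at or above k that declares desc. */
    static Klass select(Klass k, String desc) {
        for (Klass c = k; c != null; c = c.parent) {
            if (c.methods.containsKey(desc)) return c;
        }
        throw new IllegalStateException("no method " + desc);
    }

    /** Records the frames entered when desc is invoked on receiver, up to maxFrames. */
    static List<String> trace(Klass receiver, String desc, int maxFrames) {
        List<String> frames = new ArrayList<>();
        Klass owner = select(receiver, desc);
        while (frames.size() < maxFrames) {
            frames.add(owner.name + "." + desc);
            String body = owner.methods.get(desc);
            if (body.equals("return")) break;
            String target = body.substring(body.indexOf(':') + 1);
            // "virtual" selects from the receiver's class; "special" from the caller's parent.
            owner = body.startsWith("virtual:")
                    ? select(receiver, target)
                    : select(owner.parent, target);
            desc = target;
        }
        return frames;
    }

    /** Builds Cc1(Bc0(Ac0)) or, if bRecompiled, Cc1(Bc1(Ac0)) with a stale C. */
    static List<String> run(boolean bRecompiled, int maxFrames) {
        Klass a = new Klass("A", null).def("m()T0", "return");
        Klass b = new Klass("B", a);
        if (bRecompiled) {
            // New B: real method m()T1 plus a bridge m()T0 that calls m()T1 virtually.
            b.def("m()T1", "return").def("m()T0", "virtual:m()T1");
        } else {
            b.def("m()T0", "return");
        }
        // Stale C: super.m() was compiled as an invokespecial of B.m()T0.
        Klass c = new Klass("C", b)
                .def("m()T1", "special:m()T0")
                .def("m()T0", "virtual:m()T1");
        return trace(c, "m()T1", maxFrames);
    }

    public static void main(String[] args) {
        System.out.println("before: " + run(false, 6));
        System.out.println("after:  " + run(true, 6));
    }
}
```

In this model the "before" trace stops after two frames, while the 
"after" trace alternates between C.m()T1 and B.m()T0 until the frame 
limit cuts it off, which stands in for the StackOverflowError.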

Again, note that (a) we already have this problem since Java 5 and (b) 
the complex solution we were pursuing would not have fixed it for 8. 
But this is definitely worse than the problems above, and we want to not 
widen this hole.

We need to explore further what kinds of separate compilation anomalies 
with bridges in interfaces might cause similar problems.


EXHAUSTIVE PATTERN CATALOG
--------------------------

Dan did a nearly-exhaustive catalog of inheritance scenarios.  The 
question is, do we find any of these anomalies so bad (worse than 
existing anomalies) that we cannot live with them?  On review, none of 
them seem any worse than the pain of bridge methods under separate 
compilation we've been living with for years.

They are annotated with what happens:

0: Description of the behavior of an invocation on an instance of C, 
targeting the descriptor of index 0.
0*: Behavior inconsistent with a full compilation of the final state.

The following cases are not considered:
- Illegal hierarchies, in either the initial or final state
- Redundant extra classes/interfaces that have no effect on the outcome
- Redundant permutations of 'implements' clauses
- Final states that require recompiling C

=====
Linear inheritance (one ancestor, two methods)

---

Cc1(A) -> Cc1(Ac0)

0*: Inherited from A
1: Declared in C

---

Cc1(I) -> Cc1(Id0)

0*: Inherited default from I
1: Declared in C

=====
Linear inheritance (two ancestors, no method in C)

---

C(Bc1(A)) -> C(Bc1(Ac0))

0*: Inherited from A
1: Inherited from B

---

C(B(Ac0)) -> C(Bc1(Ac0))

0: Inherited bridge from B
1: Inherited from B

---

C(B(A)) -> C(Bc1(Ac0))

0: Inherited bridge from B
1: Inherited from B

---

C(Ac1(I)) -> C(Ac1(Id0))

0*: Inherited default from I
1: Inherited from A

---

C(A(Id0)) -> C(Ac1(Id0))

0: Inherited bridge from A
1: Inherited from A

---

C(A(I)) -> C(Ac1(Id0))

0: Inherited bridge from A
1: Inherited from A

---

C(Id1(J)) -> C(Id1(Jd0))

0*: Inherited default from J
1: Inherited default from I

---

C(I(Jd0)) -> C(Id1(Jd0))

0: Inherited bridge from I
1: Inherited default from I

---

C(I(J)) -> C(Id1(Jd0))

0: Inherited bridge from I
1: Inherited default from I

=====
Linear inheritance (two ancestors, method in C)

---

Cc2(B(Am0)) -> Cc2(Bc1(Am0))

0: Bridge in C
1*: Inherited from B
2: Declared in C

---

Cc2(Bm1(A)) -> Cc2(Bm1(Ac0))

0*: Inherited from A
1: Bridge in C
2: Declared in C

---

Cc2(B(A)) -> Cc2(Bc1(Ac0))

0*: Inherited bridge from B
1*: Inherited from B
2: Declared in C

---

Cc2(A(Im0)) -> Cc2(Ac1(Im0))

0: Bridge in C
1*: Inherited from A
2: Declared in C

---

Cc2(Am1(I)) -> Cc2(Am1(Id0))

0*: Inherited default from I
1: Bridge in C
2: Declared in C

---

Cc2(A(I)) -> Cc2(Ac1(Id0))

0*: Inherited bridge from A
1*: Inherited from A
2: Declared in C

---

Cc2(J(Im0)) -> Cc2(Jd1(Im0))

0: Bridge in C
1*: Inherited default from J
2: Declared in C

---

Cc2(Jm1(I)) -> Cc2(Jm1(Id0))

0*: Inherited default from I
1: Bridge in C
2: Declared in C

---

Cc2(J(I)) -> Cc2(Jd1(Id0))

0*: Inherited bridge from J
1*: Inherited default from J
2: Declared in C

=====
Independent branches (no method in C)

---

C(Ac1, I) -> C(Ac1, Id0)

0*: Inherited default from I
1: Inherited from A

---

C(A, Id0) -> C(Ac1, Id0)

0*: Inherited default from I
1: Inherited from A

---

C(A, I) -> C(Ac1, Id0)

0*: Inherited default from I
1: Inherited from A

=====
Independent branches (method in C)

---

Cc2(Am0, I) -> Cc2(Am0, Id1)

0: Bridge in C
1*: Inherited default from I
2: Declared in C

---

Cc2(A, Im1) -> Cc2(Ac0, Im1)

0*: Inherited from A
1: Bridge in C
2: Declared in C

---

Cc2(A, I) -> Cc2(Ac0, Id1)

0*: Inherited from A
1*: Inherited default from I
2: Declared in C

---

Cc2(Im0, J) -> Cc2(Im0, Jd1)

0: Bridge in C
1*: Inherited default from J
2: Declared in C

---

Cc2(I, J) -> Cc2(Id0, Jd1)

0*: Inherited default from I
1*: Inherited default from J
2: Declared in C

=====
Diamond branches (no method in C)

---

C(A(Id0), J(Id0)) -> C(Ac1(Id0), J(Id0))

0: Inherited bridge from A
1: Inherited from A

---

C(A(Id0), J(Id0)) -> C(A(Id0), Jd1(Id0))

0: Inherited bridge from J
1: Inherited default from J

---

C(A(Id0), J(Id0)) -> C(Ac2(Id0), Jd1(Id0))

0: Inherited bridge from A (beats new bridge in J)
1*: Inherited default from J
2: Inherited from A

---

C(J(Id0), K(Id0)) -> C(Jd1(Id0), K(Id0))

0: Inherited bridge from J
1: Inherited default from J

---

C(Ac2(Im0), J(Im0)) -> C(Ac2(Im0), Jd1(Im0))

0: Inherited bridge from A (beats new bridge in J)
1*: Inherited default from J
2: Inherited from A

---

C(A(Im0), Jd1(Im0)) -> C(Ac2(Im0), Jd1(Im0))

0: Inherited bridge from A (beats old bridge in J)
1*: Inherited default from J
2: Inherited from A

=====
Diamond branches (method in C)

---

Cc2(A(Im0), J(Im0)) -> Cc2(Ac1(Im0), J(Im0))

0: Bridge in C
1*: Inherited from A
2: Declared in C

---

Cc2(A(Im0), J(Im0)) -> Cc2(A(Im0), Jd1(Im0))

0: Bridge in C
1*: Inherited default from J
2: Declared in C

---

Cc3(A(Im0), J(Im0)) -> Cc3(Ac1(Im0), Jd2(Im0))

0: Bridge in C
1*: Inherited from A
2*: Inherited default from J
3: Declared in C

---

Cc2(J(Im0), K(Im0)) -> Cc2(Jd1(Im0), K(Im0))

0: Bridge in C
1*: Inherited default from J
2: Declared in C

---

Cc3(J(Im0), K(Im0)) -> Cc3(Jd1(Im0), Kd2(Im0))

0: Bridge in C
1*: Inherited default from J
2*: Inherited default from K
3: Declared in C

---

Cc3(Am1(Im0), J(Im0)) -> Cc3(Am1(Im0), Jd2(Im0))

0: Bridge in C
1: Bridge in C
2*: Inherited default from J
3: Declared in C

---

Cc3(A(Im0), Jm2(Im0)) -> Cc3(Ac1(Im0), Jm2(Im0))

0: Bridge in C
1*: Inherited from A
2: Bridge in C
3: Declared in C

---

Cc3(Jm1(Im0), K(Im0)) -> Cc3(Jm1(Im0), Kd2(Im0))

0: Bridge in C
1: Bridge in C
2*: Inherited default from K
3: Declared in C

