Nestmates

Wed Jan 20 19:56:22 UTC 2016

This topic is at the complete opposite end of the spectrum from the 
topics we've been discussing so far.  It's mostly an implementation 
story, of particular interest to the compiler and VM implementers here.

Background
----------

Since Java 1.1, the rules for accessibility when inner classes are 
involved at the language level are not fully aligned with those at the 
VM level.  In particular, private and protected access from and to inner 
classes is stricter in the VM than in the language, meaning that in 
these cases, the static compiler emits an access bridge (access$000) 
which effectively downgrades the accessed member's accessibility to 
package.
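
As a rough illustration of what such a bridge amounts to, the sketch 
below hand-writes the accessor that the static compiler would otherwise 
synthesize.  The names Outer, Inner, Neighbor, secret, and bridgeSecret 
are hypothetical; only the access$000 naming comes from the text above, 
and the real accessor is synthetic rather than written in source.

```java
// Hand-written equivalent of a compiler-generated access bridge.
// Outer, Inner, Neighbor, secret and bridgeSecret are illustrative names only.
class Outer {
    private int secret = 42;

    // Stand-in for the synthetic access$000: static, package-private,
    // and taking the receiver as its first argument.
    static int bridgeSecret(Outer receiver) {
        return receiver.secret;
    }

    class Inner {
        int read() {
            // What the source writes as Outer.this.secret is, in effect,
            // compiled as a call through the bridge.
            return bridgeSecret(Outer.this);
        }
    }
}

// Any other class in the same package can call the bridge as well,
// which is why the private member becomes effectively package-accessible.
class Neighbor {
    static int peek(Outer o) {
        return Outer.bridgeSecret(o);
    }
}
```

Neighbor is not part of Outer at the source level, yet nothing at the 
VM level stops it from calling the bridge.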

Access bridges have some disadvantages.  They're ugly, but that's not a 
really big deal.  They're imprecise; they allow wider-than-necessary 
access to the member.  Again, this is not a huge deal on its own.  But 
the real problem is the complexity of the compiler implementation when 
we add generic specialization to the story.

Specialization adds a new category of cross-class accesses that are 
allowed at the language level but not at the VM level, which would 
dramatically increase the need for, and complexity of, accessibility 
bridges.  For example:

class Foo<any T> {
     private T t;

     void m(Foo<int> foo) {
         int i = foo.t;
     }
}

Now we execute:

     Foo<long> fl = ...
     Foo<int> fi = ...
     fl.m(fi)

The spirit of the language rules clearly allows the access from Foo<long> 
to Foo<int>.t -- they are in the "same class".  But at the VM level, 
Foo<int> and Foo<long> are different classes, so the access from 
Foo<long> to a private member of Foo<int> is disallowed.

One reason that this increases the complexity, and not just the number, 
of accessibility bridges is that bridges are (currently) static methods; 
if they represent instance methods, we pass the receiver as the first 
argument.  For access between inner classes, this is fine, but when it 
comes to access between specializations, this breeds new complexity -- 
because the method signature of the accessor needs to be specialized 
based on the type parameters of the receiver.  This interaction means 
the current static-accessor solution would need its own special, ad-hoc 
treatment in specialization, adding to the complexity of specialization.
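
To make the signature problem concrete, here is a minimal sketch in 
which the two specializations are written out by hand as ordinary 
classes.  FooOfInt, FooOfLong, and bridgeT are hypothetical names 
standing in for Foo<int>, Foo<long>, and the static accessor; in the 
real scheme both classes would derive from one template class, and m 
returns a value here only so the result can be observed.

```java
// Hand-written stand-ins for the specializations Foo<int> and Foo<long>.
// FooOfInt, FooOfLong and bridgeT are hypothetical names.
class FooOfInt {
    private int t;

    FooOfInt(int t) { this.t = t; }

    // Bridge whose receiver type is fixed to this specialization.
    static int bridgeT(FooOfInt receiver) {
        return receiver.t;
    }
}

class FooOfLong {
    private long t;

    FooOfLong(long t) { this.t = t; }

    // Same accessor in the other specialization: both the receiver type
    // and the return type in the signature have changed.
    static long bridgeT(FooOfLong receiver) {
        return receiver.t;
    }

    // Corresponds to m(Foo<int> foo) running in the Foo<long> specialization.
    int m(FooOfInt foo) {
        // foo.t is private to a different runtime class, so the access
        // has to go through that class's bridge.
        return FooOfInt.bridgeT(foo);
    }
}
```

The single accessor in the template turns into a family of accessors 
whose signatures vary with the receiver's type parameters.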

More generally, this situation arises in any case where a single logical 
unit of encapsulation at the source level is split into multiple runtime 
classes (inner classes, specialization classes, synthetic helper 
classes.)  We propose to address this problem more generally, by 
providing a mechanism where language compilers can indicate that 
multiple runtime classes live in the same unit of encapsulation.  We do 
so by (a) adding metadata to classes to indicate which classes belong in 
the same encapsulation unit and (b) relaxing some VM accessibility rules 
to bring them more in alignment with the language level rules.

Overview
--------

Our proposed strategy is to reify the relationship between classes that 
are members of the same _nest_.  Nestmate-ness can then be considered in 
access control decisions (JVMS 5.4.4).

Classes that derive from a common source class form a _nest_, and two 
classes in the same nest are called _nestmates_.  Nestmate-ness is an 
equivalence relation (reflexive, symmetric, and transitive.)  Nestmates 
of a class C include C's inner classes, synthetic classes generated as 
part of translating C, and specializations thereof.

Since nestmate-ness is an equivalence relation, it forms a partition 
over classes, and we can nominate a canonical member for each partition.  
We nominate the "top" (outermost lexically enclosing) class in the 
nest as the canonical member; this is the top-level source class from 
which all other nestmates derive.

This makes it easy to calculate nestmate-ness for two classes C and D; C 
and D are nestmates if their "top" class is the same.
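
The "same top" test can be approximated with core reflection.  This is 
only a sketch: NestSketch, topOf, and areNestmates are hypothetical 
names, and lexical nesting of ordinary classes stands in for the 
proposed nest, so synthetic classes and specializations are not covered.

```java
// Sketch of the "same top class" test using core reflection.
// NestSketch, topOf and areNestmates are hypothetical names.
class NestSketch {
    static class A {
        static class B { }
    }

    // Walk outward until there is no enclosing class left.
    static Class<?> topOf(Class<?> c) {
        Class<?> top = c;
        while (top.getEnclosingClass() != null) {
            top = top.getEnclosingClass();
        }
        return top;
    }

    // C and D are nestmates if their top classes are the same.
    static boolean areNestmates(Class<?> c, Class<?> d) {
        return topOf(c) == topOf(d);
    }
}

// A separate top-level class, and therefore a separate nest.
class Unrelated { }
```

Because the comparison goes through a single canonical member, the test 
is reflexive, symmetric, and transitive by construction.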

Example
-------

class Top<any T> {
     class A<any U> {
         class B<V> { }
     }

     <any T> void genericMethod() { }
}

When we compile this, we get:
    Top.class                   // Top
    Top$A.class                 // Inner class Top.A
    Top$A$B.class               // Inner class Top.A.B
    Top$Any.class               // Wildcard interface for Top
    Top$A$Any.class             // Wildcard interface for Top.A
    Top$genericMethod.class     // Holder class for generic method

The explicit classes Top, Top.A, and Top.A.B, the synthetic $Any 
classes, and the synthetic holder class for genericMethod, along with 
all of their specializations, form a nest.  The top member of this nest 
is Top.

Since nestmates all derive from a common top-level class, they are by 
definition in the same package and module.  A class can be in only one 
nest at once.

Runtime Representation
----------------------

We represent nestmate-ness with two new attributes -- one in the top 
member, which describes all the members of the nest, and one in each 
member, which requests access to the nest.

     NestTop {
         u2 name_index;
         u4 length;
         u2 child_count;
         u2 childClazz[child_count];
     }

     NestChild {
         u2 name_index;
         u4 length;
         u2 topClazz;
     }

If a class has a NestTop attribute, its nest top is itself. If a class 
has a NestChild attribute, its nest top is the class named via topClazz. 
If a class is a specialization of another class, its nest top is the 
nest top of the class for which it is a specialization.

When loading a class with a NestChild attribute, the VM can verify that 
the requested nest permits it as a member, and reject the class if the 
child and top do not agree.

The NestTop attribute can enumerate all inner classes and synthetic 
classes, but cannot enumerate all specializations thereof. When creating 
a specialization of a class, the VM records the specialization as being 
a member of whatever nest the template class was a member of.
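
The two-sided agreement between NestTop and NestChild can be modeled as 
below.  This is a toy sketch, not VM code: ClassInfo, NestRegistry, and 
their members are hypothetical names, class names are plain strings, 
treating a class with neither attribute as a nest of one is an 
assumption, and specializations are not modeled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the NestTop / NestChild agreement check.
// ClassInfo and NestRegistry are hypothetical names.
class ClassInfo {
    final String name;
    final Set<String> nestChildren; // contents of NestTop, or null if absent
    final String nestTop;           // contents of NestChild, or null if absent

    ClassInfo(String name, Set<String> nestChildren, String nestTop) {
        this.name = name;
        this.nestChildren = nestChildren;
        this.nestTop = nestTop;
    }

    static ClassInfo top(String name, String... children) {
        return new ClassInfo(name, new HashSet<>(Arrays.asList(children)), null);
    }

    static ClassInfo child(String name, String top) {
        return new ClassInfo(name, null, top);
    }
}

class NestRegistry {
    private final Map<String, ClassInfo> loaded = new HashMap<>();

    void define(ClassInfo c) {
        loaded.put(c.name, c);
    }

    // Returns the name of c's nest top, or throws if child and top disagree.
    String resolveNestTop(ClassInfo c) {
        if (c.nestChildren != null) {
            return c.name;              // has NestTop: its nest top is itself
        }
        if (c.nestTop == null) {
            return c.name;              // assumption: no attribute means a nest of one
        }
        ClassInfo top = loaded.get(c.nestTop);
        if (top == null || top.nestChildren == null
                || !top.nestChildren.contains(c.name)) {
            throw new IllegalStateException(
                c.name + " claims nest top " + c.nestTop + ", which does not list it");
        }
        return top.name;
    }
}
```

A class that merely names Top in its NestChild attribute gains nothing 
unless Top also lists it, which is what keeps the nest closed.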

Semantics
---------

The accessibility rules here are strictly additions; nestmate-ness 
creates additional accessibility over and above the existing rules.

Informally:
   - A class can access the private members of its nestmates;
   - A class can access protected members inherited by its nestmates.

This is slightly broader than the language semantics (but still less 
broad than what we do today with access bridges.)  The static compiler 
can continue to enforce the same rules, and the VM will allow these 
accesses without bridges.  (We could make the proposal match the 
language semantics more closely at the cost of additional complexity, 
but it's not clear this is worthwhile.)

For private access, we can add the following to 5.4.4:
   - A class C may access a private member D.R if C and D are nestmates.

The rules for protected members are more complicated.  5.4.3.{2,3} first 
resolve the true owner of the member, and feed that to 5.4.4; this 
process throws away some needed information.  We would augment 
5.4.3.{2,3} as follows:
  - When performing member resolution from class C on member D.R, we 
remember both D (the target class) and E (the resolved class) and make 
them both available to 5.4.4.

We then adjust 5.4.4 accordingly, by adding:
  - If R is protected, and C and D are nestmates, and E is accessible to 
D, then access is allowed.
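
The additions can be sketched as a small model of the access check.  
Everything here is hypothetical (Klass, Access, AccessSketch, and their 
members), the existing 5.4.4 rules are deliberately simplified, and "E 
is accessible to D" is read as "the member found in E would be 
accessible to D under the existing rules", which is an interpretation 
rather than spec text.

```java
// Toy model of the access check with the proposed nestmate rules added.
// All names are hypothetical; the pre-existing rules are simplified.
enum Access { PUBLIC, PROTECTED, PACKAGE, PRIVATE }

class Klass {
    final String name;
    final String pkg;
    final Klass superclass; // null for a root class
    final Klass nestTop;    // null means "this class is its own nest top"

    Klass(String name, String pkg, Klass superclass, Klass nestTop) {
        this.name = name;
        this.pkg = pkg;
        this.superclass = superclass;
        this.nestTop = nestTop;
    }

    Klass top() {
        return nestTop == null ? this : nestTop;
    }

    boolean isSubclassOrSelf(Klass other) {
        for (Klass k = this; k != null; k = k.superclass) {
            if (k == other) return true;
        }
        return false;
    }
}

class AccessSketch {
    static boolean nestmates(Klass c, Klass d) {
        return c.top() == d.top();
    }

    // Simplified stand-in for the existing rules: may `from` access a
    // member with access `a` declared in `declaring`?
    static boolean existingRules(Klass from, Klass declaring, Access a) {
        switch (a) {
            case PUBLIC:
                return true;
            case PROTECTED:
                return from.pkg.equals(declaring.pkg) || from.isSubclassOrSelf(declaring);
            case PACKAGE:
                return from.pkg.equals(declaring.pkg);
            default: // PRIVATE
                return from == declaring;
        }
    }

    // c = accessing class, d = target class named in the reference,
    // e = class in which resolution found the member.
    static boolean canAccess(Klass c, Klass d, Klass e, Access a) {
        if (existingRules(c, e, a)) return true;
        if (a == Access.PRIVATE) {
            return d == e && nestmates(c, d);                // added private rule
        }
        if (a == Access.PROTECTED) {
            return nestmates(c, d) && existingRules(d, e, a); // added protected rule
        }
        return false;
    }
}
```

Both new branches are reached only after the existing rules have said 
no, which reflects the point that the nestmate rules are strictly 
additions.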

Examples
--------

For private members, we currently generate access bridges whenever an 
inner class accesses a private member (field or method) of the enclosing 
class, or of another inner class in the same nest.

In the classes below, the accesses shown are all permitted by the 
language spec (child to parent, sibling to sibling, sibling to child of 
sibling, etc), and the ones requiring access bridges are noted.

     class Foo {
         public static Foo aFoo;
         public static Inner1 aInner1;
         public static Inner1.Inner2 aInner2;
         public static Inner3 aInner3;

         private int foo;

         class Inner1 {
             private int inner1;

             class Inner2 {
                 private int inner2;
             }

             void m() {
                 int i = aFoo.foo           // bridge
                       + aInner1.inner1
                       + aInner2.inner2     // bridge
                       + aInner3.inner3;    // bridge
             }
         }

         class Inner3 {
             private int inner3;

             void m() {
                 int i = aFoo.foo           // bridge
                       + aInner1.inner1     // bridge
                       + aInner2.inner2     // bridge
                       + aInner3.inner3;
             }
         }
     }

For protected members, the situation is more subtle.

     /* package p1 */
     public class Sup {
         protected int pro;
     }

     /* package p2 */
     public class Sub extends p1.Sup {
         void test() {
             ... pro ... // no bridge (getfield)
         }

         class Inner {
             void test() {
                 ... sub.pro ... // bridge generated in Sub
             }
         }
     }

Here, the VM rules allow Sub to access protected members of Sup, but for 
accesses from Sub.Inner (or from any other class nested in Sub) to 
Sub.pro to succeed, Sub provides an access bridge (which effectively 
makes Sub.pro package-visible throughout package p2).

The rules outlined above eliminate the need for access bridges in all of 
these cases.

Interaction with defineAnonymousClass
-------------------------------------

Nestmate-ness also potentially connects nicely with 
Unsafe.defineAnonymousClass.  The intuitive notion of dAC is, when you 
load anonymous class C with a host class of H, that C is being "injected 
into" H -- access control decisions for C are made using H's 
credentials.  With a formal notion of nestmate-ness, we can bring 
additional predictability to dAC by saying that C is injected into H's 
nest.