Nestmates

Thu Jan 28 16:06:53 UTC 2016

Hello all,

First off, thanks to Brian for inviting me to join this list and for his
earlier introduction.  I am not an expert on Java by any means but I am an
expert on C#; Brian has asked me to comment occasionally on how the issues
you face were resolved (or not resolved!) in C#. I'll pop up now and then.
It's already an interesting discussion!

So, here are a few comments on how C# deals with these issues; I'll post a
second email a little later.

> In particular, private and protected access from and to inner classes is
> stricter in the VM than in the language, meaning that in these cases, the
> static compiler emits an access bridge

C# faces a similar, though subtly different problem: the rules in the CLR
for what makes *verifiable* code are in some places stricter than the C#
language would like them to be. The compiler strives to produce verifiable
code whenever possible. The C# compiler solves these problems in the same
way: emitting a helper method to bridge the gap.

C# does not suffer from the *particular* problem you mention; calling a
private method on Foo<int> from inside an activation of Foo<long> is no
problem for either the C# language or the CLR. But let me give you an
example of a mismatch that does occur:

public delegate void Delta();
public class Alpha
{
    public virtual void Echo() { }
}
public class Bravo : Alpha
{
    public override void Echo() { }
    public void Charlie()
    {
        int x = 123;
        Delta foxtrot = () =>
        {
            this.Echo();
            base.Echo();
            System.Console.WriteLine(x);
        };
        foxtrot();
    }
}

Note that the lambda captures both local x and "this". Now, what code
should the compiler emit?  It turns out that the naive way to emit the code
would involve using a feature that the C# language does not have:

    public void Charlie()
    {
        Closure closure = new Closure();
        closure.@this = this;
        closure.x = 123;
        Delta foxtrot = closure.M;
        foxtrot();
    }
    private class Closure
    {
        public Bravo @this;
        public int x;
        public void M()
        {
            @this.Echo(); // Perfectly legal
            // invoke Alpha.Echo() with receiver @this  -- but how?
            System.Console.WriteLine(x);
        }
    }

Now it turns out that the CLR does allow you to generate code that does
non-virtual dispatch to a particular override of a virtual method, but the
*verifier* requires that such code appear in *a class equal to or derived
from the class that declares the method being invoked*.

Our class Closure here is not considered by the verifier to have the right
to call Alpha.Echo non-virtually because Closure is not Bravo, it is merely
*nested* in Bravo. From the point of view of the compiler writer, this is a
crazy restriction; if Bravo has the right to do a call then why shouldn't a
class entirely owned by Bravo? But that's the rule in the verifier.

Early versions of C# simply generated non-verifiable code and produced a
warning that this pattern led to non-verifiable code. Later versions
generated a private bridge method in Bravo that called Alpha.Echo and then
had the generated closure invoke that helper method.
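
For readers here who think in Java, the shape of that later fix can be
sketched by hand. This is only an illustration of the bridge pattern, not
what the C# compiler literally emits: the names BridgeSketch, Closure, self,
baseEcho and the log list are all made up for the example, and Runnable
stands in for the Delta delegate.

```java
import java.util.ArrayList;
import java.util.List;

class BridgeSketch {
    // Records calls so the dispatch order can be inspected.
    static final List<String> log = new ArrayList<>();

    static class Alpha {
        void echo() { log.add("Alpha.echo"); }
    }

    static class Bravo extends Alpha {
        @Override
        void echo() { log.add("Bravo.echo"); }

        // Hypothetical bridge: it lives in Bravo itself, so the non-virtual
        // call to Alpha.echo is made from a class derived from Alpha.
        private void baseEcho() { super.echo(); }

        void charlie() {
            Closure closure = new Closure();
            closure.self = this;
            closure.x = 123;
            Runnable foxtrot = closure::m;
            foxtrot.run();
        }

        // Hand-written stand-in for the compiler-generated closure class.
        private static class Closure {
            Bravo self;
            int x;

            void m() {
                self.echo();      // ordinary virtual call
                self.baseEcho();  // reaches Alpha.echo through the bridge
                log.add(String.valueOf(x));
            }
        }
    }

    public static void main(String[] args) {
        new Bravo().charlie();
        System.out.println(log);
    }
}
```

The closure never calls Alpha.echo directly; it only asks Bravo to do so on
its behalf, which is the whole trick.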

Cheers,
Eric

On Wed, Jan 20, 2016 at 11:56 AM, Brian Goetz <brian.goetz at oracle.com>
wrote:

> This topic is at the complete opposite end of the spectrum from topics
> we've been discussing so far.  It's mostly an implementation story, and of
> particular interest to the compiler and VM implementers here.
>
>
> Background
> ----------
>
> Since Java 1.1, the rules for accessibility when inner classes are
> involved at the language level are not fully aligned with those at the VM
> level.  In particular, private and protected access from and to inner
> classes is stricter in the VM than in the language, meaning that in these
> cases, the static compiler emits an access bridge (access$000) which
> effectively downgrades the accessed member's accessibility to package.
>
> Access bridges have some disadvantages.  They're ugly, but that's not a
> really big deal.  They're imprecise; they allow wider-than-necessary access
> to the member.  Again, this is not a huge deal on its own.  But the real
> problem is the complexity of the compiler implementation when we add
> generic specialization to the story.
>
> Specialization adds a new category of cross-class accesses that are
> allowed at the language level but not at the VM level, which would
> dramatically increase the need for, and complexity of, accessibility
> bridges.  For example:
>
> class Foo<any T> {
>     private T t;
>
>     void m(Foo<int> foo) {
>         int i = foo.t;
>     }
> }
>
> Now we execute:
>
>     Foo<long> fl = ...
>     Foo<int> fi = ...
>     fl.m(fi)
>
> The spirit of the language rules clearly allows the access from Foo<long>
> to Foo<int>.t -- they are in the "same class".  But at the VM level,
> Foo<int> and Foo<long> are different classes, so the access from Foo<long>
> to a private member of Foo<int> is disallowed.
>
> One reason that this increases the complexity, and not just the number, of
> accessibility bridges is that bridges are (currently) static methods; if
> they represent instance methods, we pass the receiver as the first
> argument.  For access between inner classes, this is fine, but when it
> comes to access between specializations, this breeds new complexity --
> because the method signature of the accessor needs to be specialized based
> on the type parameters of the receiver.  This interaction means the current
> static-accessor solution would need its own special, ad-hoc treatment in
> specialization, adding to the complexity of specialization.
>
> More generally, this situation arises in any case where a single logical
> unit of encapsulation at the source level is split into multiple runtime
> classes (inner classes, specialization classes, synthetic helper classes.)
> We propose to address this problem more generally, by providing a mechanism
> where language compilers can indicate that multiple runtime classes live in
> the same unit of encapsulation.  We do so by (a) adding metadata to classes
> to indicate which classes belong in the same encapsulation unit and (b)
> relaxing some VM accessibility rules to bring them more in alignment with
> the language level rules.
>
>
> Overview
> --------
>
> Our proposed strategy is to reify the relationship between classes that
> are members of the same _nest_.  Nestmate-ness can then be considered in
> access control decisions (JVMS 5.4.4).
>
> Classes that derive from a common source class form a _nest_, and two
> classes in the same nest are called _nestmates_.  Nestmate-ness is an
> equivalence relation (reflexive, symmetric, and transitive.)  Nestmates of
> a class C include C's inner classes, synthetic classes generated as part of
> translating C, and specializations thereof.
>
> Since nestmate-ness is an equivalence relation, it forms a partition over
> classes, and we can nominate a canonical member for each partition.  We
> nominate the "top" (outermost lexically enclosing) class in the nest as the
> canonical member; this is the top-level source class from which all other
> nestmates derive.
>
> This makes it easy to calculate nestmate-ness for two classes C and D; C
> and D are nestmates if their "top" class is the same.
>
> Example
> -------
>
> class Top<any T> {
>     class A<any U> {
>         class B<V> { }
>     }
>
>     <any T> void genericMethod() { }
> }
>
> When we compile this, we get:
>    Top.class                   // Top
>    Top$A.class                 // Inner class Top.A
>    Top$A$B.class               // Inner class Top.A.B
>    Top$Any.class               // Wildcard interface for Top
>    Top$A$Any.class             // Wildcard interface for Top.A
>    Top$genericMethod.class     // Holder class for generic method
>
> The explicit classes Top, Top.A, and Top.A.B, the synthetic $Any classes,
> and the synthetic holder class for genericMethod, along with all of their
> specializations, form a nest.  The top member of this nest is Top.
>
> Since nestmates all derive from a common top-level class, they are by
> definition in the same package and module.  A class can be in only one nest
> at once.
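
To make the "same top class" test concrete, here is a toy Java model of it.
It is a sketch under assumptions, not VM code: the parent table, the class
NestTopSketch, and the extra class name "Other" are invented for
illustration, and class names are plain strings taken from the Top example
above.

```java
import java.util.HashMap;
import java.util.Map;

class NestTopSketch {
    // Hypothetical table: class name -> the class it was derived from.
    // Top-level source classes map to null.
    static final Map<String, String> parent = new HashMap<>();
    static {
        parent.put("Top", null);
        parent.put("Top$A", "Top");
        parent.put("Top$A$B", "Top$A");
        parent.put("Top$Any", "Top");
        parent.put("Top$genericMethod", "Top");
        parent.put("Other", null);
    }

    // Walk outward until a top-level class is reached.
    static String nestTop(String name) {
        String current = name;
        while (parent.get(current) != null) {
            current = parent.get(current);
        }
        return current;
    }

    // Two classes are nestmates exactly when their tops coincide.
    static boolean areNestmates(String c, String d) {
        return nestTop(c).equals(nestTop(d));
    }

    public static void main(String[] args) {
        System.out.println(areNestmates("Top$A$B", "Top$genericMethod"));
        System.out.println(areNestmates("Top$A", "Other"));
    }
}
```

Because every class maps to exactly one top, the relation is automatically
reflexive, symmetric, and transitive.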
>
>
> Runtime Representation
> ----------------------
>
> We represent nestmate-ness with two new attributes -- one in the top
> member, which describes all the members of the nest, and one in each
> member, which requests access to the nest.
>
>     NestTop {
>         u2 name_index;
>         u4 length;
>         u2 child_count;
>         u2 childClazz[child_count];
>     }
>
>     NestChild {
>         u2 name_index;
>         u4 length;
>         u2 topClazz;
>     }
>
> If a class has a NestTop attribute, its nest top is itself. If a class has
> a NestChild attribute, its nest top is the class named via topClazz. If a
> class is a specialization of another class, its nest top is the nest top of
> the class for which it is a specialization.
>
> When loading a class with a NestChild attribute, the VM can verify that
> the requested nest permits it as a member, and reject the class if the
> child and top do not agree.
>
> The NestTop attribute can enumerate all inner classes and synthetic
> classes, but cannot enumerate all specializations thereof. When creating a
> specialization of a class, the VM records the specialization as being a
> member of whatever nest the template class was a member of.
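
The two-sided agreement check described here can be modelled in a few lines
of Java. This is a rough sketch under stated assumptions: ClassInfo, the
loaded map, and the use of IllegalStateException are placeholders of mine
(the proposal does not say which error the VM would raise), and treating a
class with neither attribute as its own nest top is also an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestAttributeSketch {
    // Hypothetical stand-in for the nest-related parts of a class file.
    static final class ClassInfo {
        final String name;
        final List<String> childClazz; // non-null iff NestTop is present
        final String topClazz;         // non-null iff NestChild is present

        ClassInfo(String name, List<String> childClazz, String topClazz) {
            this.name = name;
            this.childClazz = childClazz;
            this.topClazz = topClazz;
        }
    }

    // Hypothetical registry of already-loaded classes, keyed by name.
    static final Map<String, ClassInfo> loaded = new HashMap<>();

    static String nestTopOf(ClassInfo c) {
        if (c.childClazz != null) {
            return c.name; // a class with NestTop is its own nest top
        }
        if (c.topClazz != null) {
            ClassInfo top = loaded.get(c.topClazz);
            boolean agrees = top != null
                    && top.childClazz != null
                    && top.childClazz.contains(c.name);
            if (!agrees) {
                // Placeholder for "reject the class".
                throw new IllegalStateException(
                        c.name + " is not listed by " + c.topClazz);
            }
            return top.name;
        }
        return c.name; // assumption: no attribute means a nest of one
    }

    // A specialization inherits the nest of its template class.
    static String nestTopOfSpecialization(ClassInfo template) {
        return nestTopOf(template);
    }

    public static void main(String[] args) {
        ClassInfo top = new ClassInfo("Top", Arrays.asList("Top$A"), null);
        loaded.put("Top", top);
        ClassInfo a = new ClassInfo("Top$A", null, "Top");
        System.out.println(nestTopOf(a));
        System.out.println(nestTopOfSpecialization(a));
    }
}
```

A class that names Top in its NestChild attribute without being listed in
Top's childClazz table is rejected, so membership cannot be claimed
unilaterally.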
>
>
> Semantics
> ---------
>
> The accessibility rules here are strictly additions; nestmate-ness creates
> additional accessibility over and above the existing rules.
>
> Informally:
>   - A class can access the private members of its nestmates;
>   - A class can access protected members inherited by its nestmates.
>
> This is slightly broader than the language semantics (but still less broad
> than what we do today with access bridges.)  The static compiler can
> continue to enforce the same rules, and the VM will allow these accesses
> without bridges.  (We could make the proposal match the language semantics
> more closely at the cost of additional complexity, but it's not clear this
> is worthwhile.)
>
> For private access, we can add the following to 5.4.4:
>   - A class C may access a private member D.R if C and D are nestmates.
>
> The rules for protected members are more complicated.  5.4.3.{2,3} first
> resolve the true owner of the member, and feed that to 5.4.4; this process
> throws away some needed information.  We would augment 5.4.3.{2,3} as
> follows:
>  - When performing member resolution from class C on member D.R, we
> remember both D (the target class) and E (the resolved class) and make them
> both available to 5.4.4.
>
> We then adjust 5.4.4 accordingly, by adding:
>  - If R is protected, and C and D are nestmates, and E is accessible to D,
> then access is allowed.
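
Read as a predicate, the two added rules come out roughly as below. This is
a deliberately simplified Java sketch, not a restatement of JVMS 5.4.4: the
class NestAccessSketch and its parameters are hypothetical, nestmate-ness is
reduced to comparing nest-top names, and the question "is E accessible to D"
is passed in as a boolean rather than computed.

```java
class NestAccessSketch {
    enum Visibility { PRIVATE, PROTECTED }

    // Returns true when the *added* nestmate rules grant C access to D.R.
    // A false result means only that these rules do not apply; the
    // existing access rules would still be consulted separately.
    static boolean nestmateRuleGrants(String nestTopOfC,
                                      String nestTopOfD,
                                      Visibility visibilityOfR,
                                      boolean resolvedClassAccessibleToD) {
        boolean nestmates = nestTopOfC.equals(nestTopOfD);
        if (!nestmates) {
            return false;
        }
        switch (visibilityOfR) {
            case PRIVATE:
                return true;                        // private rule
            case PROTECTED:
                return resolvedClassAccessibleToD;  // protected rule
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // An inner class of Foo reading a private field of Foo.
        System.out.println(nestmateRuleGrants(
                "Foo", "Foo", Visibility.PRIVATE, false));
        // A class from an unrelated nest attempting the same access.
        System.out.println(nestmateRuleGrants(
                "Bar", "Foo", Visibility.PRIVATE, false));
    }
}
```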
>
>
> Examples
> --------
>
> For private members, we currently generate access bridges whenever an inner class
> accesses a private member (field or method) of the enclosing class, or of
> another inner class in the same nest.
>
> In the classes below, the accesses shown are all permitted by the language
> spec (child to parent, sibling to sibling, sibling to child of sibling,
> etc), and the ones requiring access bridges are noted.
>
>     class Foo {
>         public static Foo aFoo;
>         public static Inner1 aInner1;
>         public static Inner1.Inner2 aInner2;
>         public static Inner3 aInner3;
>
>         private int foo;
>
>         class Inner1 {
>             private int inner1;
>
>             class Inner2 {
>                 private int inner2;
>             }
>
>             void m() {
>                 int i = aFoo.foo           // bridge
>                       + aInner1.inner1
>                       + aInner2.inner2     // bridge
>                       + aInner3.inner3;    // bridge
>             }
>         }
>
>         class Inner3 {
>             private int inner3;
>
>             void m() {
>                 int i = aFoo.foo           // bridge
>                       + aInner1.inner1     // bridge
>                       + aInner2.inner2     // bridge
>                       + aInner3.inner3;
>             }
>         }
>     }
>
> For protected members, the situation is more subtle.
>
>     /* package p1 */
>     public class Sup {
>         protected int pro;
>     }
>
>     /* package p2 */
>     public class Sub extends p1.Sup {
>         void test() {
>             ... pro ... //no bridge (invokespecial)
>         }
>
>         class Inner {
>             void test() {
>                 ... sub.pro ... // bridge generated in Sub
>             }
>         }
>     }
>
> Here, the VM rules allow Sub to access protected members of Sup, but for
> accesses from Sub.Inner or Sibling to Sub.pro to succeed, Sub provides an
> access bridge (which effectively makes Sub.pro package-visible throughout
> package p2.)
>
> The rules outlined eliminate access bridges in all of these cases.
>
>
> Interaction with defineAnonymousClass
> -------------------------------------
>
> Nestmate-ness also potentially connects nicely with
> Unsafe.defineAnonymousClass.  The intuitive notion of dAC is, when you load
> anonymous class C with a host class of H, that C is being "injected into" H
> -- access control decisions for C are made using H's credentials.  With a
> formal notion of nestmateness, we can bring additional predictability to
> dAC by saying that C is injected into H's nest.
>
>
>