Draft JVMS changes for Nestmates

Thu Apr 20 18:43:12 UTC 2017

> On Apr 19, 2017, at 7:31 PM, David Holmes <David.Holmes at oracle.com> wrote:
> 
> Hi Dan,
> 
> On 20/04/2017 2:06 AM, Dan Smith wrote:
>>> On Apr 19, 2017, at 12:45 AM, John Rose <john.r.rose at oracle.com
>>> <mailto:john.r.rose at oracle.com>> wrote:
>>> 
>>> - NMs must be non-empty (degenerate nest is never explicit)
>>> - NMs may not contain duplicates (cf. treatment of ClassFile.interfaces)
>>> - NMs may not contain the current class (i.e., an index to a class
>>> with the same name as this_class)
>>> - NMs may contain only package siblings (ditto)
>>> - NMs and MoN may not refer to array classes (this is probably
>>> implied by the package prefix checks)
>> 
>> These are all do-able, but I'm not sure they're consistent with the
>> spirit of JVMS in its treatment of attributes. We don't usually assert
>> that lists are non-empty, don't contain duplicates, etc. (Granted, most
>> attributes are not relevant to JVM execution.) You mention `interfaces`,
>> but I don't see any such assertion in 4.1.
> 
> The NestMembers duplicate check exists because I copied the spec and code from the InnerClasses attribute. 4.7.6 states the classes[] "must have exactly one corresponding entry ..." - hence a check for duplicates is done. I used the same language when I defined NestMembers. So the precedent does exist.

I think you mean this?

"Every CONSTANT_Class_info entry in the constant_pool table which represents a class or interface C that is not a package member must have exactly one corresponding entry in the classes array."

Again, this is compiler spec territory. Which I don't mind tolerating in an attribute definition that is for Java-specific metadata and that is ignored by the JVM. But I'd like to hold critical attributes like NestMembers to a different standard.

(That said, if the consensus is to check NestMembers for duplicates, I can live with it.)

> On Apr 19, 2017, at 9:56 PM, David Holmes <David.Holmes at oracle.com> wrote:
> 
> On 20/04/2017 7:58 AM, Dan Smith wrote:
>> **4.7.25 The `MemberOfNest` Attribute**
>> ---------------------------------------
>> 
>> **The `MemberOfNest` attribute is a fixed-length attribute in the
>> `attributes` table of a `ClassFile` structure ([4.1]).**
>> 
>> **A _nest_ is a set of classes and interfaces that share access to their
>> `private` members ([5.4.4]).**
> 
> Aside: it is this basic definition of "nest" that I expected to appear somewhere earlier in the spec. JVMS doesn't have any upfront "concepts and definitions" section, so not sure where it would go.

Well, there is: Chapter 2. But it doesn't mention accessibility at all, so hard to explain nests. 4.7.25, as the first place where the topic comes up, seems like the next best place.

>> **A class or interface with a `MemberOfNest` attribute belongs to the
>> nest hosted by a designated _host class_. The host class authorizes
>> membership in the nest with a corresponding entry in its `NestMembers`
>> attribute ([4.7.26], [4.10]).**
> 
> Only nit is I would say "claims membership of" rather than "belongs to". That emphasises that membership must be ratified by the "host class", and matches the following "authorizes membership" text.

Somewhere we need a definitive "class C belongs to the nest N" specification. Hence my strong language here.

As you said in the other thread, if I say I belong to a nest, but it turns out I'm not authorized, that's a basic well-formedness error and I get a lethal injection.

Similarly, I can claim to be my own superclass, and we might say "the superclass of C is C", but that class immediately gets killed and anybody who depends on the meaning of "superclass" can assume that it has the desired properties.

> To me the "authorized" is implicit. By calling it out explicitly it suggests to me there are also UnauthorizedNestMembers. But to me this is an oxymoron - if you are unauthorized you are not an unauthorized-member, you are not any kind of member at all.

An unauthorized nest member is one that gets a lethal injection.

> On Apr 19, 2017, at 10:12 PM, David Holmes <David.Holmes at oracle.com> wrote:
> 
>>> 4.10.1.8 Type Checking for Restricted Member References
>>> 
>>> I'm not very good at reading or understanding these rules, but I'm
>>> surprised that the invokespecial rules seem to make no mention of
>>> nestmates at all:
>>> 
>>> "A method reference is allowed by an invokespecial instruction if it
>>> is not a restricted invokespecial reference, ..."
>>> 
>>> This seems to open things up to all private method references??
>> 
>> Well, the whole point of nestmates is to extend, in a regular way,
>> all access rules that pertain to self-access and privacy.
>> If the extension is truly regular, it is not surprising that the spec.
>> changes are subtle like this.  We're not trying to do an ad hoc
>> patch on some set of particular behaviors, but rather extend
>> pre-existing notions so they apply in more conditions, including
>> invokespecial restrictions.
> 
> I was expecting to see the restrictions have a dependency on nestmates actually being involved - which is how I've currently implemented it. I can throw away some of the verifier changes if I only need to check for private access. But I'm unclear exactly where (in the spec) we enforce that only nestmates get this right to use invokespecial on private methods? Is it the access check applied during method resolution?

There are two ways to do this, as we've discovered:

1) As specified, require references to non-private methods to follow the old rules, enforced by a combination of verification of the stack type for select cases (4.10.1.8) and a resolution-time check on the referenced class (6.5.invokespecial). Private invocations are exempted from these rules, and only need to satisfy accessibility (5.4.4).

2) As implemented, allow unrestricted use of invokespecial when the referenced class is a nestmate. All other methods are subject to the old rules.

There are different implications about rights granted by membership in a nest and classes that need to be loaded at verification time. We'll have to explore these implications further...

> On Apr 20, 2017, at 2:21 AM, David Holmes <David.Holmes at oracle.com> wrote:
> 
>>>> MethodHandle resolution
>>>> 
>>>> The spec (5.4.3.5) is vague about what errors can occur during MethodHandle resolution. I assume any linkage error specified for the instruction can also occur via MethodHandle resolution, and that will include failures due to invokespecial improperly referencing a class.
>>> 
>>> This is an area I have least understanding of, but this seems to cover it:
>>> 
>>> "In each step, any exception that can be thrown as a result of failure of resolution of a class or interface or field or method reference can be thrown as a result of failure of method handle resolution."
>> 
>> The ambiguity is in the definition of "resolution". The preferred interpretation, which everyone seems to confirm reflects reality, is that it includes all linkage errors specified for the referencing instruction. I.e., errors from both both 5.4.3 and 6.5 can occur.
> 
> Yes. 5.4.3 states that "Certain of the instructions above require additional linking checks when resolving symbolic references." And that those linking exceptions are listed with the instructions themselves. So yes resolution errors consist of the union thereof.

Ah, great! I had not discovered that. Given the 5.4.3 text, we have an unambiguous definition of resolution that includes the 6.5 content.

>>>> Security risk
>>>> 
>>>> The JEP text should acknowledge that, while this does allow compilers to grant finer-grained access to members shared by nestmates, it also pushes compilers to grant broader access to members that were previously kept private. It's a trade-off, and presumably a good one because nestmates are completely trusted, while package-mates might sometimes be suspect.
>>> 
>>> I'm not following you here. What broader access is being granted to members that were previously kept private? This effort allows private-only access to members that were previously package-accessible.
>> 
>> Old approach is to widen access to private members to the package level on a per-member basis.
>> 
>> New approach is to widen access to private members to the nest level on a per-class basis.
>> 
>> Nest is narrower than package (good!), but per-class is more permissive than per-member (bad). Some members that are compiled as totally private in JDK 9 will be incidentally shared with all nest members in <future JDK version>.
> 
> I'm not sure how you are coming at this. For a private member in 9 to be incidentally shared in <future-JDK> the class of the member has to be added to a nest. With no context as to how this may happen I can't really comment further. From the Java language perspective if the class is in a nest then such access to private members already exists.

Example:

class Outer {
    private int x;
    private int y;
    class Inner { private int z = x; }
}

In JDK 9, this compiles to:

class Outer {
    private int x;
    private int y;
    Outer() {}
    static int access$000(Outer arg) { return x; }
}

class Outer$Inner {
    private int z;
    final Outer this$0;
    Outer$Inner(Outer arg) {
        this$0 = arg;
        z = Outer.access$000(this$0);
    }
}

In <future-JDK>, this compiles to:

class Outer [NestMembers:{Outer$Inner}] {
    private int x;
    private int y;
    Outer() {}
}

class Outer$Inner [MemberOfNest:Outer] {
    private int z;
    final Outer this$0;
    Outer$Inner(Outer arg) {
        this$0 = arg;
        z = this$0.x;
    }
}

Notice that in JDK 9, Outer shares access to y with nobody, while in <future-JDK>, Outer shares access to y with its NestMembers.

Can this expanded access to y be abused? Probably not, because Outer trusts its nestmates. But maybe there's a scenario where someone has ironclad control over the bytecode of Outer and thinks this means their 'y' data is safe, but in <future-JDK> an attacker is able to modify the bytecode of Outer$Inner and thus reach the 'y' data.

>> In an ideal world, JVMS would not acknowledge that the Java Programming Language exists. It should stand on its own, and Java is merely a client.
>> 
>> As is, we have some history that has led to some dependencies on Java in spec, but these are generally auxiliary items: attributes that don't impact JVM behavior (4.7), a tutorial on how to compile Java code (3), occasional explanatory references to Java language concepts (e.g., 2.9).
>> 
>> But it would be wrong for the specification of a "critical" attribute (4.7) to be defined in terms of Java language concepts. It should stand on its own, general enough to meet the needs of Java language compilers and others with different use cases.
> 
> I think you have somewhat rose-coloured glasses on if you think the JVMS is only tainted by the Java language in "auxiliary items". :) The whole set of rules around invokespecial pertain specifically to how it supports the Java language requirements for constructor invocations and super-type invocations. Otherwise invokespecial would have remained as its original invokenonvirtual - which makes more sense to me even now.

_Influenced by_ the Java language, yes, of course. But _dependent on_ the Java language, for example on the idea of nested classes, is something to avoid. If there are concepts we depend on, they should be defined in JVMS, not externally.

>>> In:
>>> 
>>> "If getfield or putfield is used to access a protected field referenced in a superclass ..."
>>> 
>>> and following, you changed "declared" to read "referenced". Again I do not understand what this is supposed to mean. It is the declaration site of the field or method that is needed to determine whether that member is in the same or another package. To me "declared in a superclass ..." was exactly correct. Your note does not make the change any clearer to me.
>> 
>> Good catch. It's important that the rule be limited to cases that reference a superclass*; but it's also important that the declaring class be used for the package check.
>> 
>> (*Here's why: if I reference my subclass, and that resolves to a field declared in my superclass, the verifier will perform no check, because it may not know that the referenced class is a subclass.)
>> 
>> Revised:
>> 
>> "If `getfield` or `putfield` is used to access a `protected` field by reference to a superclass, and that field is declared in a different run-time package than the current class or interface, then the type of the class instance being accessed must be assignable to the current class or interface."
> 
> I still think "declared in" is the correct terminology. There's a nuance to how you are expressing this that I'm just not getting.

Example:

public class p1.A {
    protected int i;
}

public class p2.B extends p1.A {
    int test(p2.C arg) {
        aload_1
        getfield p2.C.i:I
        ireturn
    }
}

public class p2.C extends p2.B {
}

When the verifier checks the getfield, it will apply the rules in 4.10.1.8. The check will trivially pass, because p2.C is not a superclass of the current class (p2.B). There will be no test that p2.C is "assignable to the current class or interface". Thus, p2.C need not be loaded during verification.

Strictly reading the old rule, "field declared in a superclass" means that the verifier would have to load p2.C and resolve (or simulate resolution of) p2.C.i:I in order to determine whether the field is declared in a superclass of the current class (p2.B).

>>> 6.5 invokespecial
>>> 
>>> You note regarding the IllegalAccessError:
>>> 
>>> "This replaces a VerifyError previously specified by 4.9.2. The check must be delayed until after resolution in order to determine whether the referenced method is private."
>>> 
>>> We know the access flags for the referenced method at verification time - shouldn't that be the sole basis for the verifier check? Afterall it is verifying the static properties of the classfile and bytecode. If the actual resolved method has different access to the referenced method then that may lead to ICCE (depending on the exact nature of the change - a private method made public may not be an issue for example).
>> 
>> Example:
>> 
>> invokespecial SomeOtherClass.foo(I)V
>> 
>> Verification relies on the descriptor to tell it that this method expects an int and returns nothing. Those kinds of checks can be performed locally.
>> 
>> The descriptor doesn't tell you if the method is private. Only way to find out is to resolve SomeOtherClass, then resolve "foo(I)V".
>> 
>> If we wanted to do the check at verification time, we'd have to load every class named by an invokespecial instruction.
> 
> Now I am confused. I thought the access modifier of the method described by the Method_Ref was available to us. If it is not then how can all these verification rules be expressed in terms of non-private when we do not have the means at verification time to determine if the access is private or not ???

The rule is:
1) if the referenced method name is not <init> (cheap), and
2) if the referenced class name is the name of a superclass or direct superinterface (cheap—superclass chain is already loaded), and
3) if the loaded referenced class actually is a superclass or direct superinterface (almost always true and cheap, unless there's a name clash), and
4) if the resolved method is not private (pretty cheap, because referenced class is already loaded)
*then* check that the stack type is assignable to the current class type.

This is, essentially, the same logic being employed by the old protected check: if the reference is to a superclass, find the field/method and decide if it's protected; if the reference is to some other class, don't worry about it.

—Dan