Classfile artifacts to support compilation/reflection

Mon May 3 21:58:42 UTC 2021

> On Apr 28, 2021, at 2:12 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> I'm updating the SoV documents and it raises a few questions about what classfile surface we need for capturing the language model.  The good news is that with the single-classfile model, the translation complexity collapses almost to zero.  But there are a few questions of "what do we retain in the classfile."  
> 
> 1.  Ref-favoring vs val-favoring.  Whether a primitive class P is ref-favoring or val-favoring no longer affects translation of the classfile (yay), it only affects translation of _type uses_ of the unadorned name.  But, this has to be captured somewhere in the classfile, so that the compiler can read in P.class and know what the name `P` means.  There are a few choices here:
> 
>  - An ACC_ bit.  Meh, these are pretty expensive.  
>  - An attribute, which only javac and reflection would need to pay attention to.
>  - A supertype (implements RefFavoring).  
> 
> My preference is an attribute; this feels closest to `Signature` to me.  Reflection might want to reflect the ref-favoring bit.

Agree. Note that this is a Java-language-only attribute, irrelevant to the JVM.
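
As a minimal sketch of what that means on the javac side (Java, with hypothetical names throughout; the attribute has no agreed name or layout yet), the favoring bit only selects which projection the bare name denotes:

```java
// Hypothetical model: none of these names come from the JEP or from javac.
final class UnadornedNameSketch {
    /** What a (hypothetical) attribute in P.class would record. */
    enum Favoring { REF, VAL }

    /** Which projection the bare name `P` denotes at a type use. */
    static String resolveUnadorned(String className, Favoring favoring) {
        return favoring == Favoring.REF ? className + ".ref" : className + ".val";
    }

    public static void main(String[] args) {
        System.out.println(resolveUnadorned("Point", Favoring.VAL));
        System.out.println(resolveUnadorned("Optional", Favoring.REF));
    }
}
```

Nothing here touches the layout or linkage of P itself, which is why the JVM can ignore the attribute.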

> 2.  Whether abstract classes are primitive superclass candidates.  The static compiler will check this at compilation time when it sees a superclass of a primitive class, but the JVM will want to recheck anyway.  There are two sensible ways to handle this in the classfile:
> 
>  - An attribute that says "I am a primitive superclass candidate."  The static compiler puts it there, and the JVM checks it.
>  - Infer and tag.  If an abstract class is loaded that is not a primitive superclass candidate, the JVM injects IdentityObject as a superinterface of the newly loaded class; when we go to load a primitive subclass, this will fail because primitive classes cannot implement both IdentityObject and PrimitiveObject.  
> 
> Reflection probably doesn't have to reflect whether a class is a primitive superclass candidate; it already reflects the things needed to make this determination.

This one, on the other hand, conveys a core property of a JVM class.

To review the story for the JVM, it goes like this: there are two channels for subclass instance creation. The first channel, for identity subclasses, is the mutation-based '<init>' route, with subclasses required by the verifier to call up the '<init>' chain (and supply appropriate arguments) to get usable class instances. '<init>' declarations can limit access. The second channel, for primitive subclasses, is an opt-in flag in the parent that says "I'll allow primitive children without performing any initialization", and then 'defaultvalue' and 'withfield' can just ignore the superclass. This route also needs accessibility restrictions.

Here's how the *language behavior* of abstract classes is defined in JEP 401:

"An interface can explicitly extend either IdentityObject or PrimitiveObject if the author determines that all implementing objects are expected to have or not have identity. It is an error if a class ends up implementing both interfaces implicitly, explicitly, or by inheritance. By default, an interface extends neither of these interfaces and can be implemented by both kinds of concrete classes."

"An abstract class can similarly be declared to implement either IdentityObject or PrimitiveObject; or, if it declares a field, an instance initializer, a non-empty constructor, or a synchronized method, it implicitly implements IdentityObject. Otherwise, it extends neither interface and can be extended by both kinds of concrete classes."

We decided that detecting a "non-empty constructor" is not a job the JVM should be expected to perform, and so the JVM needs an explicit signal for the second-channel flavor of instance creation. (And note, BTW, that the first channel *also* has an explicit opt-in in the JVM, even though there's an implicit constructor in the language. You can declare a JVM class without instance creation support.) javac is responsible for generating that opt-in signal; legacy classes don't get it until recompilation.

Given that signal, the JVM can do some error checks (again quoting JEP 401):

"An abstract class that allows primitive subclasses declares this capability in its classfile (details TBD). At class load time, an error occurs if the class is not abstract, declares an instance field, declares a synchronized method, or implements—directly or indirectly—IdentityObject."

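Those load-time checks can be approximated with plain reflection. The following is only a sketch: it runs on today's JVM, so IdentityObject is a stand-in interface declared locally, the class and method names are hypothetical, and treating only instance (non-static) synchronized methods as disqualifying is my assumption about the quoted text. It also cannot see the opt-in signal itself, which does not exist yet.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class PrimitiveSuperclassCheckSketch {
    /** Hypothetical stand-in for the IdentityObject interface described in JEP 401. */
    interface IdentityObject {}

    /** Returns null if the class passes the four checks, else the reason it fails. */
    static String check(Class<?> c) {
        if (!Modifier.isAbstract(c.getModifiers())) {
            return "not abstract";
        }
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                return "declares an instance field";
            }
        }
        for (Method m : c.getDeclaredMethods()) {
            int mods = m.getModifiers();
            // Assumption: static synchronized methods are not counted.
            if (Modifier.isSynchronized(mods) && !Modifier.isStatic(mods)) {
                return "declares a synchronized method";
            }
        }
        // isAssignableFrom covers both direct and inherited implementation.
        if (IdentityObject.class.isAssignableFrom(c)) {
            return "implements IdentityObject";
        }
        return null;
    }

    // Small fixtures, one per rule.
    abstract static class Ok { abstract int size(); }
    abstract static class HasField { int x; }
    abstract static class HasSync { synchronized void m() {} }
    abstract static class HasIdentity implements IdentityObject {}
    static class Concrete {}

    public static void main(String[] args) {
        Class<?>[] samples = { Ok.class, HasField.class, HasSync.class, HasIdentity.class, Concrete.class };
        for (Class<?> c : samples) {
            String reason = check(c);
            System.out.println(c.getSimpleName() + ": " + (reason == null ? "ok" : reason));
        }
    }
}
```

Note what is missing from the list: nothing here inspects constructor bodies, which is the point of having an explicit signal.
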
Concretely, the natural fit for encoding is a class attribute ('PrimitiveInstantiation', say) that carries an access flag.
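
For illustration, here is roughly what such an attribute could look like on the wire. The attribute_info framing (u2 name index, u4 length, then the payload) is the standard classfile layout; the two-byte access-flags payload is an assumption, since the details are TBD, and the constant pool index 17 is arbitrary.

```java
import java.nio.ByteBuffer;

final class PrimitiveInstantiationAttrSketch {
    // Standard JVMS access flag values.
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_PRIVATE = 0x0002;
    static final int ACC_PROTECTED = 0x0004;

    /** Writes attribute_info with an assumed payload of one u2 access-flags word. */
    static byte[] write(int nameIndex, int accessFlags) {
        // ByteBuffer is big-endian by default, matching the classfile format.
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putShort((short) nameIndex);   // attribute_name_index (Utf8 "PrimitiveInstantiation")
        buf.putInt(2);                     // attribute_length: two payload bytes
        buf.putShort((short) accessFlags); // assumed payload
        return buf.array();
    }

    /** Reads the access flags back, rejecting an unexpected length. */
    static int readAccessFlags(byte[] attr) {
        ByteBuffer buf = ByteBuffer.wrap(attr);
        buf.getShort();                    // skip attribute_name_index
        int length = buf.getInt();
        if (length != 2) {
            throw new IllegalArgumentException("unexpected attribute_length " + length);
        }
        return buf.getShort() & 0xFFFF;    // u2 is unsigned
    }

    public static void main(String[] args) {
        byte[] attr = write(17, ACC_PROTECTED);
        System.out.println(attr.length + " bytes, flags=0x"
                + Integer.toHexString(readAccessFlags(attr)));
    }
}
```

An absent attribute would then mean "no second channel", which is what legacy classes get until recompilation.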

> 3.  T.ref.  In generic code, we can say `T.ref`, which is a total operator on types; if T is already a reference type, then T.ref = T, and if it is a primitive value type P.val, then T.ref = P.ref.  The Signature attribute should be extended to support the distinction between a use of `T` and a use of `T.ref`.  (T.val is partial, so doesn't make sense in the general case, and in the specific cases where it does make sense, does not currently look worth supporting.)

+1
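
A small model of the two operators, in case it helps to see the total/partial distinction spelled out. This is a sketch with hypothetical names (Kind, Type, ref, val); it models the type operators only, not any Signature encoding.

```java
import java.util.Optional;

final class RefProjectionSketch {
    enum Kind { REFERENCE, PRIMITIVE_REF, PRIMITIVE_VAL }

    /** Hypothetical type model: a name plus which flavor of type it is. */
    static final class Type {
        final String name;
        final Kind kind;

        Type(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }

        @Override
        public String toString() {
            if (kind == Kind.PRIMITIVE_REF) return name + ".ref";
            if (kind == Kind.PRIMITIVE_VAL) return name + ".val";
            return name;
        }
    }

    /** Total: reference types map to themselves, P.val maps to P.ref. */
    static Type ref(Type t) {
        return t.kind == Kind.PRIMITIVE_VAL ? new Type(t.name, Kind.PRIMITIVE_REF) : t;
    }

    /** Partial: ordinary reference types have no val projection. */
    static Optional<Type> val(Type t) {
        if (t.kind == Kind.PRIMITIVE_VAL) return Optional.of(t);
        if (t.kind == Kind.PRIMITIVE_REF) return Optional.of(new Type(t.name, Kind.PRIMITIVE_VAL));
        return Optional.empty();
    }

    public static void main(String[] args) {
        Type string = new Type("String", Kind.REFERENCE);
        Type pointVal = new Type("Point", Kind.PRIMITIVE_VAL);
        System.out.println(ref(string) + " " + ref(pointVal) + " " + val(string).isPresent());
    }
}
```

The empty result from val on an ordinary reference type is the partiality that makes T.val awkward in generic code.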

> 4.  Other flavors, as needed.  We've considered a "null-default" primitive class; if so, this has to be captured in a similar way as (1).  These can probably all be folded into a single PrimitiveClass attribute.  

Currently, the JEP proposes bundling up all default/integrity-related constraints under a single flag:

"Tentative feature: If it is important for correctness, a primitive class may declare that instances must be validated through a constructor call. In this case, the compiler and JVM will ensure that backdoor instance creation is either prevented or detected before any instance methods of the class are executed."

This would affect JVM behavior (e.g., atomicity guarantees). It's binary, so it could make sense as an ACC_* flag. Or it could go in an attribute. Or a marker interface.