<init> and factories

Thu Oct 17 20:00:54 UTC 2019

On Oct 17, 2019, at 11:22 AM, Dan Smith <daniel.smith at oracle.com> wrote:
> 
> The plan of record for compiling the constructors of inline classes is to generate static methods named "<init>" with an appropriate return type, and invoke them with 'invokestatic'.
> 
> This requires relaxing the existing restrictions on method names and references. Historically, the special names "<init>" and "<clinit>" have been reserved for special-purpose JVM rules (for example, 'invokespecial' is treated like a distinct instruction if it invokes a method named '<init>'); for convenience, we've also prohibited all other method names that include the characters '<' or '>' (JVMS 4.2.2).
> 
> Equivalently, we might say that, within the space of method names, we've carved out a reserved space for special purposes: any names that include '<' or '>'.
> 
> A few months ago, I put together a tentative specification that effectively cedes a chunk of the reserved space for general usage [1]. The names "<init>" and "<clinit>" are no longer reserved, *unless* they're paired with descriptors of a certain form ("(.*)V" and "()V", respectively). Pulling on the thread, we could even wonder whether the JVM should have a reserved space at all—why can't I name my method "bob>" or "<janet>", for example?
> 
> In retrospect, I'm not sure this direction is such a good idea. There is value in having well-known names that instantly indicate important properties, without having more complex tests. (Complex tests are likely to be a source of bugs and security exploits.) Since the JVM ecosystem is already accustomed to the existence of a reserved space for special method names, we can keep that space for free, while it's potentially costly to give it up.
> 
> So here's an alternative design:
> 
> - "<init>" continues to indicate instance initialization methods; "<clinit>" continues to indicate class initialization methods
> 
> - A new reserved name, "<new>", say, can be used to declare factories
> 
> - To avoid misleading declarations, methods named "<new>" must be static and have a return type that matches their declaring class; only 'invokestatic' instructions can reference them
> 
> - The rest of the "<.*>" space of names (plus ".*<.*" and ".*>.*") is held in reserve, available for special purposes as we discover them
> 
> The Java compiler would only use "<new>" methods for inline class construction, for now; perhaps in the future we'll find other use cases that make sense (like surfacing some sort of factory mechanism).
> 
> Does this seem promising? Any particular reason it's better to overload "<init>" than just come up with a new special name?
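
As a rough illustration of the name-space rules in the alternative design quoted above, here is a minimal Java sketch. The class MethodNameKind and its classify method are hypothetical names invented for this example; they are not part of any JVM or JDK API, and "<new>" is only the tentative name floated in the proposal.

```java
// Hypothetical helper (not a JVM or JDK API): classifies a method name
// according to the alternative design sketched in the quoted message.
final class MethodNameKind {
    enum Kind { INSTANCE_INIT, CLASS_INIT, FACTORY, RESERVED, ORDINARY }

    static Kind classify(String name) {
        switch (name) {
            case "<init>":   return Kind.INSTANCE_INIT; // instance initialization method
            case "<clinit>": return Kind.CLASS_INIT;    // class initialization method
            case "<new>":    return Kind.FACTORY;       // tentative factory name
            default:
                // Every other name containing '<' or '>' stays in the reserved space.
                boolean hasBracket = name.indexOf('<') >= 0 || name.indexOf('>') >= 0;
                return hasBracket ? Kind.RESERVED : Kind.ORDINARY;
        }
    }
}
```

Under this sketch a name such as "bob>" or "<janet>" is classified as reserved, so a class file declaring it would still be rejected, as it is today.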

For my part either outcome is fine.  The prototype overloads <init> but it could almost as well have added <new>.

Fine points in the VM prototype:

- A factory method named <init> must be static, and it can be restricted to return exactly the type of its declaring class, except in some special cases.
- In those cases (VMACs, i.e. VM anonymous classes, and hidden classes) the declaring class is not denotable in a descriptor, so the return type must be a supertype (maybe always Object).

So the prototype allows Object as a return type from a static <init> function.  I don’t remember whether it checks that the declaring class is a VMAC in that case.
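
The return-type rule described above could be sketched roughly as follows. This is not the prototype's actual code: the class StaticInitCheck and both method names are made up for illustration, and the "strict" variant (Object accepted only when the declaring class is not denotable) is an assumption about what a tighter check might look like.

```java
// Hypothetical sketch, not the prototype's implementation.
final class StaticInitCheck {
    static final String OBJECT = "Ljava/lang/Object;";

    // Extracts the return type from a JVM method descriptor such as "(II)LPoint;".
    static String returnType(String descriptor) {
        return descriptor.substring(descriptor.lastIndexOf(')') + 1);
    }

    // Lenient rule: the declaring class or Object is always accepted.
    static boolean allowedLenient(String descriptor, String declaringClass) {
        String ret = returnType(descriptor);
        return ret.equals("L" + declaringClass + ";") || ret.equals(OBJECT);
    }

    // Strict rule (assumption): Object is accepted only when the declaring
    // class cannot be named in a descriptor (VMAC or hidden class).
    static boolean allowedStrict(String descriptor, String declaringClass, boolean denotable) {
        String ret = returnType(descriptor);
        return denotable ? ret.equals("L" + declaringClass + ";") : ret.equals(OBJECT);
    }
}
```

The difference between the two variants is exactly the open question above: whether an Object return type is accepted for any declaring class or only for non-denotable ones.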

Would there be any restrictions on the contents of a constructor/factory method <new>?  (I hope not.)

Would there be any enhancements to the capabilities of a <new> function?

For example, I think we should consider allowing <new> to invokespecial super.<init> on a new instance, and/or putfield into the final fields of the new instance.
If we don’t allow this, then translation strategies may have to spin private <init> methods to handle the super call and final field inits, which seems suboptimal to me.
(To be clear:  I’m thinking of using <new> here in a non-inline class.)

One result of using a different name (<new>) is that there’s no inherent need to require that it be static.
I don’t think there’s any benefit to requiring that <new> be static.  (Well maybe some:  It partitions <new> from
any kind of virtual call.)  Maybe a non-static <new> could serve as a factory method which takes the current
instance and “reconstructs” it as a new instance.  But that can be done by wrapping a static <new> into some
other method m, and then there’s no confusion about making m virtual.
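
In today's Java, the wrapping idea above might look like the following sketch. The names Point, make, and withX are illustrative only; make stands in for a static <new>, which cannot be written in Java source.

```java
// Illustrative sketch: an ordinary instance method wrapping a static factory.
final class Point {
    private final int x;
    private final int y;

    private Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Stand-in for a static <new> factory.
    static Point make(int x, int y) {
        return new Point(x, y);
    }

    // The method "m": takes the current instance and "reconstructs" it
    // as a new instance by delegating to the static factory.
    Point withX(int newX) {
        return make(newX, this.y);
    }

    int x() { return x; }
    int y() { return y; }
}
```

Because withX is a plain instance method, making it virtual (by dropping final on the class) raises no question about what a virtual factory would mean.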

> [1] http://cr.openjdk.java.net/~dlsmith/lw2/lw2-20190628/specs/init-methods-jvms.html

Using something like <new> is a forced move for inline classes.  It is also (IMO) a fruitful move for
regular non-inline (“identity”) classes.  If the translation strategy were adjusted to translate every
new Foo() expression as invokestatic <new>, the following benefits would appear:

- Less reliance on the verifier to validate arbitrary-in-the-wild “new/dup/invokespecial” code shapes.  (It’s been buggy in the past.)
- Simpler, more optimizable bytecode for complex expressions like new A(…new B()…), currently a pain point in our JITs.
- A more direct path for migrating “new VT()” expressions from VT as a value-based class to an inline class.  (No such migration is possible with new/dup/invokespecial.)
- More compact (and analyzable) classfiles, when they contain new A(…) expressions.
- A future option to make the “new instance” instruction be *private* to the class which it is constructing, a probable security benefit.
- A future option to separate, at the language level, the capability of constructing a subclass instance (super()) from requesting a new object (new A()).
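
To make the difference in code shape concrete, here is a small sketch. The classes A, B, and TranslationShapes and the static method make are hypothetical stand-ins (make plays the role of <new>, which Java source cannot declare). The first bytecode comment shows what javac emits today; the second shows the shape a factory-based translation would have.

```java
final class B {
    static B make() { return new B(); }
}

final class A {
    final B b;

    A(B b) { this.b = b; }

    static A make(B b) { return new A(b); }
}

final class TranslationShapes {
    // Compiled by javac today as:
    //   new A; dup; new B; dup;
    //   invokespecial B.<init>:()V; invokespecial A.<init>:(LB;)V
    // The uninitialized A sits on the stack while B is being constructed.
    static A viaConstructors() {
        return new A(new B());
    }

    // With static factories the call sites contain only:
    //   invokestatic B.make:()LB; invokestatic A.make:(LB;)LA;
    // so no uninitialized reference is ever visible at the call site.
    static A viaFactories() {
        return A.make(B.make());
    }
}
```

In this sketch the new/dup/invokespecial sequence still exists, but only inside each class's own make method.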

— John

P.S.  About that last option:  A public constructor C allows *both* creation of new instances and subclassing.  It is difficult
to separately control access for these operations.  (They correspond to calls to C.super.<init> and to C.<new>.)
If it were possible to tease apart these as separate API points (corresponding to the distinct underlying names) then
they could be given independent access control (one public, one private, etc.).
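
The closest approximation in today's Java is a constructor paired with a separately declared static factory, as in the sketch below. The names Base, Derived, and create are illustrative only. The approximation is imperfect: a protected constructor can still be used with "new" by any code in the same package, which is part of why independent access control is hard to get today.

```java
class Base {
    private final String name;

    // Subclassing API point (roughly the role of C.super.<init>).
    // Caveat: same-package code can also call this via "new Base(...)".
    protected Base(String name) {
        this.name = name;
    }

    // Instantiation API point (roughly the role of C.<new>), with its
    // own access modifier chosen independently of the constructor's.
    public static Base create(String name) {
        return new Base(name);
    }

    String name() { return name; }
}

class Derived extends Base {
    Derived() {
        super("derived"); // uses the subclassing API point only
    }
}
```

Here the factory is public and the constructor protected; the reverse split (subclassing public, instantiation private) cannot be expressed with a single constructor in current Java.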

In fact, a clearer separation would be to call the super-version C.super.<super>, so that super() calls could
be translated to invokespecial <super> (with the same powers and responsibilities as for <init> in that position).
And new T() calls would be translated to invokestatic <new>.  And <init> would serve both at once, in various
use cases, but a class translation might have only <super> and <new>, or perhaps <super> and <new> and
some private <init> methods to factor out code used by both, locally.

I’ll tip my hand here:  I think of a <new> method as a “final constructor”:  It’s the use of a constructor in the
terminal position, when the requested class is known, and *not* when a random subclass is requesting
initialization of one of its progeny.  I also think of <super> (or <init> used only by subclasses) as an
“abstract constructor”.  It’s the use of a constructor in a non-terminal position, when the requested class
is some subclass elsewhere but it needs to call up the super chain for proper instantiation.  The analogy
with final and abstract methods is not exact, but it is close enough that I think there’s something there.
In this mindset, I think of today’s <init> as a hack which performs both jobs, even though they are distinct,
and of today’s constructor notation as defining *both* the <new> and the <super> methods, and indeed
stashing the one copy of the code on the <init> hack.

When we get bridging technology, we can declaratively spin bridges from non-private <new> and
<super> API points (w/o bodies) into private <init> methods.  So the extra distinctions I’m thinking
of don’t have to end up duplicating bytecodes, in the common case where a class needs to define
parallel <new> and <super> API points.