Factory methods & the language model

Thu Sep 9 16:15:42 UTC 2021

On Thu, Sep 9, 2021 at 10:24 AM Dan Smith <daniel.smith at oracle.com> wrote:
>
> JEP 401 includes special JVM factory methods, spelled <new> (or, alternatively, <init> with a non-void return), which are needed as a standardized way to encode the Java language's primitive class constructors.
>
> We have a lot of flexibility in how much we restrict use of these methods. Too many restrictions seem arbitrary and incoherent from the JVM's point of view; but too few restrictions risk untested corner cases, unfortunate compatibility obligations, and difficulties mapping back to the Java language model.
>
> Expanding on that last one: for tools that operate with a Java language model, there are essentially three strategies for dealing with factory methods outside of the core primitive class construction use case:
>
> 1) Have the JVM reject them

This gives us the maximum flexibility to expand factories in the
future and lets us concentrate on the inline-type use cases.  Seems
like a pretty safe fallback position on factories.

> 2) Ignore them

I strongly dislike this.  If javac were to ignore them, and just not
generate them, they are effectively dead code.  It'd be much clearer
to users if javac flagged them as such and refused to compile unless
they were deleted.  If javac ignores them, we still need an answer on
what the JVM does with them - reject them?  load them but prevent them
from being invoked?  drop them when loading the classfile?  This seems
like it collapses back to option 1.

> 3) Expand the model to include them

How much expanding does the model need?  We had originally modeled the
<new> factory methods as regular static methods and only gave them the
specialized name to make them easy to detect, to deal with withfield
being limited to the nest, and to allow reflective operations like
Class::getConstructor() and Class::newInstance() to identify the
inline type "constructors".  Am I forgetting a case?
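
For illustration, a rough Java-level sketch of the reflective
distinction, with an ordinary static method standing in for a <new>
factory.  The names Meters, of, and ReflectDemo are hypothetical, and
nothing here uses the JEP 401 encoding; it only shows that today a
constructor is found through Class::getDeclaredConstructor() while a
static factory is visible only as a plain Method:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical class: one ordinary constructor, one ordinary static factory.
final class Meters {
    final int value;

    Meters(int value) {
        this.value = value;
    }

    // Stand-in for a factory; at this level it is just a static method.
    static Meters of(int value) {
        return new Meters(value);
    }
}

final class ReflectDemo {
    // Reflection locates the constructor by its parameter types.
    static int viaConstructor(int v) {
        try {
            Constructor<Meters> c = Meters.class.getDeclaredConstructor(int.class);
            return c.newInstance(v).value;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // The static factory has to be looked up by name, like any other method.
    static int viaStaticFactory(int v) {
        try {
            Method m = Meters.class.getDeclaredMethod("of", int.class);
            if (!Modifier.isStatic(m.getModifiers())) {
                throw new IllegalStateException("expected a static method");
            }
            return ((Meters) m.invoke(null, v)).value;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A reserved name such as <new> would let reflection recognize the
second shape as a "constructor" without guessing from the method name.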

>
> Taking javac as an example, here's what that looks like:
>
> 1) If factory methods outside of primitive classes are illegal, javac can treat classes with such methods as malformed and report an error.
>
> 2) Or if javac sees a factory method in a non-primitive class, it can just leave it out when it maps the class file to a language-level class. (There's precedent for this in, e.g., the treatment of fields with the same name and different descriptors.)
>
> 3) Or we can allow javac to view factory methods in any class as constructors. A few complications:
>
>     - Constructors of non-final classes have both 'new Foo()' and 'super()' entry points; factories only support the first. So we either need to validate that a matching pair of <new> and <init> exist, or expand the language to model factories independently from constructors.

I don't think we want to touch the "new/dup/<init>" sequence, and
trying to allow factories to operate in that delicate dance would be a
mistake.  Factories, beyond the inline types uses, give us a chance to
encapsulate the "new/dup/<init>" dance and present a cleaner model.
We shouldn't attempt to mix the two.

>
>     - The language expects instance creation expressions to create fresh instances. We need to either validate this behavior (does the factory look like "new/dup/<init>"?) or relax the language semantics (perhaps this is in the grey area of mixed binaries?)
>

Only the invokestatic bytecode should be used to invoke a factory.
Classes can have both factories and constructors, but they serve
different purposes and only overlap due to reflective operations.
Keeping them completely separate at the bytecode level is cleanest.
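
A minimal sketch of the two call shapes, using an ordinary static
method as a stand-in for a factory.  Token, make, and CallSites are
hypothetical names, and the bytecode in the comments is the usual
javac output for these bodies, abbreviated:

```java
// Hypothetical class used only for illustration.
final class Token {
    private final String text;

    // Ordinary constructor: compiled to an instance method named <init>.
    Token(String text) {
        this.text = text;
    }

    // Ordinary static method, standing in for a factory.
    static Token make(String text) {
        return new Token(text);
    }

    String text() {
        return text;
    }
}

final class CallSites {
    // Roughly: new Token; dup; aload_0; invokespecial Token.<init>; areturn
    static Token viaConstructor(String s) {
        return new Token(s);
    }

    // Roughly: aload_0; invokestatic Token.make; areturn
    static Token viaFactory(String s) {
        return Token.make(s);
    }
}
```

The caller of the factory never sees an uninitialized reference; the
"new/dup/<init>" sequence stays inside Token.make.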

>     - Factories can appear in abstract classes and interfaces. Again, are we willing to change the language model to support these use cases? Perhaps to even allow their declaration?

This makes sense.  Factories are just static methods with a special
name.  A factory on an abstract class or interface makes sense if the
concrete implementations are all package-private (sealed?) so users
only reference the one public abstract class.
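
As a rough illustration of that shape in today's language, with
ordinary static methods rather than <new> factories.  Shape, Square,
and Circle are hypothetical names; in real code Shape would be public
(and could be sealed), while the implementations stay package-private:

```java
// Hypothetical API type: the only name users are meant to reference.
interface Shape {
    double area();

    // Static factories on the interface pick the concrete implementation.
    static Shape square(double side) {
        return new Square(side);
    }

    static Shape circle(double radius) {
        return new Circle(radius);
    }
}

// Package-private implementations, hidden behind the interface.
final class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

final class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double area() {
        return Math.PI * radius * radius;
    }
}
```

Callers write Shape.square(2.0) and never name Square or Circle.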

>
>     - If a factory method has a mismatched return type (declared in Foo, but returns a Bar), are we willing to support a type system where the type of a factory invocation is not the type of the class to which the factory belongs?
>

I thought we needed this capability for anonymous inline classes as
they can't name themselves in the return type of the factory.  And I
don't see a problem with it as long as we don't touch the new/dup/init
dance.  Is there another problem here I'm not seeing?

> There are probably limits to what we're willing to do with (3), which pushes at least some cases into the (1) or (2) buckets.
>
> So, my question: what should we expect from (3), now and in the foreseeable future? And for the cases that fall outside of it, should we fall back to (1), (2), or a mixture of both?
>

(1), limiting to inline types, is the easiest and safest option while
allowing the most flexibility to change in the future.

For (3), it seems like all the complexity goes away if we don't try to
make factories == constructors at the bytecode level.  Am I missing
something that would force us to do so?

--Dan