Factory methods & the language model

Thu Sep 9 19:13:07 UTC 2021

On Thu, Sep 9, 2021 at 2:00 PM Dan Smith <daniel.smith at oracle.com> wrote:
>
> To clarify a bit that I left out: this discussion assumes a pretty fixed JVM feature: a factory method is a static method with a special name, invoked via invokestatic, and possibly subject to certain constraints about the descriptor/enclosing class. I'm not proposing any changes to that basic approach, although choices we make for the Java language & tools _might_ influence the set of constraints we choose to impose in JVMS.
>

Thanks for this clarification.  I was looking at this from a very
bytecode / classfile perspective and missed the language interactions.

* From a JVM perspective, factory methods and constructors are clearly
different things and we should be careful not to mix them.  Factories
are invoked with invokestatic and constructors with invokespecial as
part of the new/dup/init dance.
* From a language perspective, both look very similar.  They both look
like constructors at the source level. (This was the piece I
overlooked)
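To make the bytecode-level difference concrete, here is a small Java sketch. Java source cannot spell a `<new>` method, so a hypothetical named static method `of` stands in for the factory. The comments show the instruction sequences javac emits at each call site, which you can check with `javap -c`.

```java
// Hypothetical class used only for illustration.
// `of` plays the role of a factory; real JEP 401 factories are named <new>.
final class Point {
    final int x, y;

    // Constructor. A client writing `new Point(1, 2)` compiles to:
    //   new Point; dup; iconst_1; iconst_2; invokespecial Point.<init>(II)V
    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Static factory. A client writing `Point.of(1, 2)` compiles to:
    //   iconst_1; iconst_2; invokestatic Point.of(II)LPoint;
    // The new/dup/invokespecial sequence now lives inside this body,
    // so callers never see it.
    static Point of(int x, int y) {
        return new Point(x, y);
    }
}
```

At the source level both calls read as "make me a Point"; only the emitted instructions differ.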

I was about to write "The major difference (?) is that factories have
an explicit return statement" but I don't think that's true at the
source level.  Can you confirm?

More responses in line.

> > On Sep 9, 2021, at 10:15 AM, Dan Heidinga <heidinga at redhat.com> wrote:
> >
> > On Thu, Sep 9, 2021 at 10:24 AM Dan Smith <daniel.smith at oracle.com> wrote:
> >>
> >> JEP 401 includes special JVM factory methods, spelled <new> (or, alternatively, <init> with a non-void return), which are needed as a standardized way to encode the Java language's primitive class constructors.
> >>
> >> We have a lot of flexibility in how much we restrict use of these methods. Too many restrictions seem arbitrary and incoherent from the JVM's point of view; but too few restrictions risk untested corner cases, unfortunate compatibility obligations, and difficulties mapping back to the Java language model.
> >>
> >> Expanding on that last one: for tools that operate with a Java language model, there are essentially three strategies for dealing with factory methods outside of the core primitive class construction use case:
> >>
> >> 1) Have the JVM reject them
> >
> > This gives us the maximum flexibility to expand factories in the
> > future and lets us concentrate on the inline types use cases.  Seems
> > like a pretty safe fallback position on factories.
>
> Yeah. Seems a little... lacking in vision to impose this restriction on class files of all languages, but it also avoids over-committing.
>

It lets us concentrate on the bits required to ship Valhalla without
needing to interact with the new/dup/init dance.  That could be
lacking in vision or just laser-focused on delivery =)

> >
> >> 2) Ignore them
> >
> > I strongly dislike this.  If javac were to ignore them, and just not
> > generate them, they are effectively dead code.
>
> Dead to the Java language and tools, but perhaps a useful way to compile a Scala feature or something?
>

Ok, given the clarification above, "ignore them" means let the JVM
load them and treat them like any other method.  A language that has a
syntax for "invokestatic Foo.<new>()LFoo;" is welcome to call them.
That seems reasonable - I rescind my dislike of this option.

> >  It'd be much clearer
> > to users if javac flagged them as such and refused to compile unless
> > they were deleted.  If javac ignores them, we still need an answer on
> > what the JVM does with them - reject them?  load them but prevent them
> > from being invoked?  drop them when loading the classfile?  This seems
> > like it collapses back to option 1.
>
> The JVM semantics are clean and wouldn't change: if you want to use a factory, invoke it with invokestatic. It's just that the Java language wouldn't provide any mechanism to do so (because <new> or <init> aren't legal Java method names).
>
> Ignoring does feel a bit like the feature is incomplete or something, but this sort of behavior does show up from time to time where Java and the JVM aren't perfectly in sync. For example:
> - If there are two fields with the same name, one of them is effectively invisible
> - If there are two methods with the same params and different returns, they're considered overloads that are impossible to disambiguate
> - If there's a stray <clinit> method in an interface (before we outlawed this), javac either filters it out or treats it as a normal method, but anyway you can't call it because of its name
>

I think we're on the same page here now.

> >> 3) Expand the model to include them
> >
> > How much expanding does the model need?  We had originally modeled the
> > <new> factory methods as regular static methods and only gave them the
> > specialized name to make them easy to detect, to deal with withfield
> > being limited to the nest, and to allow reflective operations like
> > Class::getConstructor() and Class::newInstance() to identify the
> > inline type "constructors".  Am I forgetting a case?
>
> Talking here about expanding the *language* model in some way so that factory methods appearing in non-primitive classes and interfaces can somehow be recognized or invoked. (1) and (2) are reasonable options, too, but here I'm exploring other approaches that go beyond rejecting or ignoring.
>

Ok.  So the question becomes, if the JVM allows <new> methods to be
loaded and invoked in non-primitive classes, how do we expose them to
the java language?
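One reason the special name matters for that exposure: reflection only reports `<init>` methods as constructors, so a factory spelled as an ordinary static method is invisible to `Class::getConstructors` and can only be found by convention. A minimal sketch, with a hypothetical `Temperature` class and a hypothetical "public static method returning its own class" convention standing in for a recognizable factory:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical class whose only public creation path is a named static factory.
final class Temperature {
    final double kelvin;

    private Temperature(double kelvin) { this.kelvin = kelvin; }

    public static Temperature ofKelvin(double kelvin) {
        return new Temperature(kelvin);
    }
}

final class ReflectionDemo {
    // Number of public constructors (<init> methods) reflection can see.
    static int publicConstructorCount(Class<?> c) {
        return c.getConstructors().length;
    }

    // Looks up a public method by name and accepts it as a "factory" only if it
    // is static and returns c itself. This is a naming-free heuristic, not an
    // API the JDK provides; returns null when no such method exists.
    static Method findFactory(Class<?> c, String name, Class<?>... params) {
        try {
            Method m = c.getMethod(name, params);
            boolean isFactory = Modifier.isStatic(m.getModifiers())
                    && m.getReturnType() == c;
            return isFactory ? m : null;
        } catch (NoSuchMethodException e) {
            return null;
        }
    }
}
```

With a reserved name like `<new>`, the lookup would not need to guess from the method's name and shape.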

> >> 3) Or we can allow javac to view factory methods in any class as constructors. A few complications:
> >>
> >>    - Constructors of non-final classes have both 'new Foo()' and 'super()' entry points; factories only support the first. So we either need to validate that a matching pair of <new> and <init> exist, or expand the language to model factories independently from constructors.
> >
> > I don't think we want to touch the "new/dup/<init>" sequence and
> > trying to allow factories to operate in that delicate dance would be a
> > mistake.  Factories, beyond the inline types uses, give us a chance to
> > encapsulate the "new/dup/<init>" dance and present a cleaner model.
> > We shouldn't attempt to mix the two.
>
> Not sure which direction you're going here?

I was thinking at the bytecode / classfile level.  I really really
don't want to touch the new/dup/<init> dance.  It's been an unending
source of pain and trying to do anything that interacts with that
sequence is asking for trouble.

>
> One stance we could take: new/dup/<init> is fine for identity classes, we're not going to do anything different.
>
> Another stance we could take: new/dup/<init> is painful, let's try to migrate to a different convention where factory methods encapsulate new/dup/<init>, and clients just call the factory.
>
> I'm saying if we take the latter stance, there's a problem in that constructors would then be compiled down to factory methods *and* (for super calls) <init> methods, and we might need some validation to ensure they are aligned.
>

As much as I'd like to see the language move away from the
new/dup/<init> dance, I'm hesitant about trying to tackle that in
Valhalla as it feels like an orthogonal problem to our core mission.
It's one of those threads hanging off the sweater that'll unravel the
whole thing if pulled.

Having a single method compiled to both a factory and an <init> method
feels like a future bug factory ... especially if we have to validate
they stay aligned.  I hesitate to go here as it encourages
bikeshedding, but to keep the door open to having both factories and
constructors in identity classes, should we use a different syntax for
factories in primitive classes now?  That way factories would be
"spelled" consistently between primitive and identity classes.  Doing
so diminishes the "codes like a class" story but leaves the door open
for more compatibility in the future.
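The "two entry points" problem can be approximated in today's Java. In this hypothetical sketch (`Account`, `create`, and `SavingsAccount` are illustrative names), `create` stands in for the factory a constructor would be compiled to, while the protected constructor stays as the `<init>` that `super(...)` calls need:

```java
// Hypothetical sketch: a non-final class that needs both entry points.
class Account {
    final String owner;

    // Entry point for subclasses: super(owner) must reach an <init> method.
    protected Account(String owner) {
        this.owner = owner;
    }

    // Entry point for clients, standing in for a compiled factory method.
    // Its body is exactly the new/dup/invokespecial sequence clients used to emit.
    static Account create(String owner) {
        return new Account(owner);
    }
}

class SavingsAccount extends Account {
    final double rate;

    SavingsAccount(String owner, double rate) {
        // Cannot go through Account.create: a factory returns a new object,
        // it does not initialize the one being constructed here.
        super(owner);
        this.rate = rate;
    }
}
```

Nothing ties `create` to the constructor's parameter list or behavior; if one changes without the other, they drift apart, which is the alignment problem discussed above.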

> >>    - The language expects instance creation expressions to create fresh instances. We need to either validate this behavior (does the factory look like "new/dup/<init>"?) or relax the language semantics (perhaps this is in the grey area of mixed binaries?)
> >>
> >
> > Only the invokestatic bytecode should be used to invoke a factory.
> > Classes can have both factories and constructors, but they serve
> > different purposes and only overlap due to reflective operations.
> > Keeping them completely separate at the bytecode level is cleanest.
>
> Sure, it's nice at the JVM level to treat them as independent features. But that doesn't match the Java language, so there's a mismatch to work out (either by changing the language, or restricting the VM, or having javac ignore code shapes that don't match).
>
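The freshness point shows up in ordinary Java: a class instance creation expression always yields a new object, but a static factory is free to return a cached one (`Integer.valueOf` does this for small values). A minimal sketch with a hypothetical interning class `Tag`; if javac treated `Tag.of` as the constructor, `new Tag("a") == new Tag("a")` could become true:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interning class. Not thread-safe; kept simple for illustration.
final class Tag {
    private static final Map<String, Tag> CACHE = new HashMap<>();
    final String name;

    Tag(String name) { this.name = name; }

    // Factory: returns the cached instance for `name`, creating it on first use.
    static Tag of(String name) {
        return CACHE.computeIfAbsent(name, Tag::new);
    }
}
```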
> >
> >>    - Factories can appear in abstract classes and interfaces. Again, are we willing to change the language model to support these use cases? Perhaps to even allow their declaration?
> >
> > This makes sense.  Factories are just static methods with a special
> > name.  A factory on an abstract class or interface makes sense if the
> > concrete implementations are all package-private (sealed?) so users
> > only reference the one public abstract class.
>
> Yep, could be a useful feature. Is it one we could actually see implementing? TBD...
>
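The interface case looks roughly like this in today's Java, using an ordinary static method since an interface has no constructor to borrow. `Shape`, `Square`, and `Rectangle` are hypothetical; clients only see the interface, and the factory picks the package-private implementation:

```java
// Hypothetical: clients use only Shape and its static factory.
// On Java 17+ the interface could also be declared
// `sealed interface Shape permits Square, Rectangle`.
interface Shape {
    double area();

    // The factory chooses which implementation to return.
    static Shape of(double width, double height) {
        return width == height ? new Square(width) : new Rectangle(width, height);
    }
}

final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

final class Rectangle implements Shape {
    private final double w, h;
    Rectangle(double w, double h) { this.w = w; this.h = h; }
    public double area() { return w * h; }
}
```

Note the explicit `return` choosing between implementations: a constructor-shaped `Shape()` declaration has no way to say this.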
> >>    - If a factory method has a mismatched return type (declared in Foo, but returns a Bar), are we willing to support a type system where the type of a factory invocation is not the type of the class to which the factory belongs?
> >>
> >
> > I thought we needed this capability for anonymous inline classes as
> > they can't name themselves in the return type of the factory.
>
> We concluded that "need" is too strong a word here. It's a corner case that can be handled without using the factory method feature.
>

That gives up being able to use Class::getConstructor and the MH
equivalents on anonymous classes?

> >  And I
> > don't see a problem with it as long as we don't touch the new/dup/init
> > dance.  Is there another problem here I'm not seeing?
>
> Clients like the Java language will expect the return type to match, and will have to work around the issue if it doesn't (again, with any of these strategies: reject as malformed, ignore, or expand the language to allow it).
>
> Specifically, even if we limit the feature to primitive classes, if a primitive class can have a factory that returns something other than the primitive class's type, javac needs to decide what to do about that.
>
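For a `new Foo()` expression javac knows the result type is `Foo`; for a factory it has to trust the declared return type, which need not be the declaring class. The anonymous-class case is the natural example, since the class cannot name itself. A hypothetical sketch using an ordinary static method (`Greeters` and `hello` are illustrative names):

```java
// Hypothetical: a static method declared in Greeters whose result type is not Greeters.
final class Greeters {
    private Greeters() {}

    // An anonymous class cannot name itself, so the best available return type
    // is its supertype, Runnable. The call expression's static type is Runnable.
    static Runnable hello(StringBuilder out) {
        return new Runnable() {
            @Override
            public void run() { out.append("hello"); }
        };
    }
}
```

Here `Greeters.hello(sb)` has static type `Runnable` even though its runtime class is an anonymous subclass declared inside `Greeters`.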
> >> There are probably limits to what we're willing to do with (3), which pushes at least some cases into the (1) or (2) buckets.
> >>
> >> So, my question: what should we expect from (3), now and in the foreseeable future? And for the cases that fall outside of it, should we fall back to (1), (2), or a mixture of both?
> >>
> >
> > (1), limiting to inline types, is the easiest and safest option while
> > allowing the most flexibility to change in the future.
> >
> > For (3), it seems like all the complexity goes away if we don't try to
> > make factories == constructors at the bytecode level.  Am I missing
> > something that would force us to do so?
>
> No, it's not necessarily about JVM bytecode constraints. It's about how javac interprets whatever class files are thrown at it.
>
> But you're right, if we limit the feature in the JVM to the minimal needs of the Java language (in a primitive class, matching return type), we can avoid these issues.
>

Nailing the primitive class use cases has to be our priority.  And
avoiding touching the new/dup/<init> dance comes in as a close second.
If we want to allow identity classes to have both factories and
constructors, we may need a different syntax for factories.

There are a lot of bad options here if we want to keep the door
open for both primitive and identity classes to have a consistent
syntax without upsetting the constructor apple cart.  Yuck.

With our current syntax:
X x = new X();
* primitive class: invokestatic X.<new>()QX;
* identity class: new X / dup / invokespecial X.<init>()V

Unless our X() factory methods have an explicit return statement, we
don't have the syntax to support the abstract class or interface
cases.  There's no way for them to specify what they want to return.
So there go those targets of opportunity.

Extending this to identity classes would be a pretty serious redesign
of the syntax - is this a path we want to go down?  Maybe this is where
you started and I've now caught up =) but bringing factories in their
current form to identity classes seems like more effort than it's
worth, given it won't remove the need for constructors.

--Dan