Evolving instance creation

Dan Smith daniel.smith at oracle.com
Tue Feb 22 21:16:54 UTC 2022


One of the longstanding properties of class instance creation expressions ('new Foo()') is that the instance being produced is unique—that is, not '==' to any previously-created instance.

Value classes will disrupt this invariant, because it's possible to "create" an instance of a value class that already exists:

new Point(1, 2) == new Point(1, 2) // always true

A related, possibly-overlapping new Java feature idea (not concretely proposed, but something the language might want in the future) is the declaration of canonical factory methods in a class, which intentionally *don't* promise unique instances (for example, they might implement interning). These factories would be like constructors in that they wouldn't have a unique method name, but otherwise would behave like ad hoc static factory methods—take some arguments, use them to create/locate an appropriate instance, return it.
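To make the interning case concrete, here is a minimal sketch of an ad hoc static factory as it can be written in Java today. The class name 'InternedPoint', the method name 'of', and the cache layout are assumptions made up for illustration; nothing here is proposed syntax for unnamed factories.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical identity class whose factory interns instances by argument.
final class InternedPoint {
    // Cache keyed by both coordinates packed into a single long.
    private static final Map<Long, InternedPoint> CACHE = new ConcurrentHashMap<>();

    final int x;
    final int y;

    // Private constructor: clients cannot use 'new', so every instance
    // they see has gone through the cache.
    private InternedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Returns an existing instance for these arguments if there is one,
    // otherwise creates and caches a new one.
    static InternedPoint of(int x, int y) {
        long key = ((long) x << 32) | (y & 0xFFFFFFFFL);
        return CACHE.computeIfAbsent(key, k -> new InternedPoint(x, y));
    }
}
```

With this class, 'InternedPoint.of(1, 2) == InternedPoint.of(1, 2)' holds, which is exactly the kind of result a class instance creation expression has never produced for an identity class.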

I want to focus here on the usage of class instance creation expressions, and how to approach changes to their semantics. This involves balancing the needs of programmers who depend on the unique instance invariant with those who don't care and would prefer fewer knobs/less complexity.

Here are three approaches that I could imagine pursuing:

(1) Value classes are a special case for 'new Foo()'

This is the plan of record: the unique instance invariant continues to hold for 'new Foo()' where Foo is an identity class, but if Foo is a value class, you might get an existing instance.

In bytecode, the translation of 'new Foo()' depends on the kind of class (as determined at compile time). Identity class creation continues to be implemented via 'new Foo; dup; invokespecial Foo.<init>()V'. Value class creation occurs via 'invokestatic Foo.<newvalue>()LFoo;' (method name bikeshedding tk). The two translations are not binary compatible: if, say, an identity class becomes a value class, call sites compiled against the old form will not work with the new one.
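The two call-site shapes can be seen in today's Java by comparing a constructor call with a static method call. This is only a rough sketch: 'newValue' is a hypothetical, source-expressible stand-in for the '<newvalue>' method, and 'ValueLikeFoo' is an ordinary identity class that merely imitates a value class by handing back a shared instance.

```java
// Ordinary identity class, created through its constructor.
final class IdentityFoo {
    IdentityFoo() {}
}

// Stand-in for a value class: creation goes through a static method.
final class ValueLikeFoo {
    private static final ValueLikeFoo ONLY = new ValueLikeFoo();

    private ValueLikeFoo() {}

    // Hypothetical substitute for '<newvalue>', which cannot be named in source.
    static ValueLikeFoo newValue() {
        return ONLY;
    }
}

final class CallSites {
    static IdentityFoo viaConstructor() {
        // javac emits: new IdentityFoo; dup; invokespecial IdentityFoo.<init>()V
        return new IdentityFoo();
    }

    static ValueLikeFoo viaStaticMethod() {
        // javac emits: invokestatic ValueLikeFoo.newValue()LValueLikeFoo;
        return ValueLikeFoo.newValue();
    }
}
```

Because each call site names a different member ('<init>' versus a static method), a client compiled against one shape cannot link against a class that only provides the other.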

In a way, it shouldn't be surprising that a value class doesn't guarantee unique instances, because uniqueness is closely tied to identity. So special-casing 'new Foo()' isn't that different from special-casing 'Object.equals': in the absence of identity, we'll do something reasonable, but not quite the same.

Factories don't enter into this story at all. If we end up having unnamed factories in the future, they will be declared and invoked with a separate syntax, and will be declarable both by identity classes and value classes. (Value class factories don't seem particularly compelling, but they could, say, be used to smooth migration, like 'Integer.valueOf'.)

Biggest concerns: for now, it can be surprising that 'new' doesn't always give you a unique instance. In a future with factories, navigating between the 'new' syntax and the factory invocation syntax may be burdensome, with style wars about which approach is better.

(2) 'new Foo()' as a general-purpose creation tool

In this approach, 'new Foo()' is the use-site syntax for *both* factory and constructor invocation. Factories and constructors live in the same overload resolution "namespace", and all will be considered by the use site.

In bytecode, the preferred translation of 'new Foo()' is 'invokestatic Foo.<new>()LFoo;'. Note that this is the case for both value classes *and identity classes*. For compatibility, 'new/dup/<init>' also needs to be supported for now; eventually, it might be deprecated. Refactoring between constructors and factories is generally compatible.

Because this re-interpretation of 'new Foo()' supports factories, there is no unique instance invariant. At best, particular classes can document that they produce unique instances, and clients who need this behavior should ensure they're working with classes that promise it. (It's not as simple as looking for a *current* factory, because constructors can be refactored to factories.)
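As one illustration of a client that needs the invariant, consider the common sentinel idiom. The 'Slot' class below is a made-up example, not something from any proposal; it is only correct because 'new Object()' is guaranteed to be '!=' to every reference a caller could pass in.

```java
// Hypothetical holder that distinguishes "never set" from "set to null".
final class Slot {
    // Relies on the unique instance invariant: no caller-supplied
    // reference can ever be == to this freshly created object.
    private static final Object EMPTY = new Object();

    private Object value = EMPTY;

    void set(Object v) {
        value = v;
    }

    boolean isSet() {
        return value != EMPTY;
    }

    Object get() {
        if (value == EMPTY) {
            throw new IllegalStateException("no value has been set");
        }
        return value;
    }
}
```

If the sentinel's class were free to return a shared instance from 'new', a caller could obtain the same reference and 'isSet()' would give the wrong answer. Under approach (2), code like this would have to check that the class it instantiates documents a uniqueness promise.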

For developers who don't care about unique instances, this is the simplest approach: whenever you want an instance of Foo, you say 'new Foo()'.

Biggest concerns: we've demoted an ironclad semantic guarantee to an optional property of some classes. For those developers/use cases who care about the unique instance invariant, that may be difficult, especially because we're undoing a longstanding property rather than designing it this way from the beginning.

(3) 'new Foo()' for unique instances and just 'Foo()' otherwise

Here, the 'new' keyword is reserved for cases in which a unique instance is guaranteed. For value class creation, factory invocation, and constructor invocation when unique instances don't matter, a bare 'Foo()' call is used instead. 'new Point()' would be an error—this syntax doesn't work with value classes.

In bytecode, 'new Foo()' always compiles to 'new/dup/<init>', while plain 'Foo()' typically compiles to 'invokestatic Foo.<make>()LFoo;' (method name bikeshedding tk). For compatibility, plain 'Foo()' would support 'new/dup/<init>' invocations as well, if that's all the class provides. Refactoring between constructors and factories is generally compatible for plain 'Foo()' use sites, but not 'new Foo()' use sites.

The plain 'Foo()' would become the preferred style for general-purpose usage, while 'new Foo()' would (eventually, after a long migration period) signal an interest in the unique instance guarantee. Java code written with the updated style is a little lighter on "ceremony".

Biggest concerns: a somewhat arbitrary shift in coding style for all programmers to learn, which at a minimum must be adopted when working with value classes.

---

What are your thoughts about the significance of the unique instance invariant? Is it important enough to design instance creation syntax around it? Does either (2) or (3) above sound like a better destination than the plan of record?

