Alternative to IdentityObject & ValueObject interfaces

Tue Mar 22 23:56:02 UTC 2022

In response to some encouragement from Remi, John, and others, I've decided to take a closer look at how we might approach the categorization of value and identity classes without relying on the IdentityObject and ValueObject interfaces.

(For background, see the thread "The interfaces IdentityObject and ValueObject must die" in January.)

These interfaces have found a number of different uses (enumerated below), while mostly leaning on the existing functionality of interfaces, so there's a pretty good complexity vs. benefit trade-off. But their use has some rough edges, and inserting them everywhere has a nontrivial compatibility impact. Can we do better?

Language proposal:

- A "value class" is any class whose instances are all value objects. An "identity class" is any class whose instances are all identity objects. Abstract classes can be value classes or identity classes, or neither. Interfaces can be "value interfaces" or "identity interfaces", or neither.

- A class/interface can be designated a value class with the 'value' modifier.

value class Foo {}
abstract value class Bar {}
value interface Baz {}
value record Rec(int x) {}

- A class/interface can be designated an identity class with the 'identity' modifier.

identity class Foo {}
abstract identity class Bar {}
identity interface Baz {}
identity record Rec(int x) {}

- Concrete classes with neither modifier are implicitly 'identity'; abstract classes with neither modifier, but with certain identity-dependent features (instance fields, initializers, synchronized methods, ...) are implicitly 'identity' (possibly with a warning). Other abstract classes and interfaces are fine being neither (thus supporting both kinds of subclasses).

- The properties are inherited: if you extend a value class/interface, you are a value class/interface. (Same for identity classes/interfaces.) It's an error to be both.

- The usual restrictions apply to value classes, both concrete and abstract; and also to "neither" abstract classes, if they haven't been implicitly made 'identity'.

- An API ('Object.isValueObject()'?) allows for dynamically distinguishing between value objects and identity objects. The reflection API (in java.lang.Class) allows for detection of value classes/interfaces, identity classes/interfaces, and "neither" classes/interfaces.

- TBD whether/how we track these properties statically so that the type system can catch mismatches between non-identity class types and uses that assume identity.
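
As a rough illustration of the defaulting rules in the list above, here is a small model in plain Java. It is only a sketch: 'Kind', 'SourceRules', and 'declaredKind' are hypothetical names invented for this example, not part of any proposed API, and the single boolean stands in for the whole set of identity-dependent features (instance fields, initializers, synchronized methods, ...). Inheritance from supertypes and the "usual restrictions" on value classes are not modeled here.

```java
// Hypothetical model of the proposed source-level defaults; nothing here is a real JDK API.
enum Kind { VALUE, IDENTITY, NEITHER }

final class SourceRules {
    /**
     * Kind of a declaration, looking only at its own modifier and shape
     * (supertypes are ignored in this sketch).
     */
    static Kind declaredKind(Kind modifier,
                             boolean isInterface,
                             boolean isAbstract,
                             boolean hasIdentityDependentFeatures) {
        if (modifier != Kind.NEITHER) {
            return modifier;                 // explicit 'value' or 'identity'
        }
        if (isInterface) {
            return Kind.NEITHER;             // interfaces are never implicitly 'identity'
        }
        if (!isAbstract) {
            return Kind.IDENTITY;            // concrete class with neither modifier
        }
        // Abstract class with neither modifier: 'identity' only if it uses
        // identity-dependent features (possibly with a warning).
        return hasIdentityDependentFeatures ? Kind.IDENTITY : Kind.NEITHER;
    }

    public static void main(String[] args) {
        System.out.println(declaredKind(Kind.NEITHER, false, false, false)); // concrete class
        System.out.println(declaredKind(Kind.NEITHER, false, true, true));   // abstract, has fields
        System.out.println(declaredKind(Kind.NEITHER, false, true, false));  // abstract, stateless
        System.out.println(declaredKind(Kind.VALUE, true, true, false));     // value interface
    }
}
```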

JVM proposal:

- Same conceptual framework.

- Classes can be ACC_VALUE, ACC_IDENTITY, or neither.

- Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are not. Optionally, modern-version concrete classes are also implicitly ACC_IDENTITY.

(Trying out this alternative approach to abstract classes: there's no more ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically ACC_IDENTITY, and modern-version abstract classes permit value subclasses unless they opt out with ACC_IDENTITY. It's the bytecode generator's responsibility to set these flags appropriately. Conceptually cleaner, maybe too risky...)

- At class load time, we inherit value/identity-ness and check for conflicts. It's okay to have neither flag set but inherit the property from one of your supers. We also enforce constraints on value classes and "neither" abstract classes.
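
The load-time behavior described above might be modeled along these lines. This is a sketch under assumptions, not a description of what HotSpot would do: 'Category', 'LoadTimeCheck', 'effectiveFlag', and 'resolve' are made-up names, the "optional" rule for modern-version concrete classes is left out, and the exception type is a placeholder, since the proposal does not say which error a conflict would raise.

```java
import java.util.List;

// Hypothetical model of the proposed JVM rules; these names do not exist in any JDK.
enum Category { VALUE, IDENTITY, NEITHER }

final class LoadTimeCheck {
    /** Applies the implicit rule: legacy-version classes (not interfaces) count as ACC_IDENTITY. */
    static Category effectiveFlag(Category flag, boolean legacyVersion, boolean isInterface) {
        if (flag == Category.NEITHER && legacyVersion && !isInterface) {
            return Category.IDENTITY;
        }
        return flag;
    }

    /**
     * Combines a class's own (effective) flag with the already-resolved
     * categories of its superclass and superinterfaces.
     */
    static Category resolve(Category own, List<Category> supers) {
        Category result = own;
        for (Category inherited : supers) {
            if (inherited == Category.NEITHER) {
                continue;                    // this super imposes nothing
            }
            if (result == Category.NEITHER) {
                result = inherited;          // no flag set, property is inherited
            } else if (result != inherited) {
                // Placeholder exception: the actual error type is not specified.
                throw new IllegalStateException(
                        "conflict: " + result + " vs. inherited " + inherited);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // No flag of its own, but one super is a value interface.
        System.out.println(resolve(Category.NEITHER,
                List.of(Category.NEITHER, Category.VALUE)));
        // Legacy-version abstract class: implicitly identity.
        System.out.println(effectiveFlag(Category.NEITHER, true, false));
    }
}
```

Note that a conflict can arise even when the class sets neither flag, if two of its supers disagree; the loop above reports that case as well.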

---

So how does this score as a replacement for the list of features enabled by the interfaces?

- Dynamic detection: 'obj instanceof ValueObject' is quite straightforward; if we can replace that with 'obj.isValueObject()', that feels about equally useful. (I'd be more pessimistic about something like 'Objects.isValueObject(obj)'.)

- Subclass restriction: 'implements IdentityObject' has been replaced with the 'identity' modifier. Complexity cost of special modifiers seems on par with the complexity of special rules for inferring and checking the superinterfaces. I think it's a win that we use the 'value' modifier and "value" terminology for all kinds of classes/interfaces, not just concrete classes.

- Variable types: I don't see a good way to get the equivalent of an 'IdentityObject' type. It would involve tracking the 'identity' property through the whole type system, which seems like a huge burden for the occasional "I'm not sure you can lock on that" error message. So we'd probably need to be okay letting that go. Fortunately, I'm not sure it's a great loss—lots of code today seems happy using 'Object' when it means, informally, "object that I've created for the sole purpose of locking".

- Type variable bounds: this one seems more achievable, by using the 'value' and 'identity' keywords to indicate a new kind of bounds check ('<identity T extends Runnable>'). Again, it's added complexity, but it's more localized. We should think more about the use cases, and decide if it passes the cost/benefit analysis. If not, nothing else depends on this, so it could be dropped. (Or left to a future, more general feature?)

- Documentation: we've lost the handy javadoc location to put some explanations about identity & value objects in a place that curious programmers can easily stumble on. Anything we want to say needs to go in JLS/JVMS (or perhaps the java.lang.Object javadoc).

- Compatibility: pretty clear win here. No interface injection means tools that depend on reflection results won't be broken. (We've found a significant number of these problems in our own code/tests, FWIW.) No new static types means inference results won't change. There's less risk of incompatibilities when adding/removing the 'identity' and 'value' keywords (although there can still be source, binary, and behavioral incompatibilities).
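
For a sense of the kind of reflection-dependent code the compatibility point is about, here is a small example that runs on today's Java. 'Shape', 'Circle', and 'InterfaceLister' are hypothetical names for illustration. The assumption (not spelled out above) is that an injected 'IdentityObject' superinterface would show up in 'Class.getInterfaces()', so any tool or test that expects an exact list of superinterfaces would see an extra entry.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

interface Shape {}

class Circle implements Shape {}

final class InterfaceLister {
    /** Simple names of the directly declared superinterfaces, as reflection reports them. */
    static List<String> directInterfaces(Class<?> c) {
        return Arrays.stream(c.getInterfaces())
                .map(Class::getSimpleName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A tool or test might compare this against an exact expected list.
        // With interface injection, that comparison could start failing;
        // with modifiers/flags instead, the reflection result is unchanged.
        System.out.println(directInterfaces(Circle.class));
    }
}
```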