Alternative to IdentityObject & ValueObject interfaces
Dan Smith
daniel.smith at oracle.com
Fri Apr 1 17:02:38 UTC 2022
On Mar 22, 2022, at 10:52 PM, Dan Smith <daniel.smith at oracle.com> wrote:
On Mar 22, 2022, at 7:21 PM, Dan Heidinga <heidinga at redhat.com> wrote:
A couple of comments on the encoding and questions related to descriptors.
JVM proposal:
- Same conceptual framework.
- Classes can be ACC_VALUE, ACC_IDENTITY, or neither.
- Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are not. Optionally, modern-version concrete classes are also implicitly ACC_IDENTITY.
Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER
bits, then any class without one of the bits set (including all the
legacy classes) would be an identity class.
(Trying out this alternative approach to abstract classes: there's no more ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically ACC_IDENTITY, and modern-version abstract classes permit value subclasses unless they opt out with ACC_IDENTITY. It's the bytecode generator's responsibility to set these flags appropriately. Conceptually cleaner, maybe too risky...)
With the "clever" encoding, every class is implicitly identity unless
it sets ACC_VALUE or ACC_NEITHER and bytecode generators have to
explicitly flag modern abstract classes. This is kind of growing on
me.
A problem is that interfaces are ACC_NEITHER by default, not ACC_IDENTITY. Abstract classes and interfaces have to get two different behaviors based on the same 0 bits.
Here's another more stable encoding, though, that feels less fiddly to me than what I originally wrote:
ACC_VALUE means "allows value object instances"
ACC_IDENTITY means "allows identity object instances"
If you set *both*, you're a "neither" class/interface. (That is, you allow both kinds of instances.)
If you set *none*, you get the default/legacy behavior implicitly: classes are ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE.
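To make the "allows" reading concrete, here is a rough Java sketch of how the defaults above could be resolved. It is only an illustration: the class and method names are made up, the bit values for ACC_IDENTITY and ACC_VALUE are borrowed from the update later in this message, and ACC_INTERFACE is the standard 0x0200 flag.

```java
// Hypothetical sketch of the "both bits = neither" encoding; not spec text.
final class TwoBitEncodingSketch {
    static final int ACC_IDENTITY  = 0x0020; // assumed value, taken from the later update
    static final int ACC_VALUE     = 0x0040; // assumed value, taken from the later update
    static final int ACC_INTERFACE = 0x0200; // standard class-file flag

    // Returns the effective ACC_IDENTITY/ACC_VALUE bits after applying the
    // implicit default/legacy behavior when neither bit is set.
    static int effectiveFlags(int accessFlags) {
        int kind = accessFlags & (ACC_IDENTITY | ACC_VALUE);
        if (kind != 0) {
            return kind; // explicit: identity only, value only, or both ("neither")
        }
        boolean isInterface = (accessFlags & ACC_INTERFACE) != 0;
        // No bits set: classes allow identity instances only, interfaces allow both.
        return isInterface ? (ACC_IDENTITY | ACC_VALUE) : ACC_IDENTITY;
    }

    static boolean allowsIdentityInstances(int accessFlags) {
        return (effectiveFlags(accessFlags) & ACC_IDENTITY) != 0;
    }

    static boolean allowsValueInstances(int accessFlags) {
        return (effectiveFlags(accessFlags) & ACC_VALUE) != 0;
    }
}
```

Under this reading, the same zero bits still resolve differently for classes and interfaces, but the difference is confined to the single default case.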
Update on encoding: after some internal discussion, I've found this to be the most natural fit:
- ACC_VALUE (0x0040) corresponds to the 'value' keyword in source files
- ACC_IDENTITY (0x0020) corresponds to the (often implicit) 'identity' keyword in source files
- If neither is set, the class/interface supports both kinds of subclasses (and must be abstract)
- If both are set, or any supers' flags conflict, it's an error
- In older-version classes (not interfaces), ACC_IDENTITY is assumed to be set
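The rules in this list can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the JVM's actual check: the names FinalEncodingSketch, Kind and classify are hypothetical, IllegalArgumentException stands in for whatever error the JVM would really raise, older-version class files are assumed to ignore the two bits entirely, and the check for conflicts with supers' flags is left out.

```java
// Hypothetical sketch of the final flag encoding; not the real JVM logic.
final class FinalEncodingSketch {
    static final int ACC_IDENTITY  = 0x0020; // same bit as the old ACC_SUPER
    static final int ACC_VALUE     = 0x0040;
    static final int ACC_INTERFACE = 0x0200; // standard class-file flag
    static final int ACC_ABSTRACT  = 0x0400; // standard class-file flag

    enum Kind { IDENTITY, VALUE, EITHER }

    // olderVersion: true if the class file predates these flags (assumed input).
    static Kind classify(int accessFlags, boolean olderVersion) {
        boolean isInterface = (accessFlags & ACC_INTERFACE) != 0;
        if (olderVersion) {
            // Older classes are treated as ACC_IDENTITY; older interfaces are not.
            return isInterface ? Kind.EITHER : Kind.IDENTITY;
        }
        boolean identity = (accessFlags & ACC_IDENTITY) != 0;
        boolean value = (accessFlags & ACC_VALUE) != 0;
        if (identity && value) {
            throw new IllegalArgumentException("ACC_IDENTITY and ACC_VALUE are both set");
        }
        if (identity) {
            return Kind.IDENTITY;
        }
        if (value) {
            return Kind.VALUE;
        }
        // Neither flag: supports both kinds of subclasses, so it must be abstract.
        if ((accessFlags & ACC_ABSTRACT) == 0) {
            throw new IllegalArgumentException("neither flag set on a non-abstract class");
        }
        return Kind.EITHER;
    }
}
```

For example, classify(0x0021, false) would report IDENTITY for a public class that already sets the old ACC_SUPER bit.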
What about newer-version classes that use old encodings? (E.g., a tool bumps its output version number but isn't aware of these flags.) There's a sneaky trick here that minimizes the risk: ACC_IDENTITY is re-using the old ACC_SUPER, which no longer has any effect and that we've encouraged to be set since Java 1.0.2. So if you're already setting ACC_SUPER in your classes, you've automatically opted in to ACC_IDENTITY; doing something different requires making changes to the generated code.
So the remaining incompatibility risk is that someone generates a class (not an interface) with a newer version number and with neither flag set (violating the "always set ACC_SUPER" advice), and then either the class won't load (it's concrete, it declares an instance field, etc.), or it's abstract and accidentally supports value subclasses, and so can be instantiated without running <init> logic. The number of unlikely events in this scenario seems like enough for us not to be concerned.
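The cases in this scenario can be spelled out as a small Java sketch. It assumes a newer-version class (not an interface) produced by a tool that knows nothing about the new flags, so ACC_VALUE is never set; the class name, the Outcome enum and the hasInstanceFields parameter are hypothetical, and only the two load-failure causes named above are modeled.

```java
// Hypothetical sketch of the residual compatibility risk; not a real tool.
final class LegacyGeneratorRiskSketch {
    static final int ACC_SUPER    = 0x0020; // reinterpreted as ACC_IDENTITY
    static final int ACC_ABSTRACT = 0x0400; // standard class-file flag

    enum Outcome { OPTED_IN_TO_IDENTITY, FAILS_TO_LOAD, ACCIDENTALLY_PERMITS_VALUE_SUBCLASSES }

    // accessFlags: flags of a newer-version class written by a flag-unaware tool.
    // hasInstanceFields: whether the class declares any instance field (assumed input).
    static Outcome outcomeFor(int accessFlags, boolean hasInstanceFields) {
        if ((accessFlags & ACC_SUPER) != 0) {
            // Following the long-standing advice means the class is an identity class.
            return Outcome.OPTED_IN_TO_IDENTITY;
        }
        boolean isAbstract = (accessFlags & ACC_ABSTRACT) != 0;
        if (!isAbstract || hasInstanceFields) {
            // Neither flag set, but the class is concrete or declares instance state.
            return Outcome.FAILS_TO_LOAD;
        }
        // Abstract, no instance fields, no flags: value subclasses are permitted.
        return Outcome.ACCIDENTALLY_PERMITS_VALUE_SUBCLASSES;
    }
}
```

Only the last outcome is silent, and reaching it requires a newer version number, a missing ACC_SUPER bit, and an abstract class with no instance fields all at once.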