Re: Consolidating the user model

Wed Nov 3 14:05:21 UTC 2021

> I haven't caught up on the plans for equality in a long time.

Now is a good time to catch up.

Today, the JVM provides an equality operation on objects in the form of 
the `ACMP` instructions.  It also provides per-primitive equality 
operations (`ICMP`, `FCMP`, etc.) for the various primitive types.  (The
JVM mostly erases boolean, byte, char, and short to int, so some of 
these instructions are "missing".)

Today, the language translates the `==` operator to the appropriate 
ACMP / ICMP / etc. instruction, depending on the static type of the 
operands.  (JLS Chapter 5, "Conversions and Contexts", does the heavy 
lifting when the operand types don't match, as when we compare an 
object to a primitive.)  The important thing to take away here is that 
there really are multiple `==` operators; they are just spelled the 
same way and disambiguated by static typing.  Let's call them `id==`, 
`int==`, etc. when there's any ambiguity.  Note that `float==` and 
`double==` are weird when it comes to `NaN` (which is not `==` to 
itself) and signed zero (`0.0 == -0.0` is true), so `==` on primitives 
is not necessarily just a straight bitwise comparison.
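To make the "several operators, one spelling" point concrete in today's Java, here is a small illustration.  The names `EqOperators`, `intEq` and `idEq` are made up for the example; the two methods differ only in the static types of their operands, and that alone decides whether `==` compares values or identities.

```java
// A sketch: one spelling of `==`, different operators chosen by static type.
final class EqOperators {
    // Static types int and Integer: the Integer operand is unboxed, so this is int==.
    static boolean intEq(int a, Integer b) {
        return a == b;
    }

    // Static type Object on both sides: this is id== (a reference comparison).
    static boolean idEq(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        // int== compares values, regardless of which Integer object holds 1000.
        System.out.println(intEq(1000, Integer.valueOf(1000)));     // true

        // id== compares identities: `new String` always creates a fresh object,
        // so these are not id== even though they are equals.
        String s1 = new String("pt"), s2 = new String("pt");
        System.out.println(idEq(s1, s2) + " " + s1.equals(s2));      // false true

        // Boxing 127 twice is guaranteed (JLS 5.1.7) to yield the same Integer,
        // so id== happens to succeed; outside -128..127 the result is unspecified.
        Integer a = 127, b = 127;
        System.out.println(idEq(a, b));                              // true
    }
}
```

The last case is the "old unreliable cache-based semantics" for `Integer` that comes up again at the end of this message.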

Object has an `equals` method; the default implementation is:

     public boolean equals(Object other) {
         return this == other;
     }

So in the absence of code to the contrary, two objects are `equals` if 
they are the same object.

Extrapolating, ACMP is a _substitutability test_: it tells us whether 
substituting one operand for the other would produce any detectable 
difference.  Because all objects have a unique identity, comparing 
identities is both necessary and sufficient for a substitutability 
test.  This is the foundation on which we extend `==` to the new 
classes.
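A quick illustration of that default in today's Java, using a hypothetical `MutablePoint` class that does not override `equals`: two instances with identical state are not `equals`, and identity is what makes the difference observable, for example through a mutation seen via an alias.

```java
// A sketch: without an equals override, Object::equals falls back to id==.
final class IdentityDemo {
    // Hypothetical class used only for illustration; it does not override equals.
    static final class MutablePoint {
        int x, y;
        MutablePoint(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) {
        MutablePoint p1 = new MutablePoint(1, 2);
        MutablePoint p2 = new MutablePoint(1, 2);
        System.out.println(p1.equals(p2)); // false: same state, different identities
        System.out.println(p1.equals(p1)); // true: same object

        // Identity is observable: a write through p1 is visible via every alias
        // of p1 but not via p2, so substituting p2 for p1 would be detectable.
        MutablePoint alias = p1;
        p1.x = 42;
        System.out.println(alias.x + " " + p2.x); // 42 1
    }
}
```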

If C is a class with no identity, then an instance is the state, the 
whole state, and nothing but the state.  So the natural way to ask 
"could I substitute instance c1 for instance c2?" is to compare their 
corresponding fields, each with a substitutability test.  That is 
exactly what `ACMP` does on primitive objects.  In keeping with the 
notion that each primitive type has its own `==`, we'll write `Point==` 
for the equality on `Point`.

For a simple `Point` primitive class, this is obvious, but it gets 
tricky when a primitive is hiding behind a broader static type like 
Object or an interface type.  Consider:

     primitive class Box {
         Object contents;
     }

How do we compare two boxes?  By comparing their contents.  How do we 
compare contents?  With a substitutability test.  If we have identity 
objects in the box, then the box comparison is "are you both boxes, and 
are your contents `id==`".  What if we have Points in the box?  We need 
to compare them with `Point==`.  How do we know we have Points in the 
box?  By looking at their dynamic type.  So the `==` operation on 
primitive objects not only recurses into fields, but for fields that 
could hold _either_ identity or primitive objects (these are `Object`, 
interfaces, and some abstract classes), we dynamically select the `==` 
operator to use on that field.  (Edge cases: an id object is never `==` 
to a primitive object; null is always `==` to itself.)
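The recursive, dynamically dispatched comparison described above can be sketched in today's Java under some loud assumptions.  Records implementing a hypothetical marker interface `ValueLike` stand in for identity-free classes (real records still have identity), `Subst.substitutable` is a made-up name, and reflection stands in for what the JVM would do inside `ACMP`.  Boxed primitives sitting in an `Object` field are treated as identity objects here, as they are today.  Requires Java 16+ for records.

```java
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;

// Sketch of a field-wise substitutability test. Records implementing the
// hypothetical ValueLike marker stand in for identity-free (primitive) classes;
// everything else is treated as an identity object.
final class Subst {
    static boolean substitutable(Object a, Object b) {
        if (a == null || b == null) return a == b;     // null is == only to null
        boolean aVal = a instanceof ValueLike, bVal = b instanceof ValueLike;
        if (aVal != bVal) return false;                // identity object vs value: never ==
        if (!aVal) return a == b;                      // two identity objects: id==
        Class<?> c = a.getClass();
        if (c != b.getClass()) return false;           // different value classes: not ==
        if (!c.isRecord()) throw new IllegalArgumentException("sketch only handles records");
        for (RecordComponent rc : c.getRecordComponents()) {
            Object x = read(rc, a), y = read(rc, b);
            if (rc.getType().isPrimitive()) {
                // Primitive fields come back boxed. Integer::equals etc. agree with int==;
                // Float::equals and Double::equals compare bits, so NaN matches NaN
                // and 0.0 does not match -0.0.
                if (!x.equals(y)) return false;
            } else if (!substitutable(x, y)) {
                // Reference-typed field (e.g. Object): pick the test by dynamic type.
                return false;
            }
        }
        return true;
    }

    // Reads one record component through its accessor method.
    private static Object read(RecordComponent rc, Object target) {
        try {
            Method accessor = rc.getAccessor();
            accessor.setAccessible(true);
            return accessor.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object shared = new Object();
        System.out.println(substitutable(new Box(shared), new Box(shared)));                   // true
        System.out.println(substitutable(new Box(new Object()), new Box(new Object())));       // false
        System.out.println(substitutable(new Box(new Point(1, 2)), new Box(new Point(1, 2)))); // true
        System.out.println(substitutable(new Box(new Object()), new Box(new Point(1, 2))));    // false
        System.out.println(substitutable(new Box(null), new Box(null)));                       // true
    }
}

// Hypothetical marker for "this class has no identity".
interface ValueLike {}

record Point(int x, int y) implements ValueLike {}
record Box(Object contents) implements ValueLike {}
record Temp(double celsius) implements ValueLike {}
```

The `Box` cases line up with the edge cases above: identity contents fall back to `id==`, `Point` contents use `Point==`, an identity object never matches a value, and null matches null.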

Note that `.ref` is transparent here; in order to get a `Point` into the 
`Object` field, we (probably silently) converted it to `Point.ref`.  But 
`Point.ref` uses the same `==` computation as `Point`.  The same is true 
for the B2/B3 distinction; no difference.  Objects without identity are 
equal when their state is equal, whether they're a B2, B3, or B3.ref.

Possibly surprisingly, this has been pushed all the way into `ACMP`.  
This means that existing code like the default implementation of 
`Object::equals` just works; if you give it primitive objects, it knows 
what to do, and performs the proper substitutability test.  One rough 
edge is that we don't use `==` as the test for float and double fields, 
because it's not a proper substitutability test; we use the semantics of 
`Float::equals` and `Double::equals` instead.  Historical wart.
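Why `double==` fails as a substitutability test, in a few lines of today's Java.  `Double::equals` is specified in terms of `Double.doubleToLongBits`, which the hypothetical helper `FloatWart.substitutable` below uses directly.

```java
// A sketch: double== is not a substitutability test; Double::equals semantics are.
final class FloatWart {
    // Same comparison Double::equals performs: NaNs are canonicalized, -0.0 != 0.0.
    static boolean substitutable(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        double pz = 0.0, nz = -0.0;
        System.out.println(pz == nz);               // true: double== calls them equal...
        System.out.println(1 / pz + " " + 1 / nz);  // Infinity -Infinity: ...but they are distinguishable
        System.out.println(substitutable(pz, nz));  // false

        double nan = Double.NaN;
        System.out.println(nan == nan);             // false: double== is not even reflexive
        System.out.println(substitutable(nan, nan)); // true
    }
}
```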

The bottom line is that `==` is preserved as a substitutability test on 
instances of all primitive classes, whether they're "stored" by 
reference or value.  A corollary is that (finally) Integer instances 
provide reliable `==` semantics, rather than the old unreliable 
cache-based semantics.  (One rift healed.)