Equality for values -- new analysis, same conclusion

Tue Aug 20 17:14:00 UTC 2019

> We also know that in the future...

So, let's pull on this string, because now we're talking about the right 
thing -- what Java do we want to have in the future, even if we can only 
take one step now.  First, today.

But before that, a digression on terminology.  While the terminology is 
not nailed down (and please, start a separate thread if you want to 
comment on that), the word “value” is problematic, but it’s hard to break 
the habit.  For purposes of this mail, a value is _any_ datum that can 
be stored in a variable: primitives, object references, and soon, 
instances of inline classes.  Similarly, the term “object reference” is 
problematic, because it is laden with overtones of identity.  So, for 
purposes of this mail:

  - value: any datum
  - inline class: what we used to call value classes
  - identity class: what we used to call classes
  - class: an identity or inline class
  - object instance: instance of a class, whether identity or inline
  - object reference: a reference to an instance of an identity class

A variable of type Object (or interface) may hold _either_ an object 
reference, or an instance of an inline class, or null (this is 
the confusing new thing). Note that all values are still passed by 
value: primitives, object references, and instances of inline classes. 
I’ll try to use these consistently, but I’ll likely fail.

Primitives* have a well-defined equivalence relation: do the two 
operands describe the exact same value (SAME==).  And it is 
super-useful.  And, it is really the only useful equivalence on 
primitives.  Conveniently, we have assigned this the operator `==`.  No 
one argues with this move.

Where things get dodgy is that objects (which historically have always 
been described through object references) have TWO well-defined, and 
useful, equivalence relations:

  - Do the two operands refer to the same object instance (SAME==), 
denoted by Object==;
  - Are the two objects “equivalent” in the sense defined by their 
author, denoted by .equals().  Let’s call this “equivalence”.

Both are useful, so we can’t get rid of either.  Identity comparison has 
semantic uses (e.g., topology-aware code like IdentityHashMap, or 
comparing with sentinels in data structures).  It is also used as an 
optimization, a faster way to get to equality, and this optimization has 
unfortunately outlived its usefulness but not outlived its use.
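
To make the two relations concrete in today's Java, here is a small sketch. The class and method names (TwoEquivalences, sameInstance, equivalent, sizeAfterTwoPuts) are made up for illustration; only String, HashMap and IdentityHashMap are real library types.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class TwoEquivalences {
    // Identity comparison: do both operands refer to the same object instance?
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    // Author-defined equivalence, as implemented by equals().
    static boolean equivalent(Object a, Object b) {
        return a.equals(b);
    }

    // Inserts two distinct-but-equal keys and reports how many entries result.
    static int sizeAfterTwoPuts(Map<Object, String> map) {
        map.put(new String("key"), "first");
        map.put(new String("key"), "second");
        return map.size();
    }

    public static void main(String[] args) {
        String a = new String("hello");
        String b = new String("hello");
        System.out.println(sameInstance(a, b));  // false: two distinct instances
        System.out.println(equivalent(a, b));    // true: same character sequence
        System.out.println(sizeAfterTwoPuts(new HashMap<>()));          // 1
        System.out.println(sizeAfterTwoPuts(new IdentityHashMap<>()));  // 2
    }
}
```

The IdentityHashMap case is the "semantic use" of identity: the map deliberately keeps equal-but-distinct keys apart.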

Obviously equivalence is useful, and in most cases, the more generally 
useful of the two, but for better or worse, identity comparison 
got custody of the operator `==`. This might have been a questionable 
move, but it's what we've got, and we're surely not un-assigning this.

Taking primitives and objects together, despite the very visible seam 
between them, the == operator partially heals the seam by working across 
all types, and assigning a consistent meaning across all types: SAME== 
("are you the exact same thing", where same-ness can incorporate 
identity.)  Some may feel this was a mistake or an accident of history, 
and it might have been, but the outcome has a sense to it: `==` has a 
consistent meaning (SAME==) over all data types.

The part that is uncomfortable is that what's been totalized is the less 
broadly useful equivalence.  We can be aware of this, and try to do 
better, but as I’ve observed before, wanting to fix mistakes of history 
often leads us into new, worse mistakes, so let’s not fixate on this.

I’ll note at this point (and come back to it later) that just as we have 
some control over what `==` means for inline instances, we _also_ have 
some control over what `.equals()` means for primitives.

OK, now we are adding inline classes to the mix.  Many of these, like 
Complex or Point, are like primitives -- they only have one sensible 
equality semantics -- do they represent the same number.  This is 
suitable for binding to ==, or .equals() — or better, both.

But there are also other values which are more complicated, because they 
contain potentially-but-not-necessarily-identityful data, like:

     inline class Holder { Object o; }

This is the conundrum of L-World.  (The irritating part is that these 
are the values we are spending all our time talking about, even though 
they will not be the most common ones.)

Like with classic objects, for such classes, of the two equivalence 
relations ("exactly the same", or "semantically the same"), the former 
is generally the less useful.  And so, were we rewriting history, we 
might have bound the "good" syntax to .equals() here too, and relegated 
the less useful test to some other uglier API point or operator.  But 
again, let’s not let this distract us.

In the future, we’ll have primitives, identity objects, and inline 
objects, and we’d like not to have three different ways of comparing 
them, nor even two.  So we’d like to have a total story for 
comparing them all.

Our story for primitives (but please, let’s not get too distracted on 
this now), is that primitives can be “boxed” to inline classes, which 
will be lighter-weight boxes than our current boxes.  And we can lift 
members and interfaces from the box to the primitives, so that (say) int 
can be seen to implement Comparable and Serializable, and have whatever 
methods the lightweight box has — such as equals().  Which means that 
equivalence interpretation can be totalized via Object::equals — 
primitives, identity objects, and inline objects can all have an 
equals() method.  And of course, for primitives, equals() and == will be 
the same* thing.

So, in the happy future, there will be a total operation that implements 
the desirable equality comparison.  (Which is important for 
specializable generic code, since this operation on a T must be 
available on all the types that can instantiate T.)
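
As a rough illustration of why generic code wants a total operation, here is a sketch in today's Java, where primitives still reach generic code through the existing boxes. The class name TotalEquals and the helper indexOf are hypothetical, not a proposed API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class TotalEquals {
    // Generic search that relies only on equals(), which every T provides.
    // Objects.equals is used so that null elements do not throw.
    static <T> int indexOf(List<T> items, T target) {
        for (int i = 0; i < items.size(); i++) {
            if (Objects.equals(items.get(i), target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The int values are boxed to Integer here; equals() compares the numbers.
        System.out.println(indexOf(Arrays.asList(1000, 2000, 3000), 3000)); // 2
        System.out.println(indexOf(Arrays.asList("a", "b"), "b"));          // 1
        System.out.println(indexOf(Arrays.asList("a", "b"), "z"));          // -1
    }
}
```

The body of indexOf never needs to know whether T is a (boxed) primitive, an identity class, or, in the future, an inline class.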

Or, as you say:

> don't use ==, use equals.

I agree, but here’s the difference in the approaches: we don’t have to 
punish == to make it less desirable; we can raise equals() up and make 
it more desirable.

But we’re not done with val==.  For the same reason that id== is still 
useful, if overused, on references, it is useful on values that hold 
potential references too.  Yes, it is unfortunate that the weaker 
claimant (SAME==) got the good syntax.  But we still need a way to 
denote this operation, and it would be even worse (IMO, far worse) than 
the status quo to say “well, we write SAME== for identity objects one 
way, but a different way for inline objects, even though you can put 
both in an Object."  So even given the above, it _still_ seems like a 
sensible (if not forced) move to extend the current meaning of == — 
SAME== — to the new types.  Then everything is total, and everything is 
consistent:

   - == means “are the two operands the same value" (indistinguishable);
   - equals() means “are the two operands semantically equivalent”

and both are total, working on primitives, references, and inline 
instances alike.  (As mentioned earlier, we can also later — 
but absolutely not now — explore whether equals() merits a better syntax.)
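
Inline classes cannot be written in released Java, so the following is only a model of the two relations on the Holder example, using an ordinary final class. The static helper same() stands in for what == might mean on inline instances under one plausible reading (fields compared with ==); that reading, and all names here, are assumptions for illustration.

```java
import java.util.Objects;

public class HolderSketch {
    // Stand-in for `inline class Holder { Object o; }`; a plain final class here.
    static final class Holder {
        final Object o;

        Holder(Object o) {
            this.o = o;
        }

        // Models SAME== on an inline instance: the field is compared with ==.
        static boolean same(Holder a, Holder b) {
            return a.o == b.o;
        }

        // Models semantic equivalence: the field is compared with equals().
        @Override
        public boolean equals(Object other) {
            return other instanceof Holder && Objects.equals(o, ((Holder) other).o);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(o);
        }
    }

    public static void main(String[] args) {
        String s1 = new String("x");
        String s2 = new String("x");

        // Both holders wrap the very same reference: indistinguishable.
        System.out.println(Holder.same(new Holder(s1), new Holder(s1)));  // true
        // The holders wrap equal but distinct strings: distinguishable, yet equivalent.
        System.out.println(Holder.same(new Holder(s1), new Holder(s2)));  // false
        System.out.println(new Holder(s1).equals(new Holder(s2)));        // true
    }
}
```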

Your agenda here (which I agree with) is to lessen the importance of ==. 
Where I disagree is that we should do so by making == harder to use. 
Instead, I think we should do so by making the better alternatives 
easier to use, and educating people about the changed object model and 
performance reality.

(I’m still not sure whether exposing V <: Object, rather than V 
convertible-to Object, sets the right user model here — but that’s a 
separate discussion.)

*Curse you, NaN.
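
For anyone who has not tripped over it, the footnote refers to behavior like the following in today's Java, where == on double and equals() on the existing Double box disagree for NaN and for signed zeros. The class and helper names are made up for the example.

```java
public class NaNFootnote {
    // Primitive comparison with ==, following IEEE 754 rules.
    static boolean primitiveEq(double a, double b) {
        return a == b;
    }

    // Comparison through the existing box; Double.equals compares bit patterns.
    static boolean boxedEquals(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(primitiveEq(Double.NaN, Double.NaN));  // false
        System.out.println(boxedEquals(Double.NaN, Double.NaN));  // true
        System.out.println(primitiveEq(0.0, -0.0));               // true
        System.out.println(boxedEquals(0.0, -0.0));               // false
    }
}
```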

>> So, if you want to make this case, start over, and convince people 
>> that Object== is the root problem here.
> Object== is not the root of the problem, Object== becomes a problem 
> when we have decided lword, when at the end, every types is a subtype 
> of Object, because this is what lworld is. == has been created with ad 
> hoc polymorphism in mind (overload polymorphism is a better term BTW), 
> let say your are in Java 1.0 time, you have a strong rift between 
> objects and primitive types, and no super type in between them, the 
> way be able to write polymorphic code is to use overloading, so you 
> have println(Object)/println(int)/println(double) etc. But it's not 
> enough, so in 1.1 you introduce the wrapper types, Integer, Double 
> etc, because you can not write reflection code without being able to 
> see a primitive value as an Object. Here, we are doing the opposite, 
> since we have decided to use lworld, Object is the root of every 
> things, indirect types obviously, inline types too. We also know that 
> in the future, we don't want to stay in a 3 kinds of types world. So 
> we have to retrofit primitive types to see them as inline types. By 
> doing this, we are also saying that every types has now Object has its 
> root type. In this brave new world, val== makes little sense, because 
> it's introducing a new overload in a world where you have subtyping 
> polymorphism so you don't need overload polymorphism anymore. For an 
> indirect type, the way to test structural equality is to use equals(), 
> if every types is a subtypes of Object, the logical move for me is to 
> say, use equals() everywhere and to stop using ==. So having a useful 
> val== or a useful Object== goes in the wrong direction, we should 
> demote == and look to the future*.
>
> Rémi
>
> * and it's very intellectually satisfactory to have a solution which 
> means that our users will have less thing to learn instead of more, 
> i'm thrill that there will be a time where my students will be able 
> to use .equals on a primitive types.