Equality for values -- new analysis, same conclusion
Brian Goetz
brian.goetz at oracle.com
Tue Aug 20 17:14:00 UTC 2019
> We also know that in the future...
So, let's pull on this string, because now we're talking about the right
thing -- what Java do we want to have in the future, even if we can only
take one step now. First, today.
But before that, a digression on terminology. While the terminology is
not nailed down (and please, start a separate thread if you want to
comment on that), the word “value” is problematic, but it’s hard to break
the habit. For purposes of this mail, a value is _any_ datum that can
be stored in a variable: primitives, object references, and soon,
instances of inline classes. Similarly, the term “object reference” is
problematic, because it is laden with overtones of identity. So, for
purposes of this mail:
- value: any datum
- inline class: what we used to call value classes
- identity class: what we used to call classes
- class: an identity or inline class
- object instance: instance of a class, whether identity or inline
- object reference: a reference to an instance of an identity class
A variable of type Object (or interface) may hold _either_ an object
reference, or an instance of an inline class, or null (this is
the confusing new thing). Note that all values are still passed by
value: primitives, object references, and instances of inline classes.
I’ll try to use these consistently, but I’ll likely fail.
Primitives* have a well-defined equivalence relation: do the two
operands describe the exact same value (SAME==). And it is
super-useful. And, it is really the only useful equivalence on
primitives. Conveniently, we have assigned this the operator `==`. No
one argues with this move.
Where things get dodgy is that objects (which historically have always
been described through object references) have TWO well-defined, and
useful, equivalence relations:
- Do the two operands refer to the same object instance (SAME==),
denoted by Object==;
- Are the two objects “equivalent” in the sense defined by their
author, denoted by .equals(). Let’s call this “equivalence”.
Both are useful, so we can’t get rid of either. Identity comparison has
semantic uses (e.g., topology-aware code like IdentityHashMap, or
comparing with sentinels in data structures). It is also used as an
optimization, a faster way to get to equality, and this optimization has
unfortunately outlived its usefulness but not outlived its use.
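To make the two relations concrete, here is a minimal sketch in today's Java. The class and method names (IdentityVsEquivalence, sameObject, equivalent, distinctKeys) are made up for illustration; only String, HashMap and IdentityHashMap are real library types.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityVsEquivalence {
    // Identity comparison: do both operands refer to the same object instance?
    static boolean sameObject(Object a, Object b) { return a == b; }

    // Author-defined equivalence, as implemented by the class's equals().
    static boolean equivalent(Object a, Object b) { return a.equals(b); }

    // Inserts both keys and reports how many distinct keys the map saw.
    static int distinctKeys(Map<String, Integer> map, String k1, String k2) {
        map.put(k1, 1);
        map.put(k2, 2);
        return map.size();
    }

    public static void main(String[] args) {
        // Two distinct String objects with equal contents.
        String a = new String("valhalla");
        String b = new String("valhalla");

        System.out.println(sameObject(a, b));   // false
        System.out.println(equivalent(a, b));   // true

        // HashMap keys by equals(); IdentityHashMap keys by ==.
        System.out.println(distinctKeys(new HashMap<>(), a, b));          // 1
        System.out.println(distinctKeys(new IdentityHashMap<>(), a, b));  // 2
    }
}
```

The IdentityHashMap case is the "semantic use" of identity: the map deliberately tells apart two objects that equals() would merge.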
Obviously equivalence is useful, and in most cases, the more generally
useful of the two, but for better or worse, identity comparison
got custody of the operator `==`. This might have been a questionable
move, but it's what we've got, and we're surely not un-assigning this.
Taking primitives and objects together, despite the very visible seam
between them, the == operator partially heals the seam by working across
all types, and assigning a consistent meaning across all types: SAME==
("are you the exact same thing", where same-ness can incorporate
identity.) Some may feel this was a mistake or an accident of history,
and it might have been, but the outcome has a sense to it: `==` has a
consistent meaning (SAME==) over all data types.
The part that is uncomfortable is that what's been totalized is the less
broadly useful equivalence. We can be aware of this, and try to do
better, but as I’ve observed before, wanting to fix mistakes of history
often leads us into new, worse mistakes, so let’s not fixate on this.
I’ll note at this point (and come back to it later) that just as we have
some control over what `==` means for inline instances, we _also_ have
some control over what `.equals()` means for primitives.
OK, now we are adding inline classes to the mix. Many of these, like
Complex or Point, are like primitives -- they only have one sensible
equality semantics -- do they represent the same number. This is
suitable for binding to ==, or .equals() — or better, both.
But there are also other values which are more complicated, because they
contain potentially-but-not-necessarily-identityful data, like:
inline class Holder { Object o; }
This is the conundrum of L-World. (The irritating part is that these
are the values we are spending all our time talking about, even though
they will not be the most common ones.)
Like with classic objects, for such classes, of the two equivalence
relations ("exactly the same", or "semantically the same"), the former
is generally the less useful. And so, were we rewriting history, we
might have bound the "good" syntax to .equals() here too, and relegated the
less useful test to some other uglier API point or operator. But
again, let’s not let this distract us.
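As a rough emulation of the Holder case: today's Java has no `inline class`, so the sketch below uses an ordinary final class, and the static `same` method is a hypothetical stand-in for what a SAME== comparison might do on inline instances (compare fields, with references compared by identity). None of this is the actual Valhalla design; it only illustrates how the two relations can disagree.

```java
import java.util.Objects;

// Emulation only: an ordinary final class standing in for `inline class Holder { Object o; }`.
final class Holder {
    final Object o;

    Holder(Object o) { this.o = o; }

    // Hypothetical SAME==-style test: field by field, references compared by identity.
    static boolean same(Holder a, Holder b) { return a.o == b.o; }

    // "Semantically the same": delegate to the held object's own equals().
    @Override public boolean equals(Object other) {
        return other instanceof Holder && Objects.equals(o, ((Holder) other).o);
    }

    @Override public int hashCode() { return Objects.hashCode(o); }
}

class HolderDemo {
    public static void main(String[] args) {
        String s = new String("x");
        Holder h1 = new Holder(s);
        Holder h2 = new Holder(s);                // holds the very same String object
        Holder h3 = new Holder(new String("x"));  // holds an equal but distinct String

        System.out.println(Holder.same(h1, h2));  // true
        System.out.println(Holder.same(h1, h3));  // false
        System.out.println(h1.equals(h3));        // true
    }
}
```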
In the future, we’ll have primitives, identity objects, and inline
objects, and we’d like not only to not have three things, but we’d like
to not have two things. So we’d like to have a total story for
comparing them all.
Our story for primitives (but please, let’s not get too distracted on
this now), is that primitives can be “boxed” to inline classes, which
will be lighter-weight boxes than our current boxes. And we can lift
members and interfaces from the box to the primitives, so that (say) int
can be seen to implement Comparable and Serializable, and have whatever
methods the lightweight box has — such as equals(). Which means that
equivalence interpretation can be totalized via Object::equals —
primitives, identity objects, and inline objects can all have an
equals() method. And of course, for primitives, equals() and == will be
the same* thing.
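A small sketch of why the asterisk is there, using today's java.lang.Double box as a stand-in for the lighter-weight boxes described above (the class and method names are illustrative only). With the current boxes, == and equals() agree on ordinary values but part ways on NaN and on signed zero; how the future boxes would treat these cases is not something this mail specifies.

```java
class PrimitiveEquals {
    // The primitive == operator on doubles.
    static boolean primitiveSame(double a, double b) { return a == b; }

    // equals() via today's box; b is autoboxed to Double for the call.
    static boolean boxedEquals(double a, double b) { return Double.valueOf(a).equals(b); }

    public static void main(String[] args) {
        // Ordinary values: both relations agree.
        System.out.println(primitiveSame(1.5, 1.5));                // true
        System.out.println(boxedEquals(1.5, 1.5));                  // true

        // NaN: == says no, Double.equals says yes.
        System.out.println(primitiveSame(Double.NaN, Double.NaN));  // false
        System.out.println(boxedEquals(Double.NaN, Double.NaN));    // true

        // Signed zero: == says yes, Double.equals says no.
        System.out.println(primitiveSame(0.0, -0.0));               // true
        System.out.println(boxedEquals(0.0, -0.0));                 // false
    }
}
```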
So, in the happy future, there will be a total operation that implements
the desirable equality comparison. (Which is important for
specializable generic code, since this operation on a T must be
available on all the types that can instantiate T.)
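For a feel of what "total" buys generic code, here is a sketch in today's Java, where primitives can only instantiate T through their current boxes. TotalEquals and indexOf are hypothetical names; the point is only that the method body needs a single comparison that works for every possible T.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class TotalEquals {
    // Relies only on equals(), which every type that can instantiate T provides.
    static <T> int indexOf(List<T> items, T target) {
        for (int i = 0; i < items.size(); i++) {
            if (Objects.equals(items.get(i), target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Boxed ints outside the small-value cache: identity would fail, equals() works.
        System.out.println(indexOf(Arrays.asList(1000, 2000, 3000), 2000));        // 1
        // Distinct String objects with equal contents.
        System.out.println(indexOf(Arrays.asList("a", "b"), new String("b")));    // 1
        // No match.
        System.out.println(indexOf(Arrays.asList("a", "b"), "c"));                // -1
    }
}
```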
Or, as you say:
> don't use ==, use equals.
I agree, but here’s the difference in the approaches: we don’t have to
punish == to make it less desirable; we can raise equals() up and make
it more desirable.
But we’re not done with val==. For the same reason that id== is still
useful, if overused, on references, it is useful on values that hold
potential references too. Yes, it is unfortunate that the weaker
claimant (SAME==) got the good syntax. But we still need a way to
denote this operation, and it would be even worse (IMO, far worse) than
the status quo to say “well, we write SAME== for identity objects one
way, but a different way for inline objects, even though you can put
both in an Object.” So even given the above, it _still_ seems like a
sensible (if not forced) move to extend the current meaning of == —
SAME== — to the new types. Then everything is total, and everything is
consistent:
- == means “are the two operands the same value” (indistinguishable);
- equals() means “are the two operands semantically equivalent”
and both are total, working on primitives, references, and inline
instances alike. (As mentioned earlier, we can also later —
but absolutely not now — explore whether equals() merits a better syntax.)
Your agenda here (which I agree with) is to lessen the importance of ==.
Where I disagree is that we should do so by making == harder to use.
Instead, I think we should do so by making the better alternatives
easier to use, and educating people about the changed object model and
performance reality.
(I’m still not sure whether exposing V <: Object, rather than V
convertible-to Object, sets the right user model here — but that’s a
separate discussion.)
*Curse you, NaN.
>> So, if you want to make this case, start over, and convince people
>> that Object== is the root problem here.
> Object== is not the root of the problem, Object== becomes a problem
> when we have decided lword, when at the end, every types is a subtype
> of Object, because this is what lworld is. == has been created with ad
> hoc polymorphism in mind (overload polymorphism is a better term BTW),
> let say your are in Java 1.0 time, you have a strong rift between
> objects and primitive types, and no super type in between them, the
> way be able to write polymorphic code is to use overloading, so you
> have println(Object)/println(int)/println(double) etc. But it's not
> enough, so in 1.1 you introduce the wrapper types, Integer, Double
> etc, because you can not write reflection code without being able to
> see a primitive value as an Object. Here, we are doing the opposite,
> since we have decided to use lworld, Object is the root of every
> things, indirect types obviously, inline types too. We also know that
> in the future, we don't want to stay in a 3 kinds of types world. So
> we have to retrofit primitive types to see them as inline types. By
> doing this, we are also saying that every types has now Object has its
> root type. In this brave new world, val== makes little sense, because
> it's introducing a new overload in a world where you have subtyping
> polymorphism so you don't need overload polymorphism anymore. For an
> indirect type, the way to test structural equality is to use equals(),
> if every types is a subtypes of Object, the logical move for me is to
> say, use equals() everywhere and to stop using ==. So having a useful
> val== or a useful Object== goes in the wrong direction, we should
> demote == and look to the future*.
>
> Rémi
>
> * and it's very intellectually satisfactory to have a solution which
> means that our users will have less thing to learn instead of more,
> i'm thrill that there will be a time where my students will be able
> to use .equals on a primitive types.