Finding the spirit of L-World
Brian Goetz
brian.goetz at oracle.com
Fri Feb 22 21:46:34 UTC 2019
Thanks for bringing up .equals, and the possibility to ban == on values,
both of which have been touched on in the past but we've not really
focused much on these. It also helped me clarify why I think there's
really only one answer here.
As to equals(), having == be a substitutability test does not make
"equals()" obsolete -- far from it. The existing analogue of == vs
.equals() with reference types holds (pretty much exactly) for values
when == is a substitutability check.
For example, consider a class like
value class StringWrapper { String s; }
The substitutability test here would be to compute (w, u) -> (w.s ==
u.s); This _is_ the "are they exactly the same value" subst test, and
something we should be able to express. But it is not the
implementation we'd often want for equals -- we'd likely want to
delegate to String::equals:
value class StringWrapper {
    String s;
    public boolean equals(Object o) {
        return o instanceof StringWrapper sw && s.equals(sw.s);
    }
}
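To see the two questions side by side in today's Java, here is a minimal sketch. It assumes an ordinary final class as a stand-in for the value class (value classes are not part of released Java), and `substitutable` is a hypothetical helper that computes what == would compute under the substitutability interpretation.

```java
public final class StringWrapperDemo {
    // Ordinary class standing in for the value class above (assumption).
    static final class StringWrapper {
        final String s;

        StringWrapper(String s) { this.s = s; }

        // Logical equality: delegate to String::equals.
        @Override
        public boolean equals(Object o) {
            return o instanceof StringWrapper
                    && s.equals(((StringWrapper) o).s);
        }

        @Override
        public int hashCode() { return s.hashCode(); }
    }

    // Hypothetical stand-in for == on values: compare each field with ==.
    static boolean substitutable(StringWrapper w, StringWrapper u) {
        return w.s == u.s;
    }

    public static void main(String[] args) {
        String a = new String("hello");   // two distinct String objects
        String b = new String("hello");   // with equal contents
        StringWrapper w = new StringWrapper(a);
        StringWrapper u = new StringWrapper(b);
        StringWrapper v = new StringWrapper(a);

        System.out.println(substitutable(w, u)); // false: different String objects
        System.out.println(w.equals(u));         // true: same characters
        System.out.println(substitutable(w, v)); // true: same String object
    }
}
```

The wrappers w and u are logically equal but not substitutable, which is the same split that == and .equals() already have for refs.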
So == (continues to) means "are you exactly the same thing"; .equals()
(continues to) means "are you logically the same thing", as with refs
today. And as with refs, the former is a sensible starting point for
the latter (sometimes it's good enough), but we may want to refine it to
allow physically different things to be treated as logically the same.
(The difference is really just a quantitative one; there are just more
values than refs (e.g., Complex) for which `==` is already the right
answer for `equals()`.) So while the substitutability test is
quantitatively _closer_ to what equals() is likely to be, it's still not
always going to be the same (and it's usually going to be simpler). And
just as we support both now, for reasons, we probably will still want to
do so.
Coming back to our choices, there are four possible interpretations of
==v so far:
- The LW1 interpretation is "== means identity, and values have no
identity, so == always says false"
- The substitutability interpretation is whether the two operands have
no observable differences
- The third is: upcall to .equals()
- The fourth is: Don't allow it at all.
IMO the first is considerably worse than useless; the user is allowed to
ask a harmless and familiar question, and is guaranteed to get a
surprising answer. If that's the case, don't let them ask at all. (And
you agree, but, see below.)
The second interpretation is a sound generalization of `==` on refs and
primitives; it means "are you exactly the same thing", which can be
given a precise linguistic meaning, and can be further refined with
logical equality tests where desired. The main objection is that it is
more expensive for the VM to implement, and the cost model has a broader
variance. (These are not nothing, I just don't think they trump
intuitive semantics or compatibility.)
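As a rough illustration of how that precise meaning could be written down, here is a sketch in current Java (16 or later). It assumes records as stand-ins for values and treats everything else as a ref; the class name, the `substitutable` method, and the `Complex` and `Named` records are all hypothetical, and this says nothing about how a VM would actually implement ACMP.

```java
import java.lang.reflect.RecordComponent;

public final class Substitutability {
    // Records play the role of values in this sketch (assumption).
    record Complex(double re, double im) {}
    record Named(String name, Complex z) {}

    static boolean substitutable(Object a, Object b) {
        if (a == b) return true;                  // same ref, or both null
        if (a == null || b == null) return false;
        Class<?> c = a.getClass();
        // Different types, or two distinct non-value objects.
        if (c != b.getClass() || !c.isRecord()) return false;
        try {
            for (RecordComponent rc : c.getRecordComponents()) {
                Object x = rc.getAccessor().invoke(a);
                Object y = rc.getAccessor().invoke(b);
                boolean same = rc.getType().isPrimitive()
                        ? x.equals(y)             // boxed compare; Double::equals treats NaN as equal to NaN
                        : substitutable(x, y);    // refs and nested "values"
                if (!same) return false;
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Complex p = new Complex(1.0, Double.NaN);
        Complex q = new Complex(1.0, Double.NaN);
        String n = "z";
        System.out.println(substitutable(p, q));                              // true
        System.out.println(substitutable(new Named(n, p), new Named(n, q)));  // true
        System.out.println(substitutable(new Named(n, p),
                                         new Named(new String("z"), q)));     // false
        System.out.println(substitutable(new Complex(0.0, 0.0),
                                         new Complex(-0.0, 0.0)));            // false
    }
}
```

The cost concern is visible even here: the comparison walks the fields recursively, so its cost depends on the shape of the type rather than being a single pointer compare.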
If the second interpretation gives VM engineers fits, the third one is
even worse, as it means upcalling to arbitrary Java code from the ACMP
bytecode. It also gives me fits, because it tries to go back 25 years
and rewrite what == means for objects. (And it probably gives you fits,
because you've commented frequently on how often equals()
implementations are wrong.)
The fourth answer, ban it, is surely better than the first, but let's
pull on that string.
Let's say `==` is meaningless on values, so we ban it. But, just as
with the first, we have a problem for code that trucks in Object
(including erased generics). If this code uses == to compare
user-provided values to user-provided sentinels, this code will /just
stop working/. And even if they are willing to rewrite it, there's now
no convenient and reliable way to write code that tests "are you the
same thing I saw before".
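The kind of code at risk looks something like the following sketch. The class and method names are hypothetical; the point is only that the method is compiled against Object and relies on == to recognize a caller-supplied marker.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public final class SentinelDemo {
    // Counts items until the caller-supplied end marker is seen.
    // Written against Object, with no knowledge of values.
    static int countUntil(Queue<Object> queue, Object endMarker) {
        int n = 0;
        Object item;
        while ((item = queue.poll()) != null) {
            if (item == endMarker) break;   // "are you the same thing I saw before?"
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Object end = new Object();
        Queue<Object> q = new ArrayDeque<>();
        q.add("a");
        q.add("b");
        q.add(end);
        q.add("c");
        System.out.println(countUntil(q, end)); // 2
    }
}
```

If a caller passed a value as the end marker and == on values always said false, the loop would run past the marker and return the wrong count without any error.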
One of the subtle (but ultimately, good, I think) things about L-world
is that /values are Objects/. That means, if you take an Object
parameter (or a T, for erased generics), someone can pass you a value,
and your code should still work. (If it does still work, we've achieved
(yet again) that elusive form of /forward compatibility/ -- code that
was written before the language had feature X, can deal perfectly well
with X.)
OK, so if we ban == on values, should we ban it on generic code too?
That's the sound choice, since we're quantifying over types for which ==
may not be defined. But that's neither source- nor binary- compatible
with existing code. Which means, at least to keep this code working, we
still have to assign a meaning to == on T when one or both of the
operands are a value. (Of which there are three choices so far,
detailed above.) So for existing sources and binaries, we should give
== on T a meaning, otherwise this code breaks. Now, what about Object?
There is plenty of code that accepts Object and uses == on it. So
we have to continue to assign a meaning to == on Object too. Again, we
have three choices. And if we can assign a sound meaning to == on T and
== on Object, which works when you pass a value in, why not use that
for == on values too?
So my claim is: banning it is effectively impossible; we at least have
to pick one of the other interpretations to fall back on for existing
sources and binaries, and if we're going to do that, we should just do
that.
Upleveling.... your concern about "mental database" is a valid one,
and one I've been worried about too. This is why I've been on a
search-and-destroy mission to eliminate gratuitous asymmetries between
values and references as we bring them closer together in the type
system. (I don't want people who write code that trucks in Object, or
erased T, to have to be writing two versions of their code, one for refs
and one for values, or even to be thinking much about the differences.)
On that score (downleveling again):
- The "false" interpretation means that you can ask == of values, but
the question is meaningless. That means, if you are ever to be exposed
to values, you have the following bad choices: give up on discriminating
between values, or do something different for values and refs, or just
use equals() all the time. If these are your only options, that's
pretty terrible.
- The "ban it" interpretation is similar; you don't get to ask the
question, so you're stuck with doing one thing for refs and one for
values, or always using .equals(). It also seems impractical; we will
end up reinventing one of the other solutions for compatibility reasons
only.
- The "call up to Java" interpretation means that the treatment of ==
on refs and values is about as different as you can possibly get!
Again, this means people will end up either using .equals() all the time,
or writing different code for refs and values.
- The substitutability test is the only interpretation that is
consistent with existing understanding and coding idioms, and which will
"just work" when values start getting injected into code that was
compiled years ago that takes Object / erased T and has no conception of
values. It is the only version that doesn't require that people rewrite
their code when values start showing up in their HashMap, or constantly
ask themselves "is this instance a value or a ref."
The distinction between == and .equals() in Java may have its problems,
but it's how Object works, and people have learned idioms that work for
it. Preserving that intuition, and that code, seems to me to be the
highest priority. Option 4 feels to me like a wishful attempt to try
and go back and fix history, which is a worthy goal but we've all
watched enough science fiction to know how that ends.
On 2/22/2019 3:38 PM, Brian Goetz wrote:
>
>> On Feb 22, 2019, at 2:42 PM, Kevin Bourrillion <kevinb at google.com> wrote:
>>
>> Fair point that `==` has always been the test of
>> /absolute/ substitutability. But I think this is overlooking
>> something big: People implement equals() in order to ask for
>> "substitutability for virtually all intents and purposes". Of course,
>> most code should never be going anywhere near identity hash maps or
>> synchronizing on value-like things, etc. And that means that equals()
>> has become the substitutability test that people WANT.
>>
>> This in turn means that every usage of `==` on a non-primitive type
>> (named class) is always suspicious. As a reader and maintainer of
>> code, I need to think about this carefully. Is it a Class<?> -- if so
>> == is harmless but also .equals() is harmless and it's not worth
>> switching idioms. Is it an enum type? I have to go look it up to find
>> out, in which case it is once again both harmless and pointless
>> (especially if I can replace with switch!). Barring those, then it's
>> either a risky micro-optimization or some other bizarre coding choice
>> that I need to be very careful around.
>>
>> I think we should make users write `equals` to test value types. If
>> they write `==`, they are indicating a special situation where they
>> need identity semantics, which don't make sense for value types, and
>> that should be an error.
>>
>> One of the concerns I've always had about value types is that
>> developers would be forced to maintain a mental database of which
>> types are value types and which are reference types, and that they
>> could not hope to assess the correctness of code they read or write
>> without having that. In a world where users commonly need to do
>> "absolutely substitutable" checks, then this proposal would be the
>> way to achieve that. But, I don't think that's the world we're in.
>>
>> Thoughts?
>>
>>
>>
>> On Thu, Feb 21, 2019 at 9:59 AM Brian Goetz <brian.goetz at oracle.com
>> <mailto:brian.goetz at oracle.com>> wrote:
>>
>> More on substitutability and why it is desirable...
>>
>> > #### Equality
>> >
>> > Now we need to define equality. The terminology is messy, as so
>> > many of the terms we might want to use (object, value, instance)
>> > already have associations. For now, we'll describe a
>> > _substitutability_ predicate on two instances:
>> >
>> > - Two refs are substitutable if they refer to the same object
>> >   identity.
>> > - Two primitives are substitutable if they are `==` (modulo
>> >   special pleading for `NaN` -- see `Float::equals` and
>> >   `Double::equals`).
>> > - Two values `a` and `b` are substitutable if they are of the
>> >   same type, and for each of the fields `f` of that type, `a.f`
>> >   and `b.f` are substitutable.
>> >
>> > We then say that for any two objects, `a == b` iff a and b are
>> > substitutable.
>>
>> Currently, our type system has refs and primitives, and the ==
>> predicate applies on all of them. And for all the types we have
>> today (with the almost-too-small-to-mention anomaly of NaN), ==
>> *already is* a substitutability predicate (where substitutability
>> means, informally: "no observable difference between the two
>> arguments"). Two refs are substitutable if they refer to the same
>> object identity; two primitives are substitutable if they refer to
>> the same value (modulo NaN).
>>
>> VM engineers like to refer to `==` on refs as "identity equality",
>> but that's really an implementation detail. What it really means
>> is: are the two things the same. And that's what `==` means for
>> primitives too, and that's how the other 99.99% of users think of
>> it too.
>>
>> The natural interpretation of `==` in a world with values is to
>> extend this "are these two things the same" to values too. The
>> substitutability relation above applies the same "are you the same"
>> logic equally to refs, values, and primitives. No sharp edges
>> (except the NaNsense that we are already stuck with).
>>
>>
>>
>>
>> --
>> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com
>> <mailto:kevinb at google.com>
>