Finding the spirit of L-World

Fri Feb 22 21:46:34 UTC 2019

Thanks for bringing up .equals, and the possibility to ban == on values,
both of which have been touched on in the past, though we've not really
focused much on them.  It also helped me clarify why I think there's
really only one answer here.

As to equals(), having == be a substitutability test does not make
equals() obsolete; far from it.  The existing analogue of == vs
.equals() with reference types holds (pretty much exactly) for values
when == is a substitutability check.

For example, consider a class like

     value class StringWrapper { String s; }

The substitutability test here would be to compute (w, u) -> (w.s ==
u.s).  This _is_ the "are they exactly the same value" substitutability
test, and something we should be able to express.  But it is not the
implementation we'd often want for equals -- we'd likely want to
delegate to String::equals:

     value class StringWrapper {
         String s;

         public boolean equals(Object o) {
             return o instanceof StringWrapper sw && s.equals(sw.s);
         }
     }
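To see the two predicates disagree, here is a minimal sketch in today's Java. Since `value class` isn't available, it assumes an ordinary final class standing in for StringWrapper; the `substitutable` helper name is hypothetical, and `equals` uses `Objects.equals` so that a null field doesn't throw.

```java
import java.util.Objects;

public class StringWrapperDemo {
    // Hypothetical identity-class stand-in for the proposed `value class StringWrapper`.
    static final class StringWrapper {
        final String s;

        StringWrapper(String s) { this.s = s; }

        // Substitutability: are the fields the very same String reference?
        static boolean substitutable(StringWrapper w, StringWrapper u) {
            return w.s == u.s;
        }

        // Logical equality: delegate to String::equals.
        @Override
        public boolean equals(Object o) {
            return o instanceof StringWrapper sw && Objects.equals(s, sw.s);
        }

        // Keep hashCode consistent with the logical equals above.
        @Override
        public int hashCode() {
            return Objects.hashCode(s);
        }
    }

    public static void main(String[] args) {
        StringWrapper a = new StringWrapper("hello");
        StringWrapper b = new StringWrapper(new String("hello")); // distinct String object
        System.out.println(StringWrapper.substitutable(a, b));   // false
        System.out.println(a.equals(b));                          // true
    }
}
```

The two wrappers are logically equal but physically distinguishable, which is exactly the gap that a hand-written equals() exists to paper over.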

So == continues to mean "are you exactly the same thing"; .equals()
continues to mean "are you logically the same thing", as with refs
today.  And as with refs, the former is a sensible starting point for
the latter (sometimes it's good enough), but we may want to refine it to
allow physically different things to be treated as logically the same.
(The difference is really just a quantitative one; there are simply more
values than refs (e.g., Complex) for which `==` is already the right
answer for `equals()`.)  So while the substitutability test is
quantitatively _closer_ to what equals() is likely to be, it's still not
always going to be the same (and it's usually going to be simpler.)  And
just as we support both now, for reasons, we will probably still want to
do so.

Coming back to our choices, there are four possible interpretations of
== on values (==v) so far:

  - The LW1 interpretation is "== means identity, and values have no 
identity, so == always says false"
  - The substitutability interpretation is whether the two operands have
no observable differences
  - The third is: upcall to .equals()
  - The fourth is: Don't allow it at all.
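A toy model of the four candidates, applied to two field-wise identical instances, may make the comparison concrete. `Point` is a record standing in for a value class, and `BANNED` throws at run time where a real ban would be a compile-time error; all names here are illustrative only.

```java
import java.util.function.BiPredicate;

public class FourInterpretations {
    // Hypothetical stand-in for a small value class.
    record Point(int x, int y) {}

    // Each constant models one candidate meaning of `a == b` when a and b are values.
    enum Interpretation implements BiPredicate<Point, Point> {
        ALWAYS_FALSE {   // LW1: values have no identity, so == is always false
            public boolean test(Point a, Point b) { return false; }
        },
        SUBSTITUTABLE {  // field-wise comparison: no observable difference
            public boolean test(Point a, Point b) { return a.x() == b.x() && a.y() == b.y(); }
        },
        UPCALL_EQUALS {  // delegate to (possibly user-written) equals()
            public boolean test(Point a, Point b) { return a.equals(b); }
        },
        BANNED {         // really a compile-time error; modeled here as an exception
            public boolean test(Point a, Point b) {
                throw new UnsupportedOperationException("== not allowed on values");
            }
        };
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        for (Interpretation i : Interpretation.values()) {
            try {
                System.out.println(i + ": " + i.test(p, q));
            } catch (UnsupportedOperationException e) {
                System.out.println(i + ": " + e.getMessage());
            }
        }
    }
}
```

For this particular `Point`, SUBSTITUTABLE and UPCALL_EQUALS happen to agree; they diverge as soon as equals() is refined, as in the StringWrapper example.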

IMO the first is considerably worse than useless; the user is allowed to 
ask a harmless and familiar question, and is guaranteed to get a 
surprising answer.  If that's the case, don't let them ask at all.  (And 
you agree, but, see below.)

The second interpretation is a sound generalization of `==` on refs and 
primitives; it means "are you exactly the same thing", which can be 
given a precise linguistic meaning, and can be further refined with 
logical equality tests where desired.  The main objection is that it is 
more expensive for the VM to implement, and the cost model has a broader 
variance.  (These are not nothing, I just don't think they trump 
intuitive semantics or compatibility.)

If the second interpretation gives VM engineers fits, the third one is 
even worse, as it means upcalling to arbitrary Java code from the ACMP 
bytecode.  It also gives me fits, because it tries to go back 25 years 
and rewrite what == means for objects.  (And it probably gives you fits, 
because you've commented frequently on how often equals() 
implementations are wrong.)

The fourth answer, ban it, is surely better than the first, but let's 
pull on that string.

Let's say `==` is meaningless on values, so we ban it.  But, just as
with the first, we have a problem for code that trucks in Object
(including erased generics).  If this code uses == to compare
user-provided values to user-provided sentinels, this code will /just
stop working/.  And even if they are willing to rewrite it, there's now
no convenient and reliable way to write code that tests "are you the
same thing I saw before".

One of the subtle (but ultimately, good, I think) things about L-world 
is that /values are Objects/.  That means, if you take an Object 
parameter (or a T, for erased generics), someone can pass you a value, 
and your code should still work.  (If it does still work, we've achieved 
(yet again) that elusive form of /forward compatibility/ -- code that 
was written before the language had feature X, can deal perfectly well 
with X.)

OK, so if we ban == on values, should we ban it on generic code too?  
That's the sound choice, since we're quantifying over types for which == 
may not be defined.  But that's neither source- nor binary- compatible 
with existing code.  Which means, at least to keep this code working, we 
still have to assign a meaning to == on T when one or both of the 
operands are a value.  (Of which there are three choices so far, 
detailed above.)  So for existing sources and binaries, we must give
==T a meaning, otherwise this code breaks.  Now, what about Object?
There is plenty of code that accepts Object and uses == on it.  So
we have to continue to assign a meaning to ==Object too.  Again, we have
three choices.  And if we can assign a sound meaning for ==T and
==Object, which works when you pass a value in, why not use that for
==v too?
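Why ==T and ==Object can't be given different meanings: after erasure, an unbounded type variable is just Object. In this sketch (method names are illustrative), the reflective lookup finds `sameT` by its erased `(Object, Object)` signature, and both method bodies compile to the same `if_acmp` reference comparison, so whatever meaning ==Object gets, ==T gets too.

```java
import java.lang.reflect.Method;

public class ErasedEquality {
    // == on a type variable: after erasure, T is Object.
    static <T> boolean sameT(T a, T b) {
        return a == b;
    }

    // == on Object, as in pre-generics code.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        // The compiled sameT is looked up by its erased (Object, Object) signature.
        Method m = ErasedEquality.class.getDeclaredMethod("sameT", Object.class, Object.class);
        System.out.println(m.getParameterTypes()[0]); // class java.lang.Object
        System.out.println(sameT("x", "x"));          // true: interned literals are the same object
    }
}
```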

So my claim is: banning it is effectively impossible; we at least have 
to pick one of the other interpretations to fall back on for existing
sources and binaries, and if we're going to do that, we should just do 
that.

Upleveling....   your concern about "mental database" is a valid one, 
and one I've been worried about too.  This is why I've been on a 
search-and-destroy mission to eliminate gratuitous asymmetries between 
values and references as we bring them closer together in the type 
system.  (I don't want people who write code that trucks in Object, or 
erased T, to have to be writing two versions of their code, one for refs 
and one for values, or even to be thinking much about the differences.)  
On that score (downleveling again):

  - The "false" interpretation means that you can ask == of values, but 
the question is meaningless.  That means, if you are ever to be exposed 
to values, you have the following bad choices: give up on discriminating 
between values, or do something different for values and refs, or just 
use equals() all the time.  If these are your only options, that's 
pretty terrible.

  - The "ban it" interpretation is similar; you don't get to ask the 
question, so you're stuck with doing one thing for refs and one for 
values, or always using .equals().  It also seems impractical; we will 
end up reinventing one of the other solutions for compatibility reasons 
only.

  - The "call up to Java" interpretation means that the treatment of == 
on refs and values are about as different as you can possibly get!  
Again, this means people will end up either using .equals() all the time,
or writing different code for refs and values.

  - The substitutability test is the only interpretation that is
consistent with existing understanding and coding idioms, and which will
"just work" when values start getting injected into code that was
compiled years ago that takes Object / erased T and has no conception of
values.  It is the only version that doesn't require people to rewrite
their code when values start showing up in their HashMap, or to
constantly ask themselves "is this instance a value or a ref."
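One idiom that substitutability keeps intact is the "same thing, or at least equal" check, which is essentially what `java.util.Objects.equals` does. The helper names below are hypothetical. Under substitutability, the == fast path remains a sound shortcut when values flow through (substitutable operands must also be equal under any reflexive equals()); under the "false" reading it silently degrades to always calling equals(); under a ban, the Object-typed code would still need some meaning for ==.

```java
public class FastPathEquality {
    // "Exactly the same thing, or at least logically equal?"
    static boolean sameOrEqual(Object a, Object b) {
        return a == b || (a != null && a.equals(b));
    }

    // Null-tolerant linear search, in the style of List.indexOf.
    static int indexOf(Object[] items, Object target) {
        for (int i = 0; i < items.length; i++) {
            if (sameOrEqual(items[i], target)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        Object[] items = {"x", null, new String("key")};
        System.out.println(indexOf(items, "key")); // 2 (found via equals)
        System.out.println(indexOf(items, null));  // 1 (found via ==)
    }
}
```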

The distinction between == and .equals() in Java may have its problems,
but it's how Object works, and people have learned idioms that work for
it.  Preserving that intuition, and that code, seems to me to be the 
highest priority.  Option 4 feels to me like a wishful attempt to try 
and go back and fix history, which is a worthy goal but we've all 
watched enough science fiction to know how that ends.

On 2/22/2019 3:38 PM, Brian Goetz wrote:
>
>> On Feb 22, 2019, at 2:42 PM, Kevin Bourrillion <kevinb at google.com> wrote:
>>
>> Fair point that `==` has always been the test of 
>> /absolute/ substitutability. But I think this is overlooking 
>> something big: People implement equals() in order to ask for 
>> "substitutability for virtually all intents and purposes". Of course, 
>> most code should never be going anywhere near identity hash maps or 
>> synchronizing on value-like things, etc. And that means that equals() 
>> has become the substitutability test that people WANT.
>>
>> This in turn means that every usage of `==` on a non-primitive type 
>> (named class) is always suspicious. As a reader and maintainer of 
>> code, I need to think about this carefully. Is it a Class<?> -- if so 
>> == is harmless but also .equals() is harmless and it's not worth 
>> switching idioms. Is it an enum type? I have to go look it up to find 
>> out, in which case it is once again both harmless and pointless
>> (especially if I can replace with switch!). Barring those, then it's 
>> either a risky micro-optimization or some other bizarre coding choice 
>> that I need to be very careful around.
>>
>> I think we should make users write `equals` to test value types. If 
>> they write `==`, they are indicating a special situation where they 
>> need identity semantics, which don't make sense for value types, and 
>> that should be an error.
>>
>> One of the concerns I've always had about value types is that 
>> developers would be forced to maintain a mental database of which 
>> types are value types and which are reference types, and that they 
>> could not hope to assess the correctness of code they read or write 
>> without having that. In a world where users commonly need to do 
>> "absolutely substitutable" checks, then this proposal would be the 
>> way to achieve that. But, I don't think that's the world we're in.
>>
>> Thoughts?
>>
>>
>>
>> On Thu, Feb 21, 2019 at 9:59 AM Brian Goetz <brian.goetz at oracle.com> wrote:
>>
>>     More on substitutability and why it is desirable...
>>
>>     > #### Equality
>>     >
>>     > Now we need to define equality.  The terminology is messy, as so
>>     > many of the terms we might want to use (object, value, instance)
>>     > already have associations. For now, we'll describe a
>>     > _substitutability_ predicate on two instances:
>>     >
>>     >    - Two refs are substitutable if they refer to the same object
>>     >      identity.
>>     >    - Two primitives are substitutable if they are `==` (modulo
>>     >      special pleading for `NaN` -- see `Float::equals` and
>>     >      `Double::equals`).
>>     >    - Two values `a` and `b` are substitutable if they are of the
>>     >      same type, and for each of the fields `f` of that type, `a.f`
>>     >      and `b.f` are substitutable.
>>     >
>>     > We then say that for any two objects, `a == b` iff a and b are
>>     > substitutable.
>>
>>     Currently, our type system has refs and primitives, and the ==
>>     predicate applies on all of them.  And for all the types we have
>>     today (with the almost-too-small-to-mention anomaly of NaN), ==
>>     *already is* a substitutability predicate (where substitutability
>>     means, informally: "no observable difference between the two
>>     arguments").  Two refs are substitutable if they refer to the same
>>     object identity; two primitives are substitutable if they refer to
>>     the same value (modulo NaN.)
>>
>>     VM engineers like to refer to `==` on refs as "identity equality",
>>     but that's really an implementation detail.  What it really means
>>     is: are the two things the same.  And that's what `==` means for
>>     primitives too, and that's how the other 99.99% of users think of
>>     it too.
>>
>>     The natural interpretation of `==` in a world with values is to
>>     extend this "are these two things the same" to values too.  The
>>     substitutability relation above applies the same "are you the
>>     same" logic equally to refs, values, and primitives.  No sharp
>>     edges (except the NaNsense that we are already stuck with.)
>>
>>
>>
>>
>> -- 
>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
>