Value equality (was: Value types questions and comments)

Tue May 17 03:05:15 UTC 2016

On Mon, May 16, 2016 at 11:20 AM, Brian Goetz <brian.goetz at oracle.com>
wrote:

Kevin say:
>
> However, it would be very sad if it does not solve a second problem at the
>> same time, because it can. Even Java developers who are content with the
>> performance foibles of their existing "value-based classes" are constantly
>> irritated by the burden and bug-prone nature of writing/maintaining such
>> classes.
>>
>
> We have two choices in front of us regarding equality on value types:
>
> 1.  What does `==` on values do?  Choices include a bitwise comparison, a
> componentwise comparision (identical to bitwise unless there are float
> members), or invoking the `equals()` method.
>

I'd recommend that this fail to compile.

When I come across this code:

    a == b

I want to know what that code is doing, and I want to know that it's
correct. So I jump to the definitions of a and b. If I see that they're
ints, I'm done.

If, however, they are SomeType, today I know that I am looking at reference
equality. I'd like that to continue to be the case. People like to know
what they're looking at. I wouldn't like to have to dig into the definition
of *SomeType* (which may be "far away") just to know what sort of equality
I'm looking at.

(Note that even in the case of enums, where reference equality and value
equality are identical, our internal coding guidelines still recommend
always using .equals() for this reason. Then you have code that is not only
correct, but is *obviously* correct. Not code that you have to look up a
foreign definition to know whether it's correct or not. That's useful, and
what's the cost? Just a bit of extra text on the screen. This is Java.)

If we insist that == do something, then I'd recommend it be identical to
equals(). We should assume that the person who wrote equals() made their
choices for a reason. It's impossible to know whether exposing
componentwise equality would make sense for a type or not. Not that I love
this option; it makes me a little squeamish for an operator to invoke user
code. My preference is to not compile.

2.  What is the default for the `equals()` method?  The obvious choice is a
> componentwise comparision, but Kevin has (strongly) suggested we consider a
> deep comparison, calling .equals() on any reference members.  (Note that
> the two choices collapse to the same thing when there are no reference
> members.)
>

Not just equals(), but deepEquals(), as proposed by Bill Pugh.

> For values whose members are value-ish objects themselves (e.g., String,
> Optional, LocalDateTime), the deep-equals is obviously appealing -- a tuple
> of (int, date) makes total sense to compare using .equals() on the date.
> On the other hand, for values whose members are references to mutable
> things (like lists), it's not as obvious what the right default is.
>

When you put it this way it seems even more clear that the "value-ish" case
is the one to design toward; the one that should not require the user to
add custom code.

Of course we know that many different people implement equals() according
to many different goals. But I've become convinced there is only *one* way
to look at equals() that *actually makes sense*, and around which APIs can
and have been and should be designed. And that is to say that *equals means
interchangeable*. 99% of the time, you should never care which of two equal
instances you have. If you do, either you are doing some low-level
nuts-and-bolts far away from business logic (and this should apply to
extremely few people), or you implemented equals() wrong.

Lots of equals() methods *are* implemented wrong and many of those can
never be fixed. However, I think it would be very questionable to design
toward such cases. *Those* should be the cases where you have to write
custom code. The default should be "equals means interchangeable".

Mutable lists are a fuzzy case, but they tend to get away with their
"wrong" behavior because of how often we *treat* them as immutable (er,
meaning, we *don't mutate them*). Still, here we strongly encourage using
ImmutableList as much as possible for reasons like this and all the
well-known others.

So, Kevin -- please make your case!  Please share your experience with
> tools like AutoValue, and if you can, data on how frequently (and why)
> equals() is underridden on auto values in the Google codebase?
>

Okay. To refresh, we have a code generator for "value-based classes"
because they're that much of a pain. As of today, it's used 17,400 times in
our codebase. Its equals() behavior is not configurable; it always uses
deep value equality unless the user replaces it with their own custom
equals() implementation.

I've researched how often Google users are doing this and why. I decided to
manually review *all* instances of this, but at 70% of the way through I'm
getting hungry, so I'll just crudely extrapolate from what I did get to:

   - About 30 classes customize equals so as to ignore some fields. (These
   are a little bit suspicious according to the "equals means interchangeable"
   maxim; I'd probably suggest they redesign a bit and keep their proper
   values separate from their other stuff. In some cases the ignored fields
   are derived, so it's fine, although in some of *those* I consider going
   out of their way to ignore the field in equals as just a hyperoptimization.)
   - About 25 classes do it so that they can substitute some alternate
   determination of equivalence for at least one field. Most of these are
   working around a field's what-I'd-call-broken equals() behavior. None were
   doing this to switch to identity comparison.
   - About 15 are doing very strange and suspicious things and *deserve
   whatever they get*.
   - About 10 seem to be just plain misunderstandings.
   - 1 or 2 customize it so that they can also be equal to other sibling
   types implementing a common interface. Okay.

In all, fully 99.5% of usages accept the default behavior.

So, for evidence that reference equality or even shallow value equality is
what anyone wants, out of 17,400 usages I've come up empty.  (Granted,
there may be users who want it and therefore just don't use AutoValue at
all. But Googlers are very good at complaining when a tool doesn't do quite
what they want, and we're not getting that either.)

HTH

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com