Value equality (was: Value types questions and comments)

Kevin Bourrillion kevinb at google.com
Tue May 17 03:29:37 UTC 2016


And by "deep equals" I mean equivalent behavior to

https://docs.oracle.com/javase/8/docs/api/java/util/Arrays.html#deepEquals-java.lang.Object:A-java.lang.Object:A-

at least that is what AutoValue has been doing.


On Mon, May 16, 2016 at 8:05 PM, Kevin Bourrillion <kevinb at google.com>
wrote:

> On Mon, May 16, 2016 at 11:20 AM, Brian Goetz <brian.goetz at oracle.com>
> wrote:
>
> Kevin say:
>>
>> However, it would be very sad if it does not solve a second problem at
>>> the same time, because it can. Even Java developers who are content with
>>> the performance foibles of their existing "value-based classes" are
>>> constantly irritated by the burden and bug-prone nature of
>>> writing/maintaining such classes.
>>>
>>
>> We have two choices in front of us regarding equality on value types:
>>
>> 1.  What does `==` on values do?  Choices include a bitwise comparison, a
>> componentwise comparision (identical to bitwise unless there are float
>> members), or invoking the `equals()` method.
>>
>
> I'd recommend that this fail to compile.
>
> When I come across this code:
>
>     a == b
>
> I want to know what that code is doing, and I want to know that it's
> correct. So I jump to the definitions of a and b. If I see that they're
> ints, I'm done.
>
> If, however, they are SomeType, today I know that I am looking at
> reference equality. I'd like that to continue to be the case. People like
> to know what they're looking at. I wouldn't like to have to dig into the
> definition of *SomeType* (which may be "far away") just to know what sort
> of equality I'm looking at.
>
> (Note that even in the case of enums, where reference equality and value
> equality are identical, our internal coding guidelines still recommend
> always using .equals() for this reason. Then you have code that is not only
> correct, but is *obviously* correct. Not code that you have to look up a
> foreign definition to know whether it's correct or not. That's useful, and
> what's the cost? Just a bit of extra text on the screen. This is Java.)
>
> If we insist that == do something, then I'd recommend it be identical to
> equals(). We should assume that the person who wrote equals() made their
> choices for a reason. It's impossible to know whether exposing
> componentwise equality would make sense for a type or not. Not that I love
> this option; it makes me a little squeamish for an operator to invoke user
> code. My preference is to not compile.
>
>
> 2.  What is the default for the `equals()` method?  The obvious choice is
>> a componentwise comparision, but Kevin has (strongly) suggested we consider
>> a deep comparison, calling .equals() on any reference members.  (Note that
>> the two choices collapse to the same thing when there are no reference
>> members.)
>>
>
> Not just equals(), but deepEquals(), as proposed by Bill Pugh.
>
>
>
>> For values whose members are value-ish objects themselves (e.g., String,
>> Optional, LocalDateTime), the deep-equals is obviously appealing -- a tuple
>> of (int, date) makes total sense to compare using .equals() on the date.
>> On the other hand, for values whose members are references to mutable
>> things (like lists), it's not as obvious what the right default is.
>>
>
> When you put it this way it seems even more clear that the "value-ish"
> case is the one to design toward; the one that should not require the user
> to add custom code.
>
> Of course we know that many different people implement equals() according
> to many different goals. But I've become convinced there is only *one*
> way to look at equals() that *actually makes sense*, and around which
> APIs can and have been and should be designed. And that is to say that *equals
> means interchangeable*. 99% of the time, you should never care which of
> two equal instances you have. If you do, either you are doing some
> low-level nuts-and-bolts far away from business logic (and this should
> apply to extremely few people), or you implemented equals() wrong.
>
> Lots of equals() methods *are* implemented wrong and many of those can
> never be fixed. However, I think it would be very questionable to design
> toward such cases. *Those* should be the cases where you have to write
> custom code. The default should be "equals means interchangeable".
>
> Mutable lists are a fuzzy case, but they tend to get away with their
> "wrong" behavior because of how often we *treat* them as immutable (er,
> meaning, we *don't mutate them*). Still, here we strongly encourage using
> ImmutableList as much as possible for reasons like this and all the
> well-known others.
>
>
> So, Kevin -- please make your case!  Please share your experience with
>> tools like AutoValue, and if you can, data on how frequently (and why)
>> equals() is underridden on auto values in the Google codebase?
>>
>
> Okay. To refresh, we have a code generator for "value-based classes"
> because they're that much of a pain. As of today, it's used 17,400 times in
> our codebase. Its equals() behavior is not configurable; it always uses
> deep value equality unless the user replaces it with their own custom
> equals() implementation.
>
> I've researched how often Google users are doing this and why. I decided
> to manually review *all* instances of this, but at 70% of the way through
> I'm getting hungry, so I'll just crudely extrapolate from what I did get to:
>
>    - About 30 classes customize equals so as to ignore some fields.
>    (These are a little bit suspicious according to the "equals means
>    interchangeable" maxim; I'd probably suggest they redesign a bit and keep
>    their proper values separate from their other stuff. In some cases the
>    ignored fields are derived, so it's fine, although in some of *those* I
>    consider going out of their way to ignore the field in equals as just a
>    hyperoptimization.)
>    - About 25 classes do it so that they can substitute some alternate
>    determination of equivalence for at least one field. Most of these are
>    working around a field's what-I'd-call-broken equals() behavior. None were
>    doing this to switch to identity comparison.
>    - About 15 are doing very strange and suspicious things and *deserve
>    whatever they get*.
>    - About 10 seem to be just plain misunderstandings.
>    - 1 or 2 customize it so that they can also be equal to other sibling
>    types implementing a common interface. Okay.
>
> In all, fully 99.5% of usages accept the default behavior.
>
> So, for evidence that reference equality or even shallow value equality is
> what anyone wants, out of 17,400 usages I've come up empty.  (Granted,
> there may be users who want it and therefore just don't use AutoValue at
> all. But Googlers are very good at complaining when a tool doesn't do quite
> what they want, and we're not getting that either.)
>
> HTH
>
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
>



-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com


More information about the valhalla-spec-observers mailing list