Feedback regarding default equals, hashCode and toString implementions for records

Mon Apr 27 14:52:15 UTC 2020

> Hi all,
>
> While experimenting with the new records, I found what I believe is an
> oversight in the implementation.
> The generated equals method for records is based on equality of the
> components, but when the record component is an array, it still uses
> regular Object#equals(Object) method.

Thanks for your thoughts.

Unfortunately, this was no oversight -- this was an extensively 
discussed topic, and none of the options were terribly satisfying. In 
the end, we concluded that doing what you suggest would cause more 
trouble than it was worth, and went back to the uniform appeal to 
Object::equals.

Yes, arrays are irritating; they cannot be made immutable, and you 
cannot override their equals computation.  Many developers believe that 
the default equals computation is "obviously wrong", but unfortunately, 
it is only "wrong" a lot of the time.  The rest of the time, the "other 
default" would be "wrong".  This is a no-win situation.

We considered doing exactly as you say -- using some sort of deeper 
equals when the static type is an array.  But, this "fix" creates a 
number of inconsistencies, which quickly turn into whack-a-mole, with 
extra complexity as a bonus.  As an example, suppose you had the 
following two records:

     record A(String[] ss) { }
     record B(Object ss) { }

     String[] ss = ...

Then the equals() behavior of `new A(ss)` and `new B(ss)` would be 
different!  Try explaining that.

By-reference comparison and mutability go hand in hand; if an object is 
mutable, then identity equality is probably the right thing. (Mutable 
objects whose equals/hashCode are state-based are a perennial sorts of 
puzzlers, when you put one in a map, and then the object is mutated, and 
then (maybe) it can't be found in the map even though it is there.  
(Ironically, List itself is one of the offenders here.))  Arrays make 
this choice, for better or worse (but at least make it consistently.)  
But arrays are not the only ones who make this choice, and the propose 
fix does nothing for them.

> I believe that in order for records to succeed, they need to be truly
> boilerplate-reducing, so I'm suggesting the following changes to the
> default equals, hashCode and toString implementations:
>
> IF the field in the record is statically known to be a one-dimensional
> array:
>    THEN java.util.Arrays.equals(array1, array2) is used to compare the
> component of two record instances     (applies to
> <recordclass>#equals(Object))
>    AND java.util.Arrays.hashCode(array) is used to compute the hashcode of
> the component                     (applies to  <recordclass>#hashCode())
>    AND java.util.Arrays.toString(array) is used to compute the string
> representation of the component        (applies to
> <recordclass>#toString())
> ELSE IF the field in the record is statically known to be a
> multidimensional array:
>    THEN java.util.Arrays.deepEquals(array1, array2) is used to compare the
> component of two record instances     (applies to
> <recordclass>#equals(Object))
>    AND java.util.Arrays.deepHashCode(array) is used to compute the hashcode
> of the component                     (applies to  <recordclass>#hashCode())
>    AND java.util.Arrays.deepToString(array) is used to compute the string
> representation of the component        (applies to
> <recordclass>#toString())
> ELSE
>    just use the old code generation algorithm from Java 14 that uses
> java.lang.Object#equals

I think you've just illustrated how complex the alternative is -- and it 
is still not "right" 100% of the time.  So more complexity, just to be 
"wrong" less often, is not necessarily a good trade.

The basic problem is that sometimes people think that the by-contents 
behavior is natural, and sometimes they think that the by-identity 
behavior is natural, but we can't have both defaults.  A complex 
algorithm like this is trying to read the developer's mind. The current 
choice -- treat arrays as any other object -- is sometimes not what 
developers want, but it is far easier for developers to reason about 
what it means.  "Call the equals() method" is something every developer 
can understand.  They may grumble about "but the equals method for 
arrays is wrong", but at least they can understand why they are getting 
the result they are.

In the end, we settled on the simpler alternative, which is to uniformly 
appeal to Object::equals, and advise users to take care when using 
arrays and records, and consider using List instead if content equality 
is what is desired.

> I am not familiar with the compiler implementation details, but it looks
> like a trivial change to me.

The complexity of the compiler implementation is generally a 
non-consideration in the design process.