[records] equals / hashCode (was: Records -- current status)

Fri Apr 13 17:17:10 UTC 2018

Along the lines of the previous mail, people have asked, and will keep 
asking, "why can't I redefine equals/hashCode?"  The answer has two layers:

  - The constraints on equals/hashCode are stronger for records, and 
users might inadvertently violate them.  (They can be specified in the 
overrides of equals/hashCode in AbstractRecord, so there at least can be 
a place where this specification lives, even if no one reads it.)
  - In conjunction with ancillary fields, the constraints are sure to be 
violated, whether inadvertently or deliberately.

Let's take a look at what sorts of modifications to equals/hashCode 
would be OK, should we decide to relax this restriction.  Equality 
should still derive from the record's state, but there might be 
acceptable variations.

Would it be OK to _widen_ the definition of equality, by ignoring a 
component of the record?

This is an example of what Gunnar asked for, which is to restrict 
equality to the primary key fields:

     record PersonEntity(int primaryKey, String name, int age) {
         // equality based only on primaryKey
     }

Is this OK?  Well, let's look at our model:
  - Does ctor(dtor(c)) == c?  Yes.
  - if S1==S2, does ctor(S1) == ctor(S2)?  Yes.
  - For equal instances, does mutating them in the same way yield equal 
instances?  Yes.
  - For equal instances, does calling the same method on both with the 
same parameters yield equivalent results?  No.

So, if p1 == p2, we cannot rely on p1.age() == p2.age(), so this fails 
the requirements of our pseudo-formal model.  (Assuming our model is the 
right one.)
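
To make the failure concrete, here is a minimal sketch.  It assumes 
records are allowed to redeclare equals/hashCode (which is exactly the 
relaxation under discussion); the demo class and the sample values are 
made up for illustration.

```java
// Hypothetical sketch: equality widened to the primary key only.
record PersonEntity(int primaryKey, String name, int age) {
    @Override
    public boolean equals(Object o) {
        // Ignores name and age on purpose.
        return o instanceof PersonEntity other && primaryKey == other.primaryKey;
    }

    @Override
    public int hashCode() {
        // Must agree with equals, so it also uses only the key.
        return Integer.hashCode(primaryKey);
    }
}

class WidenedEqualityDemo {
    public static void main(String[] args) {
        PersonEntity p1 = new PersonEntity(1, "Alice", 30);
        PersonEntity p2 = new PersonEntity(1, "Alice", 31);

        System.out.println(p1.equals(p2));         // true: same primary key
        System.out.println(p1.age() == p2.age());  // false: not substitutable
    }
}
```

The first three checks in the list still pass for this record; only 
the last one (same method, same arguments, equivalent results) breaks.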

So, how would we feel about that?  Two records that are equals() to each 
other, but not substitutable?

A more subtle version of this would be to consider all components, but 
use a more inclusive notion of equality for that field, such as 
comparing array components by contents.

     record Numbers(int[] numbers) {
         // equality based on Arrays.equals()
     }

  - Does ctor(dtor(c)) == c?  Yes.
  - Do equal state vectors produce equal records?  Yes.
  - Do identical mutations on equal records produce equal records? Yes.
  - Do identical operations on equal records produce equal results?  
Almost...

The Almost qualification can be seen here:
     int[] a1 = { 1, 2, 3 };
     int[] a2 = Arrays.copyOf(a1, a1.length);
     Numbers r1 = new Numbers(a1), r2 = new Numbers(a2);
     // r1.equals(r2) holds, but the accessors return distinct arrays
     boolean same = r1.numbers().equals(r2.numbers());

The accessor will yield up the array references, which will not be 
equals() to each other.  This is essentially the same problem as above.
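
For reference, a sketch of what the full Numbers record might look 
like, again assuming equals/hashCode may be redeclared.  The demo class 
is hypothetical; Arrays.equals and Arrays.hashCode are the standard 
java.util methods.

```java
import java.util.Arrays;

// Hypothetical sketch: all components considered, but compared by contents.
record Numbers(int[] numbers) {
    @Override
    public boolean equals(Object o) {
        return o instanceof Numbers other && Arrays.equals(numbers, other.numbers);
    }

    @Override
    public int hashCode() {
        // Content-based hash, consistent with the content-based equals.
        return Arrays.hashCode(numbers);
    }
}

class ArrayEqualityDemo {
    public static void main(String[] args) {
        int[] a1 = { 1, 2, 3 };
        int[] a2 = Arrays.copyOf(a1, a1.length);
        Numbers r1 = new Numbers(a1), r2 = new Numbers(a2);

        System.out.println(r1.equals(r2));                              // true
        System.out.println(r1.numbers().equals(r2.numbers()));          // false
        System.out.println(Arrays.equals(r1.numbers(), r2.numbers()));  // true
    }
}
```

The accessor results agree only under the same widened comparison the 
record itself uses, not under the arrays' own equals().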

You get a similar result if your record represents something like a 
rational number and you don't normalize to lowest terms in the 
constructor; then you can have q1 equal q2, but q1.numerator() != 
q2.numerator().
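
A sketch of the rational case, under the same assumption that 
equals/hashCode can be redeclared.  The Rational record, its 
positive-denominator restriction, and the gcd helper are illustrative 
choices, not part of any proposal.

```java
// Hypothetical sketch: a rational that is NOT reduced to lowest terms.
record Rational(int numerator, int denominator) {
    Rational {
        // Assumption for this sketch: denominators are strictly positive.
        if (denominator <= 0) {
            throw new IllegalArgumentException("denominator must be positive");
        }
        // Deliberately no normalization here.
    }

    @Override
    public boolean equals(Object o) {
        // Compare values by cross-multiplication; long avoids int overflow.
        return o instanceof Rational other
            && (long) numerator * other.denominator
               == (long) other.numerator * denominator;
    }

    @Override
    public int hashCode() {
        // Hash the reduced form so equal values hash alike.
        long g = gcd(Math.abs((long) numerator), denominator);
        return (int) (31 * (numerator / g) + denominator / g);
    }

    private static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }
}

class RationalDemo {
    public static void main(String[] args) {
        Rational q1 = new Rational(1, 2);
        Rational q2 = new Rational(2, 4);

        System.out.println(q1.equals(q2));                      // true
        System.out.println(q1.numerator() == q2.numerator());   // false
    }
}
```

Reducing to lowest terms in the constructor would make the stored 
state canonical, at which point the default component-wise equality 
gives the same answer and no override is needed.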

Are any of these variations compelling enough to suggest we've got the 
wrong model?

On 3/16/2018 2:55 PM, Brian Goetz wrote:
> There are a number of potentially open details on the design for 
> records.  My inclination is to start with the simplest thing that 
> preserves the flexibility and expectations we want, and consider 
> opening up later as necessary.
>
> One of the biggest issues, which Kevin raised as a must-address issue, 
> is having sufficient support for precondition validation. Without 
> foreclosing on the ability to do more later with declarative guards, I 
> think the recent construction proposal meets the requirement for 
> lightweight enforcement with minimal or no duplication.  I'm hopeful 
> that this bit is "there".
>
> Our goal all along has been to define records as being “just macros” 
> for a finer-grained set of features.  Some of these are motivated by 
> boilerplate; some are motivated by semantics (coupling semantics of 
> API elements to state.)  In general, records will get there first, and 
> then ordinary classes will get the more general feature, but the 
> default answer for "can you relax records, so I can use it in this 
> case that almost but doesn't quite fit" should be "no, but there will 
> probably be a feature coming that makes that class simpler, wait for 
> that."
>
>
> Some other open issues (please see my writeup at 
> http://cr.openjdk.java.net/~briangoetz/amber/datum.html for 
> reference), and my current thoughts on these, are outlined below. 
> Comments welcome!
>
>  - Extension.  The proposal outlines a notion of abstract record, 
> which provides a "width subtyped" hierarchy.  Some have questioned 
> whether this carries its weight, especially given how Scala doesn't 
> support case-to-case extension (some see this as a bug, others as an 
> existence proof.)  Records can implement interfaces.
>
>  - Concrete records are final.  Relaxing this adds complexity to the 
> equality story; I'm not seeing good reasons to do so.
>
>  - Additional constructors.  I don't see any reason why additional 
> constructors are problematic, especially if they are constrained to 
> delegate to the default constructor (which in turn is made far simpler 
> if there can be statements ahead of the this() call.) Users may find 
> the lack of additional constructors to be an arbitrary limitation (and 
> they'd probably be right.)
>
>  - Static fields.  Static fields seem harmless.
>
>  - Additional instance fields.  These are a much bigger concern. While 
> the primary arguments against them are of the "slippery slope" 
> variety, I still have deep misgivings about supporting unrestricted 
> non-principal instance fields, and I also haven't found a reasonable 
> set of restrictions that makes this less risky.  I'd like to keep 
> looking for a better story here, before just caving on this, as I 
> worry doing so will end up biting us in the back.
>
>  - Mutability and accessibility.  I'd like to propose an odd choice 
> here, which is: fields are final and package (protected for abstract 
> records) by default, but finality can be explicitly opted out of 
> (non-final) and accessibility can be explicitly widened (public).
>
>  - Accessors.  Perhaps the most controversial aspect is that records 
> are inherently transparent to read; if something wants to truly 
> encapsulate state, it's not a record.  Records will eventually have 
> pattern deconstructors, which will expose their state, so we should go 
> out of the gate with the equivalent.  The obvious choice is to expose 
> read accessors automatically.  (These will not be named getXxx; we are 
> not burning the ill-advised Javabean naming conventions into the 
> language, no matter how much people think it already is.)  The obvious 
> naming choice for these accessors is fieldName().  No provision for 
> write accessors; that's bring-your-own.
>
>  - Core methods.  Records will get equals, hashCode, and toString.  
> There's a good argument for making equals/hashCode final (so they 
> can't be explicitly redeclared); this gives us stronger preservation 
> of the data invariants that allow us to safely and mechanically 
> snapshot / serialize / marshal (we'd definitely want this if we ever 
> allowed additional instance fields.)  No reason to suppress override 
> of toString, though. Records could be safely made cloneable() with 
> automatic support too (like arrays), but not clear if this is worth it 
> (it's darn useful for arrays, though.)  I think the auto-generated 
> getters should be final too; this leaves arrays as second-class 
> components, but I am not sure that bothers me.
>