Value-Based Classes - equals/hashCode

Brian Goetz brian.goetz at oracle.com
Sat Feb 14 21:03:41 UTC 2015


> But yes, also, what's up with that? One interpretation is surely that
> the StringHolder's string (to stick with that example) can not be
> accessed as it is "any other object". On the other hand, the string is
> surely part of the holder's state, which is expressly allowed to be
> used for that computation. So what is it?

Which brings us back to the fundamentally murky part -- what comprises 
the state of an object?

In languages like Lime (a Java-like language designed for GPU 
computation), a value type must be "values all the way down".  (Their 
primitive data type is "bit"; everything else is defined as composites 
of bit.)  This is a beautiful and clean way to define things, but it is 
not applicable to the practical problem of adding true values to a 
mutation-happy language like Java.  Even simple types like String require 
some mental gymnastics to see them as value-like: String stores 
characters in an array, which is mutable but simply not mutated after 
initialization, and it even has a lazily initialized, mutable field for 
the hash code.  (If that's not bad enough, this field is updated using 
data races!)  But not being able to have String as a component of a value 
would be silly, and no one disputes that strings are "value-ish".  But 
the clean definitions don't apply.
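The String idiom described above (an array that is never mutated after construction, plus a lazily cached hash written without synchronization) can be sketched roughly as follows. The class name CachedHashText is hypothetical, and this is a simplified illustration in the spirit of String, not its actual source; the hash uses the documented String.hashCode formula so the result can be checked against String.

```java
import java.util.Arrays;

// Hypothetical class: mimics the shape of String's "value-ish" internals.
public final class CachedHashText {
    private final char[] chars; // mutable array type, but never mutated after construction
    private int hash;           // lazily computed; 0 means "not computed yet"

    public CachedHashText(String s) {
        this.chars = s.toCharArray(); // private copy, never exposed
    }

    @Override
    public int hashCode() {
        int h = hash;                 // read the field exactly once into a local
        if (h == 0 && chars.length > 0) {
            for (char c : chars) {
                h = 31 * h + c;       // same formula as String.hashCode
            }
            // Racy write: concurrent callers may each compute and store the hash,
            // but they always store the same value, and int writes are atomic.
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashText
            && Arrays.equals(chars, ((CachedHashText) o).chars);
    }
}
```

Reading `hash` into a local exactly once matters: under the Java memory model, a second unsynchronized read of the field could in principle observe 0 even after the first read saw the computed value.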

We found ourselves with a related problem when trying to define the 
constraints on behavioral parameters passed to stream methods (e.g., the 
Predicate passed to Stream.filter).  The "obvious" restrictions were 
"side-effect free" or "pure function".  But neither of these is the 
right answer; both are unnecessarily restrictive.  (Printing a debug 
message or updating a history buffer are acceptable side effects; 
looking up data in a mutable but not-mutated-during-the-query HashMap is 
an acceptable compromise of purity.)  We settled on a murky definition 
of "non-interfering" and "stateless", knowing full well that our 
definitions are built on sand, but tried to capture the true spirit of 
the restriction, which is "don't depend on stuff that might change 
during the calculation."
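To make the stream rules concrete, here is a minimal sketch (the class and method names are illustrative only) of a Predicate that is neither pure nor side-effect free, yet stays within the spirit of "non-interfering" and "stateless": it prints a debug message and reads a mutable HashMap that nobody modifies while the pipeline runs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public final class NonInterferingFilter {
    // Returns the names whose stock count is positive.
    // Assumption: `stock` is not modified while the pipeline executes.
    static List<String> available(Map<String, Integer> stock, List<String> names) {
        return names.stream()
            .filter(name -> {
                System.out.println("checking " + name); // harmless debug side effect
                return stock.getOrDefault(name, 0) > 0; // reads mutable, but not-mutated, state
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Populated before the pipeline runs, then left alone.
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apple", 3);
        stock.put("pear", 0);
        stock.put("plum", 7);

        // Prints four "checking ..." lines, then [apple, plum]
        System.out.println(available(stock, List.of("apple", "pear", "plum", "kiwi")));
    }
}
```

If another thread could update `stock` while the pipeline runs, the filter's answers would depend on timing, which is exactly the "stuff that might change during the calculation" the restriction is meant to exclude.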

Said sand includes notions like "what is the state of an object".

Here's a stab at thinking about this, which I'm sure is wrong, but might 
be a helpful start.  Say a value-ish class C has fields c_1..c_n.  Now 
let's try to define S(C), the variables that comprise the "state" of C. 
For each i in 1..n, S(C) includes:
  - c_i, if c_i is a primitive
  - S(c_i), if c_i is a reference to a value-ish class
  - c_i itself (the reference), if c_i is a reference to a non-value-ish class

In other words, if a value contains other values, we recursively include 
the dependent value in the state; if a value contains references to 
non-values, we can include only the reference itself (references are 
values!), not any state reachable through that reference.  (We wave
our hands and magically define certain otherwise-problematic classes 
(like String) to be value-ish.)
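The three rules for S(C) can be turned into a rough, reflection-based sketch. Everything here is an assumption for illustration: the @ValueIsh marker annotation is hypothetical (no such JDK API exists), String is special-cased as a value leaf to mirror the hand-wave above, only the class's own declared instance fields are examined (superclass fields are ignored), and value-ish classes are assumed not to contain cycles.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public final class StateOf {
    // Hypothetical marker for "value-ish" classes; not a JDK or Valhalla API.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface ValueIsh {}

    // Wraps a reference so that two Refs are equal only if they hold the same object (==).
    public static final class Ref {
        private final Object target;
        Ref(Object target) { this.target = target; }
        @Override public boolean equals(Object o) {
            return o instanceof Ref && ((Ref) o).target == target;
        }
        @Override public int hashCode() { return System.identityHashCode(target); }
    }

    // Computes S(obj) as a flat list of "leaf" state components.
    public static List<Object> state(Object obj) {
        List<Object> out = new ArrayList<>();
        Field[] fields = obj.getClass().getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName)); // deterministic order
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers())) continue;      // statics are not instance state
            Object v;
            try {
                f.setAccessible(true);
                v = f.get(obj);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (f.getType().isPrimitive()) {
                out.add(v);           // rule 1: the primitive value itself (boxed)
            } else if (v instanceof String) {
                out.add(v);           // hand-wave: String is treated as a value leaf
            } else if (v != null && v.getClass().isAnnotationPresent(ValueIsh.class)) {
                out.addAll(state(v)); // rule 2: recurse into S(c_i)
            } else {
                out.add(new Ref(v));  // rule 3: only the reference, never its contents
            }
        }
        return out;
    }

    // --- Hypothetical example classes ---
    @ValueIsh
    public static final class Point {
        final int x, y;
        public Point(int x, int y) { this.x = x; this.y = y; }
    }

    @ValueIsh
    public static final class Tagged {
        final Point p;           // value-ish: contributes S(p)
        final String name;       // hand-waved value-ish leaf
        final StringBuilder log; // mutable, non-value-ish: contributes only the reference
        public Tagged(Point p, String name, StringBuilder log) {
            this.p = p; this.name = name; this.log = log;
        }
    }
}
```

Two Tagged objects sharing the same StringBuilder produce equal state lists, and appending to that builder does not change either list, because only the reference to it is part of the state.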

So, in the following class:

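final class Urk {
     private final MutableStuff mc;

     private Urk(MutableStuff mc) { this.mc = mc; }
     public static Urk make(MutableStuff mc) { return new Urk(mc); }

     @Override
     public boolean equals(Object other) {
         // compares the *reference* mc, not the state reachable through it
         return (other instanceof Urk)
             && ((Urk) other).mc == mc;
     }

     @Override
     public int hashCode() {
         // consistent with equals: hash the identity of the reference
         return System.identityHashCode(mc);
     }
}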

I would argue this is following the rules; Urk's state includes the 
*reference to* mc, but not the *state of* mc.  If Urk were to dive into 
the state of mc, then it would have crossed the line.
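The following self-contained sketch contrasts Urk with a hypothetical DeepUrk that does dive into the state of mc. MutableStuff is given a hypothetical list field only so that there is some state to mutate; the names UrkDemo and DeepUrk are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public final class UrkDemo {
    // Hypothetical stand-in for MutableStuff: any mutable class will do.
    static final class MutableStuff {
        final List<String> items = new ArrayList<>();
    }

    // Follows the rules: equality depends only on the reference to mc.
    static final class Urk {
        private final MutableStuff mc;
        private Urk(MutableStuff mc) { this.mc = mc; }
        static Urk make(MutableStuff mc) { return new Urk(mc); }
        @Override public boolean equals(Object o) {
            return o instanceof Urk && ((Urk) o).mc == mc;
        }
        @Override public int hashCode() { return System.identityHashCode(mc); }
    }

    // Crosses the line: equality depends on the *state of* mc.
    static final class DeepUrk {
        private final MutableStuff mc;
        DeepUrk(MutableStuff mc) { this.mc = mc; }
        @Override public boolean equals(Object o) {
            return o instanceof DeepUrk && ((DeepUrk) o).mc.items.equals(mc.items);
        }
        @Override public int hashCode() { return mc.items.hashCode(); }
    }

    public static void main(String[] args) {
        MutableStuff m1 = new MutableStuff();
        MutableStuff m2 = new MutableStuff();
        Urk u = Urk.make(m1);
        DeepUrk d1 = new DeepUrk(m1), d2 = new DeepUrk(m2);

        System.out.println(d1.equals(d2));          // true: both lists are empty
        m1.items.add("x");                          // mutate state reachable through mc
        System.out.println(d1.equals(d2));          // false: DeepUrk's equality changed underneath it
        System.out.println(u.equals(Urk.make(m1))); // true, regardless of any mutation of m1
    }
}
```

The DeepUrk answer changes over time even though neither DeepUrk object was touched, which is the behavior a value-ish class should avoid; Urk's answer is fixed for the life of the objects.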

Hope this helps.




More information about the valhalla-dev mailing list