Value-Based Classes - equals/hashCode
Brian Goetz
brian.goetz at oracle.com
Sat Feb 14 21:03:41 UTC 2015
> But yes, also, what's up with that? One interpretation is surely that
> the StringHolder's string (to stick with that example) can not be
> accessed as it is "any other object". On the other hand the string is
> surely part of the holder's state which is expressively allowed to be
> used for that computation. So what is it?
Which brings us back to the fundamentally murky part -- what comprises
the state of an object?
In languages like Lime (a Java-like language designed for GPU
computation), a value type must be "values all the way down". (Their
primitive data type is "bit"; everything else is defined as composites
of bit.) This is a beautiful and clean way to define things, but not
applicable to the practical problem of adding true values to a
mutation-happy language like Java. Even simple types like String require
some mental gymnastics to see them as value-like: String stores its
characters in an array, which is mutable but simply not mutated after
initialization, and it even has a lazily-initialized, mutable field for
the hash code (and if that's not bad enough, this field is updated using
data races!). But not being able to have String as a component of a value
would be silly, and no one disputes that strings are "value-ish"; the
clean definitions just don't apply.
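To make the String point concrete, here is a minimal sketch of the pattern being described. It is not the actual java.lang.String source; the class name Word and its fields are made up for illustration. The array is mutable but never written after construction, and the hash is cached in a non-final, non-volatile field that is updated without synchronization.

```java
import java.util.Arrays;

// Hypothetical value-ish class; Word and its fields are illustrative names only.
final class Word {
    private final char[] chars; // a mutable array, but never written after construction
    private int hash;           // lazily computed cache; 0 means "not computed yet"

    Word(String s) {
        this.chars = s.toCharArray(); // private copy, never shared with callers
    }

    @Override
    public boolean equals(Object other) {
        return (other instanceof Word)
            && Arrays.equals(((Word) other).chars, chars);
    }

    @Override
    public int hashCode() {
        int h = hash;                   // single racy read of the cache
        if (h == 0) {
            h = Arrays.hashCode(chars); // deterministic, so every thread computes the same value
            hash = h;                   // racy write; a lost update only costs a recomputation
        }
        return h;
    }
}
```

The race is tolerable only because the computed value depends solely on data that never changes, which is exactly why the class still feels value-ish despite the mutable field.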
We found ourselves with a related problem when trying to define the
constraints on behavioral parameters passed to stream methods (e.g., the
Predicate passed to Stream.filter). The "obvious" restrictions were
"side-effect free" or "pure function". But neither of these is the
right answer; both are unnecessarily restrictive. (Printing a debug
message or updating a history buffer are acceptable side effects;
looking up data in a mutable but not-mutated-during-the-query HashMap is
an acceptable compromise of purity.) We settled on a murky definition
of "non-interfering" and "stateless", knowing full well that our
definitions are built on sand, but tried to capture the true spirit of
the restriction, which is "don't depend on stuff that might change
during the calculation."
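As a rough illustration of the two "acceptable" cases mentioned above, the following sketch passes Stream.filter a predicate that both prints a debug message and consults a mutable HashMap that nobody modifies while the pipeline runs. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; the names here are illustrative, not from any real API.
final class NonInterferenceSketch {
    static List<String> knownNames(List<String> names, Map<String, Integer> ages) {
        return names.stream()
                    .filter(n -> {
                        System.err.println("checking " + n); // debug output: a harmless side effect
                        return ages.containsKey(n);          // reads mutable state that is not
                                                             // mutated during the query
                    })
                    .collect(Collectors.toList());
    }
}
```

Neither "side-effect free" nor "pure function" describes this predicate, yet it depends on nothing that changes during the calculation, which is the spirit of the restriction.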
Said sand includes notions like "what is the state of an object".
Here's a stab at thinking about this, which I'm sure is wrong, but might
be a helpful start. Say a value-ish class C has fields c_1..c_n. Now
let's try and define S(C), the variables that comprise the "state" of C.
S(C) includes:
- \forall_i c_i, if c_i is a primitive
- \forall_i S(c_i), if c_i is a reference to a value-ish class
- \forall_i c_i, if c_i is a reference to a non-value-ish class
In other words, if a value contains other values, we recursively include
the dependent value in the state; if a value contains references to
non-values, we can only include that reference (references are
values!), but not any state reachable through that reference. (We wave
our hands and magically define certain otherwise-problematic classes
(like String) to be value-ish.)
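A small sketch of the first two rules, using made-up classes (Point and Segment are not from this thread, and fields are assumed non-null):

```java
// Hypothetical classes for illustration only.

// Both fields are primitives, so S(Point) = { x, y }.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        return (other instanceof Point)
            && ((Point) other).x == x
            && ((Point) other).y == y;
    }

    @Override
    public int hashCode() { return 31 * x + y; }
}

// Both fields refer to a value-ish class, so the state recurses:
// S(Segment) = S(from) + S(to) = { from.x, from.y, to.x, to.y }.
final class Segment {
    private final Point from;
    private final Point to;

    Segment(Point from, Point to) { this.from = from; this.to = to; }

    @Override
    public boolean equals(Object other) {
        // Delegating to Point.equals is fine: Point's state is part of Segment's state.
        return (other instanceof Segment)
            && ((Segment) other).from.equals(from)
            && ((Segment) other).to.equals(to);
    }

    @Override
    public int hashCode() { return 31 * from.hashCode() + to.hashCode(); }
}
```

Under this reading, two Segments built from distinct but equal Points are equal, because the Points' fields are in S(Segment) and the Point references themselves are not.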
So, in the following class:
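    final class Urk {
        private final MutableStuff mc;

        private Urk(MutableStuff mc) { this.mc = mc; }

        // static, so that callers can reach the private constructor
        public static Urk make(MutableStuff mc) { return new Urk(mc); }

        @Override
        public boolean equals(Object other) {
            // compares the reference to mc, never the state behind it
            return (other instanceof Urk)
                && ((Urk) other).mc == mc;
        }

        // similar for hashCode (e.g., System.identityHashCode(mc))
    }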
I would argue this is following the rules; Urk's state includes the
*reference to* mc, but not the *state of* mc. If Urk were to dive into
the state of mc, then it would have crossed the line.
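For contrast, here is a sketch of what crossing the line might look like, and one way it bites. DeepUrk is a hypothetical counter-example, with an ArrayList standing in for the mutable stuff; its equals and hashCode read the state of the list rather than just the reference to it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical counter-example: equals/hashCode dive into the *state of* the list.
final class DeepUrk {
    private final List<String> items;

    DeepUrk(List<String> items) { this.items = items; }

    @Override
    public boolean equals(Object other) {
        return (other instanceof DeepUrk)
            && ((DeepUrk) other).items.equals(items);
    }

    @Override
    public int hashCode() { return items.hashCode(); }
}

final class CrossingTheLineDemo {
    // Reports whether a HashSet can still find the element after its list is mutated.
    static boolean stillFoundAfterMutation() {
        List<String> items = new ArrayList<>();
        DeepUrk urk = new DeepUrk(items);
        Set<DeepUrk> set = new HashSet<>();
        set.add(urk);          // stored under the hash of the empty list
        items.add("a");        // changes urk.hashCode() behind the set's back
        return set.contains(urk);
    }
}
```

Here stillFoundAfterMutation() returns false: the object's hash changed while it sat in the set. A reference-based equals/hashCode like Urk's would not be affected by the mutation at all.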
Hope this helps.