Value type hash code
John Rose
john.r.rose at oracle.com
Wed Apr 11 21:10:21 UTC 2018
On Apr 11, 2018, at 11:42 AM, Stuart Marks <stuart.marks at oracle.com> wrote:
>
> Interesting stuff. Several observations:
>
> * Clearly there ought to be a default implementation. One question I have is whether the default implementation ought to be *specified*. If it's not specified -- really, it's specified only to conform to the general contract, but the details aren't specified -- then we'll have more freedom to change it in the future. Indeed, it'd be nice if it changed from time to time, or even frequently; it would help avoid code making inadvertent dependencies on the hash values.
Maybe (but probably not) that's tenable at the language level. We JVM folks
need certainty too, for naked classes as well as fully clothed ones.
(We could have the JVM outlaw certain ill-clothed ones, of course,
but that move doesn't seem to buy increased clarity at a the JVM level.
The JVM still needs to define at least an analog of Object.equals
for values, I suspect.)
> * I think a primary use case for value types will be as HashMap keys. If so, I don't see any reasonable alternative than to delegate to the hashCode() of ordinary objects referenced from a value type, even if they're mutable. Lists of any implementation are defined to be equal and to have the same hashcode if their elements are equal. Two value types containing distinct List instances, containing equal elements should be equal and have equal hash codes.
Yep. Which is why we are wondering whether to break symmetry on
today's Object.equals and give it a more useful meaning for values
even though it is pretty useless for today's references.
> Yes, this implies that a value type's hashCode() and equals() can change if it contains a reference to a mutable object. Sure, someone (javac?) can try to warn about this, but then we get into the business of tagging some objects as immutable. I'm not sure I want to go there.
>
> To a certain extent, people are used to this; they use mutable objects as HashMap keys all the time. They just "know" that they mustn't mutate the object while it's in use. Occasionally, though, somebody will pop up in a huff complaining that HashMap is broken, and it's because they mutated a key... and then they'll complain that this isn't properly documented. Overall though, I don't think this is a serious problem.
(To me that's like the self-loop/SOE problem: Very possible in theory,
but our users somehow manage to avoid it.)
> * I'm sort-of ok with arrays using identity hash code, since that's what you get when you call hashCode() on them. The user can override this by calling Arrays.hashCode() or deepHashCode() if desired. But I can't help but think this is perpetuating a mistake….
I agree strongly with this feeling, and would like to find a clever escape
from the box we've put ourselves in.
> * I'm wondering if it makes sense to mix the field types into the hashCode, at least for primitives. The hashcodes for the integral primitives are all equal to the value, for small values. That is, the hashes of (byte)123, (short)123, (char)123, 123, and 123L are all 123. That bugs me. I'm not sure how much practical impact it would have, though.
I'd rather mix in the value class. That would handle all the field types
in one fell swoop.
> * Somebody (maybe me) should do some research into better hash algorithms.
You too! You noticed that broken toenail!
> The base-31 polynomial was probably quite reasonable 20 years ago, but better things might have been discovered since then. Intuitively, I'd say that the starting value should be something other than zero, preferably a large prime number. Since the hash code of 0 is 0, and the hash code of "" is 0, it's quite likely for a value containing zeros and empty strings to be zero! (Unless we mix in the field types, as above.)
It was a New Jersey move even 20 years ago. IIRC we tested it on
medium sized corpuses of URL strings and found it was "not too terrible".
Now it's just plain terrible, given the available alternatives, and the
continuing expansion of uses cases that Java serves.
— John
More information about the valhalla-dev
mailing list