Value type hash code

Gil Tene gil at azul.com
Wed Apr 11 14:20:07 UTC 2018



Sent from my iPad

On Apr 10, 2018, at 11:19 PM, Jonas Konrad <me at yawk.at> wrote:

But this is already a problem in other contexts, such as Lombok's @EqualsAndHashCode, auto-value, or IDE-generated eq/hc.

Those are user-code things (which both Lombok and IDE-generated code are), and they need to take responsibility for their actions. It is quite different if the spec’ed behavior of the JDK makes recursive loops happen without user code overriding all hashCode() implementations in the recursion loop.

Arrays.hashCode() (e.g. https://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#hashCode(java.lang.Object%5B%5D) ) had to deal with the same issue, and similarly chose to use the identity hash code of any contained arrays rather than their contents. I think that is the right, robust choice for a default behavior.
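For illustration, a minimal sketch of that precedent, using only the standard java.util.Arrays API (not part of any value-type proposal):

        Object[] inner = { "a", "b" };
        Object[] outer = { inner };

        // Arrays.hashCode(Object[]) uses inner.hashCode(), i.e. the identity
        // hash, because array types do not override Object.hashCode():
        int shallow = Arrays.hashCode(outer);

        // Arrays.deepHashCode(Object[]) recurses into nested array contents instead:
        int deep = Arrays.deepHashCode(outer);

        // The shallow form stays well-defined even for a self-referential graph:
        outer[0] = outer;
        int stillDefined = Arrays.hashCode(outer);
        // Arrays.deepHashCode(outer) would now recurse without end (StackOverflowError).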


I think defaulting to identity hash for references would make default hc/eq basically useless in cases where you have references in the value, and it would be unexpected to a user.

It is less unexpected than the hash code of an immutable value being mutable by default (if it contains references to objects), and less unexpected than being able to create a recursion loop with a single override in a chain of defaults.

If you have circular object graphs you need to worry about this anyway (also for toString for example), and that can be done best inside the mutable object.

- Jonas

On 2018-04-10 18:14, Gil Tene wrote:
Need to watch out for recursion:
A value type instance x may include a reference to a mutable object o that contains a value type field o.v, where o's class provides a hashCode() implementation that depends on the value of o.v.hashCode(). If o.v == x, both x.hashCode() and o.hashCode() will recurse infinitely.
One may argue that a hashCode() override can cause this sort of recursion in other ways, e.g. an instance o of a class C that overrides hashCode() and contains a reference r of type C can similarly recurse infinitely if o.r == o. However, such infinite recursion can (currently) occur only if all hashCode() methods in the recursion chain have been overridden (which lets us claim user error).
The problem with the default implementation suggested below is that infinite recursion can now occur with an override in one place in the chain…
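A hedged sketch of that loop (reusing the prototype __ByValue syntax from David's example further down; the class and field names here are invented):

    final __ByValue class Val {
        final Mutable o;      // reference field inside the value
        //....
    }

    class Mutable {
        Val v;                // value-typed field; v.o may refer back to this object

        @Override
        public int hashCode() {
            return 31 * v.hashCode();   // the single override in the chain
        }
    }

    // With a contents-based default for Val, if v.o == m and m.v == v:
    //   v.hashCode() -> m.hashCode() -> v.hashCode() -> ...  (never terminates)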
A way around this, which will also provide the benefit of making the default hash code of value types stable in all cases, would be for the default value type hashCode() implementation to use the identity hash code of each object referred to from a reference field, rather than the hashCode() implementation of the referred object.
So I'd propose a change to:
        int hc = 0;
        for (Field field : val.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) continue;

            // Use Objects.hashCode for primitives and value classes; use the
            // identity hash code for object references:
            int fieldHashCode = (field.getType().isPrimitive() || field.getType().isValue()) ?
                Objects.hashCode(field.get(val)) : System.identityHashCode(field.get(val));

            // Combine using the generic JDK 31 * h + x scheme
            hc = (31 * hc) + fieldHashCode;
        }
        return hc;
— Gil.
On Apr 10, 2018, at 7:49 AM, David Simms <david.simms at oracle.com> wrote:


After prototyping "hashCode()" for value types, here are a few observations and thoughts...

* "The general contract of hashCode" [1] is unchanged.
* The default implementation, if no user implementation is provided,
  is assumed to be completely based upon the entire contents of the
  value (there's nothing else to go on).
    o The current "Object" implementation, both its hash generation and
      its object header hash caching, is completely inappropriate for
      value types.
    o The VM cannot pretend to know one field is more significant than
      another.
    o Large values will benefit from a user implementation to provide
      efficiency.
    o Whilst the VM may provide a default implementation for safety, a
      "javac" generated method would be optimal (can be optimized by
      the JIT, includes inlining).
* Values containing references to mutable objects pose a problem:
  their hash code is only as stable as the contents of the whole
  object graph (see the sketch just below this list).
    o Objects may suffer similar problems; it is difficult to say this
      is any more important for values, except that values are supposed
      to be "immutable" and references may break this quality. Perhaps
      "javac" could warn when value fields are mutable objects (not
      always possible, e.g. a field reference to an interface type).
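
A minimal sketch of that instability (reusing the prototype __ByValue syntax from the example further down; the Holder name and the List field are invented for illustration):

    final __ByValue class Holder {
        final List<String> names;   // e.g. an ArrayList: mutable, with a contents-based hashCode()
        //....
    }

    // With a contents-based default hash for Holder:
    //   int h1 = holder.hashCode();
    //   holder.names.add("x");        // mutate the referenced object
    //   int h2 = holder.hashCode();   // h1 != h2, although the value itself never changed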

I assume the default implementation should look something like this (only with concrete fields, not reflection):

        int hc = 0;
        for (Field field : val.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) continue;

            // Using the generic JDK hash for all types
            hc = (31 * hc) + Objects.hashCode(field.get(val));
        }
        return hc;

This algorithm assumes the VM calls hashCode() on reference fields, and that it encodes primitives the same way as their boxed JDK counterparts (e.g. "Long.hashCode(long l)" does not generically hash two int-sized chunks; rather, it xors the high and low halves. Boolean is another interesting example: 1231 or 1237). It is unclear whether this actually matters... however, consider this example:

    final __ByValue class MyInt implements Comparable<MyInt> {
        final int value;
        //....
    }

The user is free to explicitly delegate to "Integer.hashCode(int val)", but is it not more natural that the default works this way? Alternative VM implementations might hash over the value's data payload, including field padding. With hardware support (a suitable checksum instruction) there might be some performance gain for large values, but introducing object references would break that hardware support, and such an implementation would depend on field layout and would not give the same result on different platforms. Whilst the Javadoc states that hashing "need not remain consistent from one execution of an application to another execution of the same application" [1], I wonder how many folks rely on consistent hashing; more than nobody, I fear. Lastly, hashing large amounts of data per value seems an unlikely general use case.
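
For reference, the boxed-counterpart encodings mentioned above, as specified by the JDK, and what explicit delegation inside MyInt might look like:

    // JDK boxed-counterpart encodings:
    //   Long.hashCode(long value)   -> (int) (value ^ (value >>> 32))   // xor of hi and lo halves
    //   Boolean.hashCode(boolean b) -> b ? 1231 : 1237
    //   Integer.hashCode(int value) -> value

    // Explicit delegation in MyInt, should the user choose to write it:
    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }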


Cheers
/David Simms

[1] https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html#hashCode()


