Value type hash code

Jonas Konrad me at yawk.at
Wed Apr 11 15:43:44 UTC 2018


 > Arrays.hashcode() (e.g.
 > 
https://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#hashCode(java.lang.Object%5B%5D) 
)
 > had to deal with the same issue, and similarly chose to use the identity
 > hashcode of any contained arrays rather than their contents. I think
 > that is the right, robust choice fir a default behavior.

I do not agree that that is the same. Yes, Arrays.equals and .hashCode 
are shallow, but that is just an artefact of array equality being 
defined as it is (as identity). I don't think it's anyone's goal to 
repeat the same mistakes as the array api and need Values.deepHashCode 
later :) Had arrays had a proper eq/hc instead of simple identity from 
the start, they would've been implemented with possible infinite 
recursion as well. Since basically nobody uses array eq/hc I don't think 
the current behaviour is very useful - we can't change it for arrays 
now, but we don't have to do the same for values.

The 'new' API which does this 'right' is List, and List uses member 
eq/hc. I don't think it really matters if I introduce a circular 
dependency through a value type and get infinite recursion or I 
introduce a circular dependency through a List and get infinite 
recursion. Other examples are 'value-based' classes in the JDK right 
now, such as Optional - in fact it has the same issues you are talking 
about - but it does not use identity either.

The reason I bring up lombok/auto-value/IDEs is because this is how it 
is implemented most commonly (by far) in the real world. If you want the 
default eq/hc to be actually useful to end users, that's how to do it.

This also adds another unintuitive difference between value and 
non-value types (unless you override eq/hc yourself) which seems pretty 
arbitrary to me.

- Jonas

On 04/11/2018 04:20 PM, Gil Tene wrote:
> 
> 
> Sent from my iPad
> 
> On Apr 10, 2018, at 11:19 PM, Jonas Konrad <me at yawk.at 
> <mailto:me at yawk.at>> wrote:
> 
>> But this is already a problem in other contexts, such as lomboks 
>> @EqualsAndHashCode, auto-value, or IDE-generated eq/hc.
> 
> Those are user-code things, that need to take responsibility for their 
> actions. It is quite different if the spec’ed behavior of the JDK makes 
> recursive loops happen without user code (which both Lombok and IDEs 
> are) overriding all hashCode() implementations in the recursion loop.
> 
> Arrays.hashcode() (e.g. 
> https://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#hashCode(java.lang.Object%5B%5D) ) 
> had to deal with the same issue, and similarly chose to use the identity 
> hashcode of any contained arrays rather than their contents. I think 
> that is the right, robust choice fir a default behavior.
> 
>>
>> I think defaulting to identity hash for references would make default 
>> hc/eq basically useless in cases where you have references in the 
>> value and would be unexpected to a user.
> 
> It is less unexpected than the hashcode of an immutable value being 
> mutable by default (if it contains references to objects). And less 
> unexpected than being able to create a recursion loop with a single 
> override in a chain of defaults.
> 
>> If you have circular object graphs you need to worry about this anyway 
>> (also for toString for example), and that can be done best inside the 
>> mutable object.
>>
>> - Jonas
>>
>> On 2018-04-10 18:14, Gil Tene wrote:
>>> Need to watch out for recursion:
>>> A value type instance x may include a reference to a mutable object o 
>>> that may contain a value type field o.v, and may have provided a 
>>> hashCode() implementation that depends on the value of o.v.hasCode(). 
>>> If o.v = x, both x.hashCode() and o.hashCode() will infinitely recurse.
>>> One may argue that a hashCode() override can cause this sort of 
>>> recursion in other ways, e.g. a class C that overrides hashCode() and 
>>> contains a reference r of type C can similarly infinitely recurse if 
>>> o.r = o;  However, such infinite recursion can (currently) occur only 
>>> if all hashCode() methods in the recursion chain have been overridden 
>>> (which lets us claim user error).
>>> The problem with the default implementation suggested below is that 
>>> infinite recursion can now occur with an override in one place in the 
>>> chain…
>>> A way around this, which will also provide the benefit of making the 
>>> default hashcode of value types stable in all cases, would be for the 
>>> default value type hashCode() implementation to use the identity 
>>> hashcode of each object referred from a reference field, rather than 
>>> the hasCode() implementation of the referred object.
>>> So I'd propose a change to:
>>>         int hc = 0;
>>>         for (Field field : val.getClass().getDeclaredFields()) {
>>>             if (Modifier.isStatic(field.getModifiers())) continue;
>>>        // Use Objects.hasCode for primitives and value classes; use 
>>> identify hashcode for objects:
>>>             int fieldHashCode = (field.getType().isPrimitive() || 
>>> field.getType().isValue()) ?
>>>                 Objects.hashCode(field.get(val)) : 
>>> System.identityHashCode(field.get(val));
>>>             // Using the generic JDK hash for all types
>>>             hc = (31 * hc) + fieldHashCode;
>>>         }
>>>         return hc;
>>> — Gil.
>>>> On Apr 10, 2018, at 7:49 AM, David Simms <david.simms at oracle.com 
>>>> <mailto:david.simms at oracle.com>> wrote:
>>>>
>>>>
>>>> After prototyping "hashCode()" for value types, here's a few 
>>>> observations and thoughts...
>>>>
>>>> * "The general contract of hashCode" [1] is unchanged.
>>>> * The default implementation, if no user implementation is provided,
>>>>   is assumed to be completely based upon the entire contents of the
>>>>   value (there's nothing else to go on).
>>>>     o The current "Object" implementation, both its generation and
>>>>       object header hash caching are completely inappropriate for
>>>>       value types.
>>>>     o The VM cannot pretend to know one field is more significant than
>>>>       another.
>>>>     o Large values will benefit from user implementation to provide
>>>>       efficiency.
>>>>     o Whilst the VM may provide a default implementation for safety, a
>>>>       "javac" generated method would be optimal (can be optimized by
>>>>       the JIT, includes inlining).
>>>> * Values containing references whose contents are mutable pose a
>>>>   problem, their hash code is only as stable as the contents of the
>>>>   whole object graph.
>>>>     o Objects may suffer similar problems, difficult to say this is
>>>>       any more important for values. Except to say values are supposed
>>>>       to be "immutable" but references may break this quality, perhaps
>>>>       "javac" could warn when value fields are mutable objects (not
>>>>       always possible, e.g. field reference to an interface).
>>>>
>>>> I assume a the default implementation should look something like 
>>>> this (only with concrete fields, not reflection):
>>>>
>>>>         int hc = 0;
>>>>         for (Field field : val.getClass().getDeclaredFields()) {
>>>>             if (Modifier.isStatic(field.getModifiers())) continue;
>>>>
>>>>             // Using the generic JDK hash for all types
>>>>             hc = (31 * hc) + Objects.hashCode(field.get(val));
>>>>         }
>>>>         return hc;
>>>>
>>>> This algorithm assumes the VM implements calls to reference field's 
>>>> hashCode(), and encodes primitives the same as their boxed JDK 
>>>> counter-parts (e.g. "Long.hashCode(long l)" does not generically 
>>>> hash two int size chunks, rather it xors hi and lo, Boolean is 
>>>> another interesting example 1231 || 1237). Unclear if this is 
>>>> actually important...however, this example:
>>>>
>>>>     final __ByValue class MyInt implements Comparable<MyInt> {
>>>>         final int value;
>>>>         //....
>>>>     }
>>>>
>>>> The user is free to explicitly delegate to "Integer.hashCode(int 
>>>> val)", but is it just more natural that the default works this way ? 
>>>> Alternative VM implementations might hash over value data payload 
>>>> including field padding. With h/w support (suitable checksum 
>>>> instruction) there might be some performance gain for large values, 
>>>> but then if you introduce object references, said h/w support would 
>>>> get broken. Said implementation would be dependent on field layout, 
>>>> and not give the same result on different platforms. Whilst the 
>>>> Javadoc states hashing "need not remain consistent from one 
>>>> execution of an application to another execution of the same 
>>>> application." [1], I'm wondering how many folks rely on consistent 
>>>> hashing, more than nobody I would fear. Lastly hashing large amounts 
>>>> of data per value seems an unlikely general use-case.
>>>>
>>>>
>>>> Cheers
>>>> /David Simms
>>>>
>>>> [1] 
>>>> https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html#hashCode()
>>>>



More information about the valhalla-dev mailing list