Value type hash code
Jonas Konrad
me at yawk.at
Wed Apr 11 15:43:44 UTC 2018
> Arrays.hashcode() (e.g.
>
https://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#hashCode(java.lang.Object%5B%5D)
)
> had to deal with the same issue, and similarly chose to use the identity
> hashcode of any contained arrays rather than their contents. I think
> that is the right, robust choice fir a default behavior.
I do not agree that that is the same. Yes, Arrays.equals and .hashCode
are shallow, but that is just an artefact of array equality being
defined as it is (as identity). I don't think it's anyone's goal to
repeat the same mistakes as the array api and need Values.deepHashCode
later :) Had arrays had a proper eq/hc instead of simple identity from
the start, they would've been implemented with possible infinite
recursion as well. Since basically nobody uses array eq/hc I don't think
the current behaviour is very useful - we can't change it for arrays
now, but we don't have to do the same for values.
The 'new' API which does this 'right' is List, and List uses member
eq/hc. I don't think it really matters if I introduce a circular
dependency through a value type and get infinite recursion or I
introduce a circular dependency through a List and get infinite
recursion. Other examples are 'value-based' classes in the JDK right
now, such as Optional - in fact it has the same issues you are talking
about - but it does not use identity either.
The reason I bring up lombok/auto-value/IDEs is because this is how it
is implemented most commonly (by far) in the real world. If you want the
default eq/hc to be actually useful to end users, that's how to do it.
This also adds another unintuitive difference between value and
non-value types (unless you override eq/hc yourself) which seems pretty
arbitrary to me.
- Jonas
On 04/11/2018 04:20 PM, Gil Tene wrote:
>
>
> Sent from my iPad
>
> On Apr 10, 2018, at 11:19 PM, Jonas Konrad <me at yawk.at
> <mailto:me at yawk.at>> wrote:
>
>> But this is already a problem in other contexts, such as lomboks
>> @EqualsAndHashCode, auto-value, or IDE-generated eq/hc.
>
> Those are user-code things, that need to take responsibility for their
> actions. It is quite different if the spec’ed behavior of the JDK makes
> recursive loops happen without user code (which both Lombok and IDEs
> are) overriding all hashCode() implementations in the recursion loop.
>
> Arrays.hashcode() (e.g.
> https://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#hashCode(java.lang.Object%5B%5D) )
> had to deal with the same issue, and similarly chose to use the identity
> hashcode of any contained arrays rather than their contents. I think
> that is the right, robust choice fir a default behavior.
>
>>
>> I think defaulting to identity hash for references would make default
>> hc/eq basically useless in cases where you have references in the
>> value and would be unexpected to a user.
>
> It is less unexpected than the hashcode of an immutable value being
> mutable by default (if it contains references to objects). And less
> unexpected than being able to create a recursion loop with a single
> override in a chain of defaults.
>
>> If you have circular object graphs you need to worry about this anyway
>> (also for toString for example), and that can be done best inside the
>> mutable object.
>>
>> - Jonas
>>
>> On 2018-04-10 18:14, Gil Tene wrote:
>>> Need to watch out for recursion:
>>> A value type instance x may include a reference to a mutable object o
>>> that may contain a value type field o.v, and may have provided a
>>> hashCode() implementation that depends on the value of o.v.hasCode().
>>> If o.v = x, both x.hashCode() and o.hashCode() will infinitely recurse.
>>> One may argue that a hashCode() override can cause this sort of
>>> recursion in other ways, e.g. a class C that overrides hashCode() and
>>> contains a reference r of type C can similarly infinitely recurse if
>>> o.r = o; However, such infinite recursion can (currently) occur only
>>> if all hashCode() methods in the recursion chain have been overridden
>>> (which lets us claim user error).
>>> The problem with the default implementation suggested below is that
>>> infinite recursion can now occur with an override in one place in the
>>> chain…
>>> A way around this, which will also provide the benefit of making the
>>> default hashcode of value types stable in all cases, would be for the
>>> default value type hashCode() implementation to use the identity
>>> hashcode of each object referred from a reference field, rather than
>>> the hasCode() implementation of the referred object.
>>> So I'd propose a change to:
>>> int hc = 0;
>>> for (Field field : val.getClass().getDeclaredFields()) {
>>> if (Modifier.isStatic(field.getModifiers())) continue;
>>> // Use Objects.hasCode for primitives and value classes; use
>>> identify hashcode for objects:
>>> int fieldHashCode = (field.getType().isPrimitive() ||
>>> field.getType().isValue()) ?
>>> Objects.hashCode(field.get(val)) :
>>> System.identityHashCode(field.get(val));
>>> // Using the generic JDK hash for all types
>>> hc = (31 * hc) + fieldHashCode;
>>> }
>>> return hc;
>>> — Gil.
>>>> On Apr 10, 2018, at 7:49 AM, David Simms <david.simms at oracle.com
>>>> <mailto:david.simms at oracle.com>> wrote:
>>>>
>>>>
>>>> After prototyping "hashCode()" for value types, here's a few
>>>> observations and thoughts...
>>>>
>>>> * "The general contract of hashCode" [1] is unchanged.
>>>> * The default implementation, if no user implementation is provided,
>>>> is assumed to be completely based upon the entire contents of the
>>>> value (there's nothing else to go on).
>>>> o The current "Object" implementation, both its generation and
>>>> object header hash caching are completely inappropriate for
>>>> value types.
>>>> o The VM cannot pretend to know one field is more significant than
>>>> another.
>>>> o Large values will benefit from user implementation to provide
>>>> efficiency.
>>>> o Whilst the VM may provide a default implementation for safety, a
>>>> "javac" generated method would be optimal (can be optimized by
>>>> the JIT, includes inlining).
>>>> * Values containing references whose contents are mutable pose a
>>>> problem, their hash code is only as stable as the contents of the
>>>> whole object graph.
>>>> o Objects may suffer similar problems, difficult to say this is
>>>> any more important for values. Except to say values are supposed
>>>> to be "immutable" but references may break this quality, perhaps
>>>> "javac" could warn when value fields are mutable objects (not
>>>> always possible, e.g. field reference to an interface).
>>>>
>>>> I assume a the default implementation should look something like
>>>> this (only with concrete fields, not reflection):
>>>>
>>>> int hc = 0;
>>>> for (Field field : val.getClass().getDeclaredFields()) {
>>>> if (Modifier.isStatic(field.getModifiers())) continue;
>>>>
>>>> // Using the generic JDK hash for all types
>>>> hc = (31 * hc) + Objects.hashCode(field.get(val));
>>>> }
>>>> return hc;
>>>>
>>>> This algorithm assumes the VM implements calls to reference field's
>>>> hashCode(), and encodes primitives the same as their boxed JDK
>>>> counter-parts (e.g. "Long.hashCode(long l)" does not generically
>>>> hash two int size chunks, rather it xors hi and lo, Boolean is
>>>> another interesting example 1231 || 1237). Unclear if this is
>>>> actually important...however, this example:
>>>>
>>>> final __ByValue class MyInt implements Comparable<MyInt> {
>>>> final int value;
>>>> //....
>>>> }
>>>>
>>>> The user is free to explicitly delegate to "Integer.hashCode(int
>>>> val)", but is it just more natural that the default works this way ?
>>>> Alternative VM implementations might hash over value data payload
>>>> including field padding. With h/w support (suitable checksum
>>>> instruction) there might be some performance gain for large values,
>>>> but then if you introduce object references, said h/w support would
>>>> get broken. Said implementation would be dependent on field layout,
>>>> and not give the same result on different platforms. Whilst the
>>>> Javadoc states hashing "need not remain consistent from one
>>>> execution of an application to another execution of the same
>>>> application." [1], I'm wondering how many folks rely on consistent
>>>> hashing, more than nobody I would fear. Lastly hashing large amounts
>>>> of data per value seems an unlikely general use-case.
>>>>
>>>>
>>>> Cheers
>>>> /David Simms
>>>>
>>>> [1]
>>>> https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html#hashCode()
>>>>
More information about the valhalla-dev
mailing list