Value type hash code
Remi Forax
forax at univ-mlv.fr
Tue Apr 10 15:05:05 UTC 2018
In my opinion,
the VM should reject the value type that doesn't defines equals, hashCode and toString because a value type doesn't inherits from Object, so there is no default implementations provided.
Rémi
----- Mail original -----
> De: "David Simms" <david.simms at oracle.com>
> À: "valhalla-dev" <valhalla-dev at openjdk.java.net>
> Envoyé: Mardi 10 Avril 2018 16:49:06
> Objet: Value type hash code
> After prototyping "hashCode()" for value types, here's a few
> observations and thoughts...
>
> * "The general contract of hashCode" [1] is unchanged.
> * The default implementation, if no user implementation is provided,
> is assumed to be completely based upon the entire contents of the
> value (there's nothing else to go on).
> o The current "Object" implementation, both its generation and
> object header hash caching are completely inappropriate for
> value types.
> o The VM cannot pretend to know one field is more significant than
> another.
> o Large values will benefit from user implementation to provide
> efficiency.
> o Whilst the VM may provide a default implementation for safety, a
> "javac" generated method would be optimal (can be optimized by
> the JIT, includes inlining).
> * Values containing references whose contents are mutable pose a
> problem, their hash code is only as stable as the contents of the
> whole object graph.
> o Objects may suffer similar problems, difficult to say this is
> any more important for values. Except to say values are supposed
> to be "immutable" but references may break this quality, perhaps
> "javac" could warn when value fields are mutable objects (not
> always possible, e.g. field reference to an interface).
>
> I assume a the default implementation should look something like this
> (only with concrete fields, not reflection):
>
> int hc = 0;
> for (Field field : val.getClass().getDeclaredFields()) {
> if (Modifier.isStatic(field.getModifiers())) continue;
>
> // Using the generic JDK hash for all types
> hc = (31 * hc) + Objects.hashCode(field.get(val));
> }
> return hc;
>
> This algorithm assumes the VM implements calls to reference field's
> hashCode(), and encodes primitives the same as their boxed JDK
> counter-parts (e.g. "Long.hashCode(long l)" does not generically hash
> two int size chunks, rather it xors hi and lo, Boolean is another
> interesting example 1231 || 1237). Unclear if this is actually
> important...however, this example:
>
> final __ByValue class MyInt implements Comparable<MyInt> {
> final int value;
> //....
> }
>
> The user is free to explicitly delegate to "Integer.hashCode(int val)",
> but is it just more natural that the default works this way ?
> Alternative VM implementations might hash over value data payload
> including field padding. With h/w support (suitable checksum
> instruction) there might be some performance gain for large values, but
> then if you introduce object references, said h/w support would get
> broken. Said implementation would be dependent on field layout, and not
> give the same result on different platforms. Whilst the Javadoc states
> hashing "need not remain consistent from one execution of an application
> to another execution of the same application." [1], I'm wondering how
> many folks rely on consistent hashing, more than nobody I would fear.
> Lastly hashing large amounts of data per value seems an unlikely general
> use-case.
>
>
> Cheers
> /David Simms
>
> [1]
> https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html#hashCode()
More information about the valhalla-dev
mailing list