Value type hash code

Tue Apr 10 15:05:05 UTC 2018

In my opinion,
the VM should reject a value type that doesn't define equals, hashCode and toString, because a value type doesn't inherit from Object, so no default implementations are provided.
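
To make the proposed rule concrete, here is a rough sketch of the check in plain Java reflection. It is only an illustration under loud assumptions: a real VM would perform such a check during class loading or verification, not through reflection, and the names `RequiredMethodsCheck`, `declares`, `declaresAll`, `Complete` and `Incomplete` are invented for this example. Ordinary final classes stand in for value types.

```java
public class RequiredMethodsCheck {

    // True if the class itself (not a superclass) declares the given method.
    public static boolean declares(Class<?> type, String name, Class<?>... params) {
        try {
            type.getDeclaredMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // The rule from the mail: equals, hashCode and toString must all be declared.
    public static boolean declaresAll(Class<?> type) {
        return declares(type, "equals", Object.class)
            && declares(type, "hashCode")
            && declares(type, "toString");
    }

    // Hypothetical stand-in that satisfies the rule.
    public static final class Complete {
        final int v;
        public Complete(int v) { this.v = v; }
        @Override public boolean equals(Object o) {
            return o instanceof Complete && ((Complete) o).v == v;
        }
        @Override public int hashCode() { return Integer.hashCode(v); }
        @Override public String toString() { return "Complete(" + v + ")"; }
    }

    // Hypothetical stand-in that would be rejected under the rule.
    public static final class Incomplete {
        final int v;
        public Incomplete(int v) { this.v = v; }
    }

    public static void main(String[] args) {
        System.out.println(declaresAll(Complete.class));   // true
        System.out.println(declaresAll(Incomplete.class)); // false
    }
}
```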

Rémi

----- Original Message -----
> From: "David Simms" <david.simms at oracle.com>
> To: "valhalla-dev" <valhalla-dev at openjdk.java.net>
> Sent: Tuesday, April 10, 2018 16:49:06
> Subject: Value type hash code

> After prototyping "hashCode()" for value types, here are a few
> observations and thoughts...
> 
>  * "The general contract of hashCode" [1] is unchanged.
>  * The default implementation, if no user implementation is provided,
>    is assumed to be completely based upon the entire contents of the
>    value (there's nothing else to go on).
>      o The current "Object" implementation, both its hash generation
>        and its object header hash caching, is completely
>        inappropriate for value types.
>      o The VM cannot pretend to know one field is more significant than
>        another.
>      o Large values will benefit from a user implementation for
>        efficiency.
>      o Whilst the VM may provide a default implementation for safety, a
>        "javac" generated method would be optimal (it can be optimized
>        by the JIT, including inlining).
>  * Values containing references whose contents are mutable pose a
>    problem: their hash code is only as stable as the contents of the
>    whole object graph.
>      o Objects may suffer similar problems, so it is difficult to say
>        this is any more important for values, except that values are
>        supposed to be "immutable" and references may break this
>        quality. Perhaps "javac" could warn when value fields are
>        mutable objects (not always possible, e.g. a field reference to
>        an interface).
> 
> I assume the default implementation should look something like this
> (only with concrete fields, not reflection):
> 
>         int hc = 0;
>         for (Field field : val.getClass().getDeclaredFields()) {
>             if (Modifier.isStatic(field.getModifiers())) continue;
> 
>             // Using the generic JDK hash for all types
>             hc = (31 * hc) + Objects.hashCode(field.get(val));
>         }
>         return hc;
> 
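
For anyone who wants to try the quoted loop, here is a minimal runnable sketch in today's Java. It is only an approximation: `Point` and `MyInt` are hypothetical ordinary final classes standing in for value types, `defaultHash` is a name invented here, and, as the mail says, a real implementation would read concrete fields rather than use reflection. Note also that `getDeclaredFields()` does not promise any particular field order, so the result for a multi-field class could differ between VMs.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

public class DefaultHashSketch {

    // Hypothetical stand-in for a multi-field value type.
    public static final class Point {
        final int x;
        final long y;
        final String label;
        public Point(int x, long y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    // Hypothetical stand-in for the single-field MyInt from the mail.
    public static final class MyInt {
        final int value;
        public MyInt(int value) { this.value = value; }
    }

    // Combines every instance field with the 31 * hc + fieldHash scheme.
    // field.get() boxes primitives, so Objects.hashCode() ends up calling
    // Integer.hashCode, Long.hashCode, and so on.
    public static int defaultHash(Object val) {
        int hc = 0;
        for (Field field : val.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) continue;
            try {
                field.setAccessible(true);
                hc = (31 * hc) + Objects.hashCode(field.get(val));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return hc;
    }

    public static void main(String[] args) {
        // Single field: 31 * 0 + Integer.hashCode(42)
        System.out.println(defaultHash(new MyInt(42))); // 42
        // Equal contents give equal hash codes.
        System.out.println(defaultHash(new Point(1, 2L, "a"))
                == defaultHash(new Point(1, 2L, "a"))); // true
    }
}
```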
> This algorithm assumes the VM implements calls to reference fields'
> hashCode(), and encodes primitives the same as their boxed JDK
> counterparts (e.g. "Long.hashCode(long l)" does not generically hash
> two int-sized chunks, rather it XORs hi and lo; Boolean is another
> interesting example: 1231 or 1237). It is unclear whether this is
> actually important. However, consider this example:
> 
>     final __ByValue class MyInt implements Comparable<MyInt> {
>         final int value;
>         //....
>     }
> 
> The user is free to explicitly delegate to "Integer.hashCode(int val)",
> but is it simply more natural for the default to work this way?
> 
> Alternative VM implementations might hash over the value data payload,
> including field padding. With h/w support (a suitable checksum
> instruction) there might be some performance gain for large values, but
> once you introduce object references, that h/w support breaks down.
> Such an implementation would also depend on field layout, and would not
> give the same result on different platforms.
> 
> Whilst the Javadoc states that hashing "need not remain consistent from
> one execution of an application to another execution of the same
> application." [1], I'm wondering how many folks rely on consistent
> hashing: more than nobody, I fear. Lastly, hashing large amounts of
> data per value seems an unlikely general use-case.
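
To illustrate the "concrete fields, boxed-style encoding" variant discussed above, here is a small sketch of what an explicit or generated method might look like. Everything named here (`ConcreteFieldHash`, `Sample` and its fields) is hypothetical, and an ordinary final class stands in for a value type. Only the JDK methods `Integer.hashCode`, `Long.hashCode`, `Boolean.hashCode` and `Objects.hashCode` are real.

```java
import java.util.Objects;

public class ConcreteFieldHash {

    // Hypothetical stand-in for a value type with mixed field kinds.
    public static final class Sample {
        final int id;
        final long stamp;
        final boolean flag;
        final String name;

        public Sample(int id, long stamp, boolean flag, String name) {
            this.id = id;
            this.stamp = stamp;
            this.flag = flag;
            this.name = name;
        }

        // Same 31 * hc + fieldHash scheme as the quoted loop, but with
        // each field read directly and hashed like its boxed type.
        @Override public int hashCode() {
            int hc = 0;
            hc = 31 * hc + Integer.hashCode(id);    // the int value itself
            hc = 31 * hc + Long.hashCode(stamp);    // (int) (stamp ^ (stamp >>> 32))
            hc = 31 * hc + Boolean.hashCode(flag);  // 1231 for true, 1237 for false
            hc = 31 * hc + Objects.hashCode(name);  // 0 for null, else name.hashCode()
            return hc;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Sample)) return false;
            Sample s = (Sample) o;
            return id == s.id && stamp == s.stamp && flag == s.flag
                    && Objects.equals(name, s.name);
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.hashCode(0x100000002L)); // 3, hi (1) XOR lo (2)
        System.out.println(Boolean.hashCode(true));      // 1231
        System.out.println(Boolean.hashCode(false));     // 1237
        System.out.println(new Sample(1, 2L, true, null).hashCode()); // 69874
    }
}
```

Because each step depends only on field values, this form gives the same result on every platform, unlike a hash over the raw payload and padding.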
> 
> 
> Cheers
> /David Simms
> 
> [1]
> https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html#hashCode()