Value type hash code

Brian Goetz brian.goetz at oracle.com
Wed Apr 11 15:38:48 UTC 2018


On the topic of mechanics, there's some overlap between what we do for 
values and what we do for records (and of course their intersection, 
value records).  We've got an indy bootstrap that you load up with a 
sequence of field-getter method handles, and it produces an MH that does 
short-circuit lazy map-reduce over the values for both equals and 
hashCode.  Eventually we'll make these codebases line up.
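
A minimal sketch of what the hashCode half of such a combinator can 
look like (this is an illustration, not the actual bootstrap code; the 
real one also covers equals, with short-circuiting):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;
    import java.util.Objects;

    // Illustrative sketch, not the real bootstrap.
    final class HashBootstrapSketch {
        // step(hc, fieldValue) = 31 * hc + Objects.hashCode(fieldValue)
        private static int step(int hc, Object fieldValue) {
            return 31 * hc + Objects.hashCode(fieldValue);
        }

        // Folds a sequence of field-getter handles (each (T) -> field
        // type) into a single (T) -> int hashCode handle.
        static MethodHandle hashCodeHandle(Class<?> type, MethodHandle... getters)
                throws ReflectiveOperationException {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodHandle step = l.findStatic(HashBootstrapSketch.class, "step",
                    MethodType.methodType(int.class, int.class, Object.class));
            // Start with (obj) -> 0, then wrap one getter at a time:
            // next(obj) = step(prev(obj), getter(obj))
            MethodHandle acc = MethodHandles.dropArguments(
                    MethodHandles.constant(int.class, 0), 0, type);
            for (MethodHandle getter : getters) {
                MethodHandle g = getter.asType(
                        MethodType.methodType(Object.class, type)); // boxes primitives
                MethodHandle s = MethodHandles.filterArguments(step, 1, g);
                acc = MethodHandles.foldArguments(s, acc); // (T) -> int
            }
            return acc;
        }
    }

Loaded up with Lookup.findGetter handles for each field, the returned 
handle computes the same 31 * hc + hash(field) fold as a hand-written 
hashCode, and the JIT can inline straight through it.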

On 4/10/2018 10:49 AM, David Simms wrote:
>
> After prototyping "hashCode()" for value types, here are a few 
> observations and thoughts...
>
>  * "The general contract of hashCode" [1] is unchanged.
>  * The default implementation, if no user implementation is provided,
>    is assumed to be completely based upon the entire contents of the
>    value (there's nothing else to go on).
>      o The current "Object" implementation, with both its hash
>        generation and its object-header hash caching, is completely
>        inappropriate for value types.
>      o The VM cannot pretend to know one field is more significant than
>        another.
>      o Large values will benefit from a user implementation to provide
>        efficiency.
>      o Whilst the VM may provide a default implementation for safety, a
>        "javac"-generated method would be optimal (it can be optimized
>        by the JIT, including inlining).
>  * Values containing references to mutable objects pose a problem:
>    their hash code is only as stable as the contents of the whole
>    object graph (see the sketch after this list).
>      o Objects may suffer similar problems, so it is difficult to say
>        this is any more important for values. That said, values are
>        supposed to be "immutable", but references may break this
>        quality; perhaps "javac" could warn when value fields are
>        mutable objects (not always possible, e.g. a field referencing
>        an interface type).
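>
> For example (hypothetical value class, provisional syntax), the
> default hash of a value holding a mutable list is only as stable as
> the list's contents:
>
>     __ByValue class Holder {
>         final List<Integer> list;   // reference to a mutable object
>     }
>
>     // Holder h = ...;
>     // int h1 = h.hashCode();
>     // h.list.add(42);              // mutates the referenced object
>     // int h2 = h.hashCode();       // h1 != h2 (in general)
>     // any HashMap already holding h as a key is now corrupted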
>
> I assume the default implementation should look something like this 
> (only with concrete fields, not reflection):
>
>         static int valueHashCode(Object val) throws IllegalAccessException {
>             int hc = 0;
>             for (Field field : val.getClass().getDeclaredFields()) {
>                 if (Modifier.isStatic(field.getModifiers())) continue;
>                 field.setAccessible(true); // non-public fields need this
>
>                 // Using the generic JDK hash for all types
>                 hc = (31 * hc) + Objects.hashCode(field.get(val));
>             }
>             return hc;
>         }
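>
> The non-reflective equivalent for a concrete value class, say one with
> fields "int x" and "Object ref" (hypothetical names), is straight-line
> code that "javac" could emit:
>
>         int hc = 0;
>         hc = 31 * hc + Integer.hashCode(x);
>         hc = 31 * hc + Objects.hashCode(ref);
>         return hc;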
>
> This algorithm assumes the VM invokes hashCode() on reference fields, 
> and encodes primitives the same as their boxed JDK counterparts 
> (e.g. "Long.hashCode(long l)" does not generically hash two int-sized 
> chunks; rather it xors the high and low halves. Boolean is another 
> interesting example, yielding 1231 or 1237). It is unclear whether 
> this actually matters... however, consider this example:
>
>     final __ByValue class MyInt implements Comparable<MyInt> {
>         final int value;
>         //....
>     }
>
> The user is free to explicitly delegate to "Integer.hashCode(int 
> val)", but is it not more natural that the default works this way? 
> Alternative VM implementations might hash over the value's data 
> payload, including field padding. With h/w support (a suitable 
> checksum instruction) there might be some performance gain for large 
> values, but introducing object references would break such h/w 
> support. Such an implementation would also depend on field layout, 
> and so would not give the same result on different platforms. Whilst 
> the Javadoc states hashing "need not remain consistent from one 
> execution of an application to another execution of the same 
> application" [1], I wonder how many folks rely on consistent hashing; 
> more than nobody, I fear. Lastly, hashing large amounts of data per 
> value seems an unlikely general use-case.
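>
> For reference, the boxed hash functions such a default would mirror
> are, in the actual JDK sources:
>
>     Integer.hashCode(int v)     -> v
>     Long.hashCode(long v)       -> (int) (v ^ (v >>> 32))
>     Boolean.hashCode(boolean b) -> b ? 1231 : 1237
>
> and the explicit delegation for "MyInt" above would simply be:
>
>     @Override
>     public int hashCode() { return Integer.hashCode(value); }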
>
>
> Cheers
> /David Simms
>
> [1] 
> https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html#hashCode()
>
>



