Review Request CR#7118743 : Alternative Hashing for String with Hash-based Maps

Thu May 24 00:22:53 UTC 2012

On May 23 2012, at 16:58 , Ulf Zibis wrote:

> Hi,
> 
> What about making this approach a little bit more general?
> See: Bug 6812862 - provide customizable hash() algorithm in HashMap for speed tuning

It is possible for JDK 8. One option is to provide the hash() method as a virtual extension method on Map allowing implementations to override it. 

>      + all later comments.
> Then you additionally could save:
>     if ((0 != h) && (k instanceof String))
> 
> Looking at the codes of many charsets, the main variance seems to be in the lower 8 bits of a character, especially if the strings belong to the same language. So if we would compose the initial 32-bit values from 4 chars then the murmur3 algorithm could perform almost twice faster.

Interesting idea. The description specifically avoids defining the behaviour so implementations can do this kind of optimization. For 7u6, unless there are technical problems, the proposed. implementation is sufficient. The important message though is that it can be changed later.

> 
> If you alter all hash maps in JDK to use a new hash value, which noteworthy use cases remain to use the legacy hashcode()?

Compatibility (which is of paramount importance). Part of the difficulty is that the String.hashCode() calculation method is part of the specification. Changing the specification is possible but probably not practical.

> Do we really need 2 hash fields in String?

Performance without caching the hash code result is unacceptable. (I've tried it).

> In project coin, we have set in stone to use compile time hashes for Strings_in_switch extension. So it never can't profit from the murmur3 optimization. IMO: what a pity!
> (Prominent people have said, it will never make sense to change the String's hash algorithm.)
> See: http://markmail.org/message/ig3nzmfinfuvgbwz
>      http://markmail.org/message/h3nlhhae5qlmf37a

Yes, there other other reasons as well.

Mike

> 
> 
> Am 23.05.2012 21:03, schrieb Mike Duigou:
>> 
>>> Also, this change
>>> 
>>> -        return h ^ (h>>>  7) ^ (h>>>  4);
>>> +        h ^= (h>>>  7) ^ (h>>>  4);
>>> +
>>> +        return h;
>>> 
>>> will make the compiler generates an additional iload/istore pair.
>>> While the Jitted code will be the same, it may bother the inlining heuristic.
> Wouldn' t
>     return (h ^= (h>>>  7) ^ (h>>>  4));
> have the same effect ?
> 
> Anyway, please add a comment for later readers.
> 
> -Ulf
>