Review Request CR#7118743 : Alternative Hashing for String with Hash-based Maps

Wed May 23 23:58:07 UTC 2012

Hi,

What about making this approach a little bit more general?
See: Bug 6812862 <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6812862> - provide customizable
hash() algorithm in HashMap for speed tuning, plus all later comments.
Then you could additionally save the check:
     if ((0 != h) && (k instanceof String))
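
To illustrate what I mean, here is a minimal sketch of a pluggable hash function. The names KeyHasher, PluggableHashSketch and bucketIndex are hypothetical; nothing like them exists in the JDK or in the patch under review. The point is only that a map which receives its hash function at construction time needs no per-lookup instanceof test (lambdas are used for brevity, so this assumes Java 8 or later):

```java
// Hypothetical strategy interface; NOT an existing JDK type.
interface KeyHasher<K> {
    int hash(K key);
}

// Hypothetical sketch of how a map could use such a strategy.
final class PluggableHashSketch {
    // Maps a key to a bucket of a power-of-two sized table.
    // The hasher is chosen once, so no type check on the key is needed here.
    static <K> int bucketIndex(KeyHasher<? super K> hasher, K key, int tableLength) {
        return hasher.hash(key) & (tableLength - 1);
    }

    public static void main(String[] args) {
        KeyHasher<Object> legacy = k -> k.hashCode();
        System.out.println(bucketIndex(legacy, "a", 16));
    }
}
```

A String-specific hasher (murmur3 or anything else) could then be supplied by the caller instead of being hard-wired into the map.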

Looking at the code points of many charsets, most of the variance seems to be in the lower 8 bits of
a character, especially if the strings belong to the same language. So if we composed each initial
32-bit value from 4 chars, the murmur3 algorithm could run almost twice as fast.
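
A rough sketch of the packing step, assuming the current code feeds two chars into each 32-bit block. PackedCharBlocks, packLowBytes and blockCount are hypothetical names, and the murmur3 mixing itself is left out:

```java
// Hypothetical helper; not part of the JDK or of the patch under review.
final class PackedCharBlocks {
    // Packs the low 8 bits of up to four chars, starting at offset, into one int.
    // The first char lands in the lowest byte. The upper 8 bits of each char are dropped.
    static int packLowBytes(CharSequence s, int offset) {
        int block = 0;
        int end = Math.min(offset + 4, s.length());
        for (int i = offset; i < end; i++) {
            block |= (s.charAt(i) & 0xFF) << (8 * (i - offset));
        }
        return block;
    }

    // Number of 32-bit blocks needed when four chars share one block.
    static int blockCount(CharSequence s) {
        return (s.length() + 3) / 4;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(packLowBytes("abcd", 0)));
    }
}
```

The price is that chars differing only in their upper byte, e.g. 'A' (U+0041) and U+0141, produce the same block, so this only pays off if the upper bytes really carry little variance.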

If you alter all hash maps in the JDK to use a new hash value, which noteworthy use cases would
still use the legacy hashCode()? Do we really need 2 hash fields in String?

In Project Coin, we have set in stone the use of compile-time hashes for the strings-in-switch
extension. So it can never profit from the murmur3 optimization. IMO: what a pity!
(Prominent people have said that it will never make sense to change String's hash algorithm.)
See: http://markmail.org/message/ig3nzmfinfuvgbwz
      http://markmail.org/message/h3nlhhae5qlmf37a
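
For readers who have not looked at it: a hand-written approximation of how a switch on strings depends on String.hashCode(). The class and method names are made up, and the exact code javac emits differs in detail; the constants 97 and 3105 follow from the documented String.hashCode() formula:

```java
// Hand-written illustration only; javac's real output is structured differently.
final class SwitchOnStringSketch {
    // What the programmer writes.
    static int sugared(String s) {
        switch (s) {
            case "a":  return 1;
            case "ab": return 2;
            default:   return 0;
        }
    }

    // Roughly what ends up in the class file: the hash values are constants
    // computed at compile time, so they must match String.hashCode() forever.
    static int desugared(String s) {
        switch (s.hashCode()) {
            case 97:   // "a".hashCode()
                if (s.equals("a")) return 1;
                break;
            case 3105: // "ab".hashCode() == 97 * 31 + 98
                if (s.equals("ab")) return 2;
                break;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(sugared("ab") == desugared("ab"));
    }
}
```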

On 23.05.2012 21:03, Mike Duigou wrote:
>> Also, this change
>>
>> -        return h ^ (h>>>   7) ^ (h>>>   4);
>> +        h ^= (h>>>   7) ^ (h>>>   4);
>> +
>> +        return h;
>>
>> will make the compiler generate an additional iload/istore pair.
>> While the Jitted code will be the same, it may bother the inlining heuristic.
Wouldn't
     return (h ^= (h>>>  7) ^ (h>>>  4));
have the same effect?
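
A small sketch to compare the three variants side by side; the class and method names are made up for illustration. All three return the same value, and the bytecode of each can be inspected with javap -c to see which one carries the extra iload/istore pair (I make no claim here about the outcome):

```java
// Illustration only; the method names are not from the patch.
final class HashSpreadForms {
    // Original form: a single return expression.
    static int returnExpression(int h) {
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Form from the patch: assign first, then return.
    static int assignThenReturn(int h) {
        h ^= (h >>> 7) ^ (h >>> 4);

        return h;
    }

    // Suggested form: return the value of the compound assignment.
    static int returnAssignment(int h) {
        return (h ^= (h >>> 7) ^ (h >>> 4));
    }

    public static void main(String[] args) {
        int h = 0x12345678;
        System.out.println(returnExpression(h) == assignThenReturn(h)
                && returnExpression(h) == returnAssignment(h));
    }
}
```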

Anyway, please add a comment for later readers.

-Ulf