hg: jdk8/tl/jdk: 7126277: Alternative String hashing implementation
Changeset: 43bd5ee0205e Author: mduigou Date: 2012-05-30 22:18 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/43bd5ee0205e 7126277: Alternative String hashing implementation Summary: All of the hashing based Map implementations: HashMap, Hashtable, LinkedHashMap, WeakHashMap and ConcurrentHashMap are modified to use an enhanced hashing algorithm for string keys when the capacity of the hash table has ever grown beyond 512 entries. The enhanced hashing implementation uses the murmur3 hashing algorithm along with random hash seeds and index masks. These enhancements mitigate cases where colliding String hash values could result in a performance bottleneck. Reviewed-by: alanb, forax, dl ! make/java/java/FILES_java.gmk ! src/share/classes/java/lang/String.java ! src/share/classes/java/util/HashMap.java ! src/share/classes/java/util/Hashtable.java ! src/share/classes/java/util/LinkedHashMap.java ! src/share/classes/java/util/WeakHashMap.java ! src/share/classes/java/util/concurrent/ConcurrentHashMap.java + src/share/classes/sun/misc/Hashing.java ! test/java/util/Collection/BiggernYours.java ! test/java/util/Hashtable/HashCode.java ! test/java/util/Hashtable/SimpleSerialization.java + test/java/util/Map/Collisions.java ! test/java/util/Map/Get.java + test/sun/misc/Hashing.java
Hi Mike, some more questions: public class Hashmap<K,V> { + int hash(Object k) { + int h = hashSeed; + if (k instanceof String) { + return ((String) k).hash32(); + } else { + h ^= k.hashCode(); + } + + // This function ensures that hashCodes that differ only by + // constant multiples at each bit position have a bounded + // number of collisions (approximately 8 at default load factor). + h ^= (h >>> 20) ^ (h >>> 12); + return h ^ (h >>> 7) ^ (h >>> 4); } Why you declare and assign variable h before the if statement? And why you set h ^= k.hashCode(); in the else branch, but not the remaining instructions to complete the calculation? Similar (but why little different?) for: Hashtable WeakHashMap ConcurrentHashMap So couldn't method hash(Object) be moved to AbstractMap? Why in WeakHashMap method hash(Object) is not final? (In this case I could exceptionally override it for a custom implementation as former desired for all hash maps.) -Ulf Am 31.05.2012 07:19, schrieb mike.duigou@oracle.com:
Changeset: 43bd5ee0205e Author: mduigou Date: 2012-05-30 22:18 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/43bd5ee0205e
7126277: Alternative String hashing implementation Summary: All of the hashing based Map implementations: HashMap, Hashtable, LinkedHashMap, WeakHashMap and ConcurrentHashMap are modified to use an enhanced hashing algorithm for string keys when the capacity of the hash table has ever grown beyond 512 entries. The enhanced hashing implementation uses the murmur3 hashing algorithm along with random hash seeds and index masks. These enhancements mitigate cases where colliding String hash values could result in a performance bottleneck. Reviewed-by: alanb, forax, dl
! make/java/java/FILES_java.gmk ! src/share/classes/java/lang/String.java ! src/share/classes/java/util/HashMap.java ! src/share/classes/java/util/Hashtable.java ! src/share/classes/java/util/LinkedHashMap.java ! src/share/classes/java/util/WeakHashMap.java ! src/share/classes/java/util/concurrent/ConcurrentHashMap.java + src/share/classes/sun/misc/Hashing.java ! test/java/util/Collection/BiggernYours.java ! test/java/util/Hashtable/HashCode.java ! test/java/util/Hashtable/SimpleSerialization.java + test/java/util/Map/Collisions.java ! test/java/util/Map/Get.java + test/sun/misc/Hashing.java
On May 31 2012, at 01:40 , Ulf Zibis wrote:
Hi Mike,
some more questions:
public class Hashmap<K,V> { + int hash(Object k) { + int h = hashSeed; + if (k instanceof String) { + return ((String) k).hash32(); + } else { + h ^= k.hashCode(); + } + + // This function ensures that hashCodes that differ only by + // constant multiples at each bit position have a bounded + // number of collisions (approximately 8 at default load factor). + h ^= (h >>> 20) ^ (h >>> 12); + return h ^ (h >>> 7) ^ (h >>> 4); }
Why you declare and assign variable h before the if statement?
It used to be referenced in the String case. It could be moved. I will move it in the followon patch.
And why you set h ^= k.hashCode(); in the else branch, but not the remaining instructions to complete the calculation? Similar (but why little different?) for: Hashtable
I will move the read of hashSeed as per HashMap in the followon patch. Compatibility with legacy behaviour.
WeakHashMap
Will be the same as HashMap.
ConcurrentHashMap
Improved behaviour.
So couldn't method hash(Object) be moved to AbstractMap?
The differences in the avalanche (XOR scrambling) preclude this. It could be decided for Java 8 to use a consistent scrambling implementation. I would want to hear from Doug Lea whether he thinks this is reasonable/worthwhile. Presumably there is a reason that Hashtable, HashMap and ConcurrentHashMap use different scrambling solutions.
Why in WeakHashMap method hash(Object) is not final?
Oversight. I will fix this in the followon patch. Thank you for the observant feedback! Mike
(In this case I could exceptionally override it for a custom implementation as former desired for all hash maps.)
-Ulf
Am 31.05.2012 07:19, schrieb mike.duigou@oracle.com:
Changeset: 43bd5ee0205e Author: mduigou Date: 2012-05-30 22:18 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/43bd5ee0205e
7126277: Alternative String hashing implementation Summary: All of the hashing based Map implementations: HashMap, Hashtable, LinkedHashMap, WeakHashMap and ConcurrentHashMap are modified to use an enhanced hashing algorithm for string keys when the capacity of the hash table has ever grown beyond 512 entries. The enhanced hashing implementation uses the murmur3 hashing algorithm along with random hash seeds and index masks. These enhancements mitigate cases where colliding String hash values could result in a performance bottleneck. Reviewed-by: alanb, forax, dl
! make/java/java/FILES_java.gmk ! src/share/classes/java/lang/String.java ! src/share/classes/java/util/HashMap.java ! src/share/classes/java/util/Hashtable.java ! src/share/classes/java/util/LinkedHashMap.java ! src/share/classes/java/util/WeakHashMap.java ! src/share/classes/java/util/concurrent/ConcurrentHashMap.java + src/share/classes/sun/misc/Hashing.java ! test/java/util/Collection/BiggernYours.java ! test/java/util/Hashtable/HashCode.java ! test/java/util/Hashtable/SimpleSerialization.java + test/java/util/Map/Collisions.java ! test/java/util/Map/Get.java + test/sun/misc/Hashing.java
On 05/31/12 12:58, Mike Duigou wrote:
So couldn't method hash(Object) be moved to AbstractMap?
The differences in the avalanche (XOR scrambling) preclude this. It could be decided for Java 8 to use a consistent scrambling implementation. I would want to hear from Doug Lea whether he thinks this is reasonable/worthwhile.
I think a lot more (mainly empirical) performance analysis is needed to make this call. But I also think that the forms of JDK8 versions (and probably backports) are likely to change a lot anyway (ConcurrentHashMap for sure). So doing this anytime soon is likely a bad idea. -Doug
Hi Mike, Am 31.05.2012 07:19, schrieb mike.duigou@oracle.com:
Changeset: 43bd5ee0205e Author: mduigou Date: 2012-05-30 22:18 -0700 URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/43bd5ee0205e
7126277: Alternative String hashing implementation Summary: ... when the capacity of the hash table has ever grown beyond 512 entries.
I can't see this in the JDK8 code. So at least the changeset comment seems wrong. -Ulf
participants (4)
-
Doug Lea
-
Mike Duigou
-
mike.duigou@oracle.com
-
Ulf Zibis