Review Request CR#7118743 : Alternative Hashing for String with Hash-based Maps

Ulf Zibis Ulf.Zibis at gmx.de
Sat May 26 01:43:40 UTC 2012


Am 23.05.2012 07:16, schrieb Mike Duigou:
> Webrevs for the Java 7u6 and 8 changes are available for download at [2] and [3] for your review. There are some important differences between the Java 7 and 8 implementations of this enhancement. Most specifically in the Java 8 implementation alternative string hashing is always enabled--no threshold is used for enablement and alternative hashing cannot be disabled. (The Java 8 implementation completely ignores the jdk.map.althashing.threshold system property). The Java 8 implementation is also subject to additional refinement as Java 8 develops.

To me it seems, that computing the murmur hash is more expensive, especially on short strings, than 
the old hash algorithm.
So I think, a developer should have an influence on the hash algorithm to be used by a hash map, 
especially for strings, see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6812862.

So I propose a public accessible field in String:
     public int customHash;

Then the HashMap should have an overrideable hash method:
     protected int hash(K key) {
         h = key.hashCode();
         h ^= (h >>> 20) ^ (h >>> 12);
         return h ^ (h >>> 7) ^ (h >>> 4);
     }

Then we should have a new class:
public class MurmurHashMap<V> extends HashMap<String,V> {
     protected final int hash(String key) {
         int h = key.customHash;
         if (h == 0) {
             // harmless data race on customHash here.
             h = Hashing.murmur3_32(HASHING_SEED, key);
             // ensure result is not zero to avoid recalculating
             h = (h != 0) ? h : 1;
             key.customHash = h;
         }
         return hashMask ^ h;
     }

In class Hashing we need:
     public static int murmur3_32(int seed, String key) {
         int h1 = seed;
         int count = key.length();

         // body
         for (int off = 0; count >= 2;) {
             int k1 = (key.charAt(off++) & 0xFFFF) | key.charAt(off++) << 16);
// alternative:
         for (int off = 0; count >= 4;) {
             int k1 = (key.charAt(off) & 0x0FF)
                     | (key.charAt(off + 1) & 0x0FF) << 8
                     | (key.charAt(off + 2) & 0x0FF) << 16
                     | key.charAt(off + 3) << 24;
             ...
         }
         ...
     }

For other use cases see:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6812862

-Ulf




More information about the core-libs-dev mailing list