Review Request CR#7118743 : Alternative Hashing for String with Hash-based Maps

Mike Duigou mike.duigou at oracle.com
Wed May 23 19:03:03 UTC 2012


Hi Remi;

As always, thank you for the valuable feedback.

On May 23 2012, at 03:42 , Rémi Forax wrote:

> Hi Mike,
> these comments are for jdk8, I have no time currently to take a look to jdk7 counterpart.
> 
> First, why do you change make/java/java/Makefile ?

It was temporary and should no longer be needed. There was skew between my patch and the java.utils warnings cleanup for a while. I will make sure that this change can be removed. I won't commit without making sure that the Makefile change is not required.

> 
> In java.lang.String, when you compute the HASHING_SEED in the static block,
> you should ensure that it's never zero in the static block instead of doing it
> in hash32().

These are different seeds. The zero value is a signal that alternative hashing is disabled.

> In HashMap, the class Holder should not declare the static final fields 'private'
> because the compiler will generate an accessor in that case,

I will fix this.

> again musn.misc.Hashing.makeHashMask should never return 0 to avoid
> the check in hashObject().
> 
> I suppose that the cast to j.l.String in hashObject() should be a cast to Hashable32
> otherwise I don't see the purpose to introduce such interface
> (note that WeakHashMap used Hashable32 correctly).

I responded to this issue to Mike Skells question. Repeating that response: The problem with using instanceof Hashable32 is that is much slower (often more than 25X) than instanceof String. It's slow enough that we won't reasonably consider using instanceof Hashable32 in JDK 8. We have considered making Object implement Hashable32 and add a virtual extension method to Object for hash32(). The extension method would just call hashCode(). A compiler that supports extension methods is not yet part of the JDK mainline repo yet (It is still in the Lambda repo). This approach would mean that we can avoid an instanceof check but there is a *lot* of entirely reasonable reservations about having Object implement an interface and gain a new method. The actually solution which will be used in Java 8 is TBD. Using instanceof String is interim solution.

> Also, this change
> 
> -        return h ^ (h>>>  7) ^ (h>>>  4);
> +        h ^= (h>>>  7) ^ (h>>>  4);
> +
> +        return h;
> 
> will make the compiler generates an additional iload/istore pair.
> While the Jitted code will be the same, it may bother the inlining heuristic.

I will fix this.

> 
> I think the code in get() was explicitly hand-inlined to avoid perf glitch,
> that's why the current code is not
> 
> Entry<K,V>  entry = getEntry(key);
> return null == entry ? null : entry.getValue();
> 
> but may be this is not needed anymore. i don't know.

I did microbenchmarking early in the development of this patch which showed that getForNullKey() was not needed.

> 
> In transfer, we are not in C++, you can write
> 
>  while(e != null)
> 
> instead of
> 
>  while(null != e)
> 
> with no fear :)

Habit. I still accidentally use 'if (e = null)' once every year or so and hate myself when I discover that was the problem. I know that some people hate these "Yoda" style expressions checks. (http://wiert.me/2010/05/25/yoda-conditions-from-stackoverflow-new-programming-jargon-you-coined/)

> I've a kind of meta-question, why you have one HASHMASK_OFFSET by hash map
> implementations instead of one to rule them all.

I am not sure how they can be unified except to put the field into AbstractMap which would be unacceptable.

> Can you explain why you don't need to resize anymore in LinkedHashMap.addEntry(),
> I'm too lazy to figure this out by myself.

The location of the resize check moved slightly.

> In WeakHashMap.Entry, why hash is not declared 'final' anymore ?

Mis-port from JDK 7 version which needs it to be non-final for rehashing. It can be final in the JDK 8 version.

> 
> cheers,
> Rémi
> 
> On 05/23/2012 07:16 AM, Mike Duigou wrote:
>> Dear OpenJDK CoreLibs community,
>> 
>> A significant enhancement to the Java SE hashing-based Map implementations in planned for inclusion in Java SE 7u6. All of the hashing based Map implementations: HashMap, Hashtable, LinkedHashMap, WeakHashMap and ConcurrentHashMap will now use an enhanced hashing algorithm for string keys when the capacity of the hash table has ever grown beyond 512 entries. The enhanced hashing implementation uses the murmur3 hashing algorithm[1] along with random hash seeds and index masks. These enhancements mitigate cases where colliding String hash values could result in a performance bottleneck.
>> 
>> In order to provide the greatest opportunity for developers to test compatibility with their applications this change will be incorporated into JDK7u6 build 12 and JDK8 build 39. Both builds are planned for release next week. ***For 7u6 build 12 only, the alternative hashing will be unconditionally enabled (always on).*** The threshold default will be reset to the intended release default (512) for 7u6 build 13.
>> 
>> The quick promotion of this change into builds with limited opportunity for public review and the special behaviour for build 12 is intended to make it easier for developers to test their application compatibility. Feedback on the approach, implementation, compatibility and performance is eagerly sought and encouraged both before *and after* this change is incorporated into the OpenJDK repositories.
>> 
>> A new system property, jdk.map.althashing.threshold, allows adjustment of the threshold for enabling the enhanced hashing algorithm. If changed from the default value of 512, the enhanced hashing will be invoked any time after map capacity exceeds the value of jdk.map.althashing.threshold. To completely disable the enhanced hashing (not recommended), set jdk.map.althashing.threshold to -1 or a very large number such as 2^31 -1 (Integer.MAX_VALUE).
>> 
>> The iteration order of keys, values and entries for hash-based maps where the new algorithm has been invoked will vary for each HashMap instance. While the Java SE Map documentation makes no promises that iteration order of items returned from Maps will be consistent, developers should check if their applications have incorrectly created a dependency on the iteration order of Map entries, keys or values.
>> 
>> Webrevs for the Java 7u6 and 8 changes are available for download at [2] and [3] for your review. There are some important differences between the Java 7 and 8 implementations of this enhancement. Most specifically in the Java 8 implementation alternative string hashing is always enabled--no threshold is used for enablement and alternative hashing cannot be disabled. (The Java 8 implementation completely ignores the jdk.map.althashing.threshold system property). The Java 8 implementation is also subject to additional refinement as Java 8 develops.
>> 
>> If you have any questions or concerns with this planned enhancement, please use the corelibs development mailing list,<mailto:core-libs-dev at openjdk.java.net>, or you may also respond privately to me if you prefer.
>> 
>> Thanks,
>> 
>> Mike
>> 
>> [1] Murmur3 : https://code.google.com/p/smhasher/wiki/MurmurHash3
>> [2] althashing "7" webrev : http://cr.openjdk.java.net/~mduigou/althashing7/8/webrev/
>> [3] althashing "8" webrev : http://cr.openjdk.java.net/~mduigou/althashing8/8/webrev/
> 




More information about the core-libs-dev mailing list