Review Request CR#7118743 : Alternative Hashing for String with Hash-based Maps

Mike Duigou mike.duigou at oracle.com
Wed May 30 16:23:22 UTC 2012


On May 30 2012, at 07:38 , Rémi Forax wrote:

> On 05/30/2012 01:05 AM, Mike Duigou wrote:
>> Another round of updates for Java 7 [1] and Java 8 [2] implementations. This revision incorporates Remi's suggestions and some feedback from Doug Lea regarding applying the per-instance seed to the result of String.hash32()
>> 
>> [1] althashing "7" webrev : http://cr.openjdk.java.net/~mduigou/althashing7/10/webrev/
>> [2] althashing "8" webrev : http://cr.openjdk.java.net/~mduigou/althashing8/10/webrev/
>> 
>> Barring any emergencies this will be integrated to both Java 7 and Java 8 on Wednesday May 30th, 2012.
>> 
>> Thanks to all who provided feedback!
>> 
>> Mike
> 
> Hi Mike,
> I've still trouble to understand why you want to expose something
> which is not used.
> 
> We know that the implementation of String.hashCode()
> is broken but we can't change it because the implementation
> is part of the spec.
> We can't do an instanceof check on an interface in the implementations
> of Map because the VM implementations (not only Hotspot by the way)
> are not able to transform it to a quick class check.
> String.hash32 is not a polymorphic method but just a method
> used to fix the fact that we can't change the implementation
> of String.hashCode() so there is no need for a interface like Hashable32.
> 
> as a proof, as Ulf said, String even doesn't implement Hashable32.


It's a mistake that String doesn't implement Hashable32 in the java 8 patch. That line somehow got lost with multiple revisions. (most likely due to constant merge conflicts with the String offset/count patch) I will restore it.

The current proposal for Java 8:

- A new interface Hashable32 is introduced.
- Hashable32 provides a method hash32()
- String implements Hashable32 and hash32() method
- HashMap et al recognize String and invoke hash32() rather than hashCode()
- ** End of current implementation **
- When the mainline javac supports extensions methods the implementation will be extended.
- Add extension method to Hashable32.hash32() that calls hashCode()
- Object implements Hashable32  [controversial idea]
- HashMap et al unconditionally call hash32(). Other JDK code may also switch to call hash32() instead of hashCode()

Mike

> Rémi
> 
>> 
>> 
>> On May 25 2012, at 10:38 , Rémi Forax wrote:
>> 
>>> On 05/24/2012 09:34 PM, Mike Duigou wrote:
>>>> Hello All;
>>>> 
>>>> I have updated the webrevs for alternative hashing for String with feedback from Remi, Doug, Ulf and internal reviewers.
>>>> 
>>>> Additional feedback is welcome.
>>>> 
>>>> Mike
>>>> 
>>>> [1] althashing "7" webrev : http://cr.openjdk.java.net/~mduigou/althashing7/9/webrev/
>>>> [2] althashing "8" webrev : http://cr.openjdk.java.net/~mduigou/althashing8/9/webrev/
>>> Hello Mike,
>>> just two small issues.
>>> 
>>> In WeakHashMap.hash(), the null check should be done at callsite,
>> It's not even needed at the callsite as it always follows a maskNull call.
>> 
>>> the call to hash32() should be directly returned like in HashMap.
>> Fixed.
>> 
>>> Is Hashable32 is used anymore ?
>> It's the "source" of the hash32 method that String now implements and hopefully more later. This bit is sort of unfinished in this patch but I would like to leave it in for now.
>> 
>> Mike
>> 
>> 
>>> Rémi
>>> 
>>>> On May 22 2012, at 22:16 , Mike Duigou wrote:
>>>> 
>>>>> Dear OpenJDK CoreLibs community,
>>>>> 
>>>>> A significant enhancement to the Java SE hashing-based Map implementations in planned for inclusion in Java SE 7u6. All of the hashing based Map implementations: HashMap, Hashtable, LinkedHashMap, WeakHashMap and ConcurrentHashMap will now use an enhanced hashing algorithm for string keys when the capacity of the hash table has ever grown beyond 512 entries. The enhanced hashing implementation uses the murmur3 hashing algorithm[1] along with random hash seeds and index masks. These enhancements mitigate cases where colliding String hash values could result in a performance bottleneck.
>>>>> 
>>>>> In order to provide the greatest opportunity for developers to test compatibility with their applications this change will be incorporated into JDK7u6 build 12 and JDK8 build 39. Both builds are planned for release next week. ***For 7u6 build 12 only, the alternative hashing will be unconditionally enabled (always on).*** The threshold default will be reset to the intended release default (512) for 7u6 build 13.
>>>>> 
>>>>> The quick promotion of this change into builds with limited opportunity for public review and the special behaviour for build 12 is intended to make it easier for developers to test their application compatibility. Feedback on the approach, implementation, compatibility and performance is eagerly sought and encouraged both before *and after* this change is incorporated into the OpenJDK repositories.
>>>>> 
>>>>> A new system property, jdk.map.althashing.threshold, allows adjustment of the threshold for enabling the enhanced hashing algorithm. If changed from the default value of 512, the enhanced hashing will be invoked any time after map capacity exceeds the value of jdk.map.althashing.threshold. To completely disable the enhanced hashing (not recommended), set jdk.map.althashing.threshold to -1 or a very large number such as 2^31 -1 (Integer.MAX_VALUE).
>>>>> 
>>>>> The iteration order of keys, values and entries for hash-based maps where the new algorithm has been invoked will vary for each HashMap instance. While the Java SE Map documentation makes no promises that iteration order of items returned from Maps will be consistent, developers should check if their applications have incorrectly created a dependency on the iteration order of Map entries, keys or values.
>>>>> 
>>>>> Webrevs for the Java 7u6 and 8 changes are available for download at [2] and [3] for your review. There are some important differences between the Java 7 and 8 implementations of this enhancement. Most specifically in the Java 8 implementation alternative string hashing is always enabled--no threshold is used for enablement and alternative hashing cannot be disabled. (The Java 8 implementation completely ignores the jdk.map.althashing.threshold system property). The Java 8 implementation is also subject to additional refinement as Java 8 develops.
>>>>> 
>>>>> If you have any questions or concerns with this planned enhancement, please use the corelibs development mailing list,<mailto:core-libs-dev at openjdk.java.net>, or you may also respond privately to me if you prefer.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Mike
>>>>> 
>>>>> [1] Murmur3 : https://code.google.com/p/smhasher/wiki/MurmurHash3
>>>>> [2] althashing "7" webrev : http://cr.openjdk.java.net/~mduigou/althashing7/8/webrev/
>>>>> [3] althashing "8" webrev : http://cr.openjdk.java.net/~mduigou/althashing8/8/webrev/
> 




More information about the core-libs-dev mailing list