HashMap collision speed (regression 7->8)
Peter Levart
peter.levart at gmail.com
Sun Jan 11 19:26:33 UTC 2015
Hi,
Just to illustrate what Bernd's benchmark keys are made of, so one can
interpret the results accordingly. The full set of keys consists of
250,000 distinct keys (500*500). With the GOOD_HASH multiplier, each of
them has its own distinct hashCode(), and a HashMap filled with all 250k
keys looks like this:
Capacity: 524288
Load factor: 0.75
Size: 250000
Bin sizes: 0*274288 1*250000 total=250000
Empty bins: 52.3 %
Unique hash codes per bin: 0*274288 1*250000 total=250000
With the BAD_HASH multiplier, there are only 18963 unique hashCodes:
Capacity: 524288
Load factor: 0.75
Size: 250000
Bin sizes: 0*505325 1*74 2*74 3*74 4*74 5*74 6*74 7*74
8*74 9*74 10*74 11*74 12*74 13*8822 14*9253 total=250000
Empty bins: 96.4 %
Unique hash codes per bin: 0*505325 1*18963 total=18963
Although the majority of entries end up in bins of size 13 or 14, there
is only a single hashCode value per bin. So in this benchmark, treeifying
with non-comparable keys is a waste of effort.
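
For reference, bin statistics of this kind can be computed with a small
simulation that applies the same hash spreading and index masking as the
JDK 8/9 HashMap. This is only a minimal sketch, not the HashMapSimulator
linked in the P.S. below, and the multiplier value is just an assumed
stand-in for the benchmark's BAD_HASH constant:

    import java.util.*;

    // Count how many keys land in each table bin for a set of hash codes,
    // using HashMap's spreading (h ^ (h >>> 16)) and index masking.
    public class BinStats {

        static Map<Integer, Integer> binSizeHistogram(Collection<Integer> hashCodes, int capacity) {
            int[] bins = new int[capacity];
            for (int h : hashCodes) {
                int spread = h ^ (h >>> 16);      // HashMap.hash(key)
                bins[spread & (capacity - 1)]++;  // tab[(n - 1) & hash]
            }
            Map<Integer, Integer> histogram = new TreeMap<>();  // bin size -> number of bins
            for (int size : bins)
                histogram.merge(size, 1, Integer::sum);
            return histogram;
        }

        public static void main(String[] args) {
            int multiplier = 7;   // assumed stand-in for a "bad" multiplier
            List<Integer> codes = new ArrayList<>();
            for (int i = 0; i < 500; i++)
                for (int j = 0; j < 500; j++)
                    codes.add(i * multiplier + j);   // 250,000 keys, many colliding hash codes
            System.out.println(binSizeHistogram(codes, 524288));
        }
    }
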
Are there (non-forged) sets of non-comparable keys with hashCodes where
treeifying makes sense?
Regards, Peter
P.S. The modified benchmark source referred to below is updated here:
http://cr.openjdk.java.net/~plevart/jdk9-dev/HM.comparableClassFor/HashMapCollision.java
The HashMap simulator that prints stats about bins and hashCodes is here:
http://cr.openjdk.java.net/~plevart/jdk9-dev/HM.comparableClassFor/HashMapSimulator.java
On 01/11/2015 04:09 PM, Peter Levart wrote:
> Hi,
>
> I wasn't comfortable with the jitter in Bernd's JMH benchmark results, so
> I changed the mode of operation to SingleShotTime (since a single
> invocation takes from 0.6 to 3 seconds anyway). GC is triggered before
> each invocation (-gc true). I also added the -XX:-TieredCompilation VM
> option and ran 6 forks of 10 iterations of each test. At Doug's
> suggestion I also added, as a reference to compare with, a variant of the
> otherwise unchanged HashMap where TREEIFY_THRESHOLD = 1 << 20,
> UNTREEIFY_THRESHOLD = TREEIFY_THRESHOLD - 2 and MIN_TREEIFY_CAPACITY =
> TREEIFY_THRESHOLD * 4 (a sketch of this setup follows the results). Here
> are the results:
>
> Original JDK9 HashMap:
>
> Benchmark                             (initialSize) Mode Samples     Score  Score error  Units
> j.t.HashMapCollision.badDistNoComp               16    ss      60  3011.738       78.249     ms
> j.t.HashMapCollision.badDistWithComp             16    ss      60  2984.280       48.315     ms
> j.t.HashMapCollision.goodDistNoComp              16    ss      60   682.060       52.341     ms
> j.t.HashMapCollision.goodDistWithComp            16    ss      60   685.705       55.183     ms
>
> Original JDK9 HashMap with TREEIFY_THRESHOLD = 1 << 20:
>
> Benchmark                             (initialSize) Mode Samples     Score  Score error  Units
> j.t.HashMapCollision.badDistNoComp               16    ss      60  2780.771      236.647     ms
> j.t.HashMapCollision.badDistWithComp             16    ss      60  2541.740      233.429     ms
> j.t.HashMapCollision.goodDistNoComp              16    ss      60   757.364       67.869     ms
> j.t.HashMapCollision.goodDistWithComp            16    ss      60   671.617       54.943     ms
>
> Caching of comparableClassFor (in ClassRepository - good for
> heterogeneous keys too):
>
> Benchmark                             (initialSize) Mode Samples     Score  Score error  Units
> j.t.HashMapCollision.badDistNoComp               16    ss      60  3014.888       71.778     ms
> j.t.HashMapCollision.badDistWithComp             16    ss      60  2279.757       54.159     ms
> j.t.HashMapCollision.goodDistNoComp              16    ss      60   760.743       70.674     ms
> j.t.HashMapCollision.goodDistWithComp            16    ss      60   725.188       67.853     ms
>
> Caching of comparableClassFor (internally - good for homogeneous keys
> only):
>
> Benchmark                             (initialSize) Mode Samples     Score  Score error  Units
> j.t.HashMapCollision.badDistNoComp               16    ss      60  3026.707       84.571     ms
> j.t.HashMapCollision.badDistWithComp             16    ss      60  2137.296       66.140     ms
> j.t.HashMapCollision.goodDistNoComp              16    ss      60   635.964        8.213     ms
> j.t.HashMapCollision.goodDistWithComp            16    ss      60   685.129       46.783     ms
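>
> For reference, the setup described above could be expressed with standard
> JMH annotations roughly like the sketch below (the real benchmark class is
> linked in the P.S. of the outer message; the class name and method body
> here are illustrative placeholders only):
>
>     import java.util.concurrent.TimeUnit;
>     import org.openjdk.jmh.annotations.*;
>
>     // Sketch of the JMH setup described above, not the actual benchmark source.
>     @BenchmarkMode(Mode.SingleShotTime)                         // one timed invocation per iteration
>     @OutputTimeUnit(TimeUnit.MILLISECONDS)
>     @Fork(value = 6, jvmArgsAppend = "-XX:-TieredCompilation")  // 6 forks, extra VM option
>     @Measurement(iterations = 10)                               // 10 iterations per fork
>     @State(Scope.Thread)
>     public class HashMapCollisionSketch {
>
>         @Benchmark
>         public Object badDistNoComp() {
>             // placeholder body; the real benchmark fills a HashMap with 250k
>             // badly hashing, non-comparable keys
>             return new java.util.HashMap<>();
>         }
>     }
>
>     // Forced GC between iterations is requested on the JMH command line (-gc true).
>     // The TREEIFY_THRESHOLD = 1 << 20 variant is a patch to HashMap's internal
>     // constants, not a JMH setting.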
>
>
>
> Regards, Peter
>
> On 01/11/2015 12:55 PM, Peter Levart wrote:
>>
>> On 01/11/2015 02:27 AM, Martin Buchholz wrote:
>>> Peter,
>>>
>>> You are adding the ability to add "app-specific storage" to Class
>>> objects ("Class-local variables"?), which is pretty unusual.
>>
>> Well, that was my intention, since the logic about what should be
>> cached is very specific to the use case and might change in the
>> future. Anyway, this is only an internal API; users have a public
>> alternative in ClassValue. That's one reason. The other is the space
>> overhead introduced when caching with ClassValue and the inability to
>> initialize a ClassValue very early in the boot-up sequence.
>>
>>>
>>> I was thinking instead of a very dumb 1-element cache, remembering
>>> Class and comparableClassFor, which will work for typical
>>> homogeneous HashMaps.
>>
>> This seems like a good idea. We would actually need only one field of
>> type Class<?> and a boolean flag.
>>
>> Unfortunately, comparableClassFor is a static method that is also used
>> from various other contexts that don't have access to the HashMap
>> instance, for example from TreeNode. We would have to extend the
>> internal API with an additional argument to pass the HM instance
>> around. Not to mention that this would be tricky, because retaining
>> the last used comparable Class object in the HM instance could prevent
>> GC from releasing a ClassLoader, in an app server environment for
>> example. A WeakReference<Class<?>> would have to be used, and a new
>> WeakReference object created each time the cached value changes.
>> Unless we cache only the 1st comparableClassFor result and never
>> change it, which has the same cache-hit ratio for homogeneous keys.
>>
>> Right, here's what this looks like:
>>
>> http://cr.openjdk.java.net/~plevart/jdk9-dev/HM.comparableClassFor/HomogeneousKeysCache/webrev.01/
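>>
>> For illustration, here is a rough standalone sketch of the "cache only
>> the 1st result" idea (this is not the webrev above; the slow path below
>> is a simplified stand-in for HashMap.comparableClassFor, and all names
>> are made up):
>>
>>     // One plain Class<?> field per map, written at most once and never
>>     // replaced: no WeakReference churn, and the field only ever refers to
>>     // a Class that was already used as a key type in this map. Effective
>>     // only for homogeneous key types.
>>     final class FirstResultCache {
>>         private Class<?> cachedComparableClass;   // set on the first non-null result
>>
>>         Class<?> comparableClassFor(Object key) {
>>             Class<?> kc = key.getClass();
>>             if (kc == cachedComparableClass)
>>                 return kc;                        // fast path: cache hit
>>             Class<?> c = slowComparableClassFor(key);
>>             if (c != null && cachedComparableClass == null)
>>                 cachedComparableClass = c;        // remember only the 1st result
>>             return c;
>>         }
>>
>>         // Simplified stand-in for HashMap.comparableClassFor: the real check
>>         // also verifies reflectively that the key implements Comparable of
>>         // its own class.
>>         private static Class<?> slowComparableClassFor(Object key) {
>>             return (key instanceof Comparable) ? key.getClass() : null;
>>         }
>>     }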
>>
>> I modified Bernd's JMH benchmark a little to use ThreadLocalRandom
>> instead of Random, so the results reflect more of what is going on in
>> HashMap and less of the Random synchronization overhead:
>>
>> http://cr.openjdk.java.net/~plevart/jdk9-dev/HM.comparableClassFor/HashMapCollision.java
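>>
>> The change itself is essentially a one-liner; a compilable sketch (the
>> field and class names here are illustrative, not the real benchmark code):
>>
>>     import java.util.Random;
>>     import java.util.concurrent.ThreadLocalRandom;
>>
>>     public class RngSwap {
>>         static final Random SHARED = new Random();   // old: shared seed, CAS on every nextInt call
>>
>>         public static void main(String[] args) {
>>             int before = SHARED.nextInt(500);                       // what the benchmark used to do
>>             int after  = ThreadLocalRandom.current().nextInt(500);  // per-thread state, no shared-seed contention
>>             System.out.println(before + " " + after);
>>         }
>>     }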
>>
>> Results:
>>
>> Original JDK9 HashMap:
>>
>> Benchmark                             (initialSize) Mode Samples     Score  Score error  Units
>> j.t.HashMapCollision.badDistNoComp               16  avgt       6  3101.247      435.866  ms/op
>> j.t.HashMapCollision.badDistWithComp             16  avgt       6  2410.202      478.247  ms/op
>> j.t.HashMapCollision.goodDistNoComp              16  avgt       6   615.100        7.063  ms/op
>> j.t.HashMapCollision.goodDistWithComp            16  avgt       6   614.229      159.558  ms/op
>>
>> Caching of comparableClassFor (in ClassRepository - good for
>> heterogeneous keys too):
>>
>> Benchmark                             (initialSize) Mode Samples     Score  Score error  Units
>> j.t.HashMapCollision.badDistNoComp               16  avgt       6  3305.967      652.791  ms/op
>> j.t.HashMapCollision.badDistWithComp             16  avgt       6  2030.965      241.910  ms/op
>> j.t.HashMapCollision.goodDistNoComp              16  avgt       6   611.202        6.440  ms/op
>> j.t.HashMapCollision.goodDistWithComp            16  avgt       6   582.890        4.896  ms/op
>>
>> Caching of comparableClassFor (internally - good for homogeneous keys
>> only):
>>
>> Benchmark                             (initialSize) Mode Samples     Score  Score error  Units
>> j.t.HashMapCollision.badDistNoComp               16  avgt       6  3265.673      660.030  ms/op
>> j.t.HashMapCollision.badDistWithComp             16  avgt       6  1875.204      224.682  ms/op
>> j.t.HashMapCollision.goodDistNoComp              16  avgt       6   598.949       25.484  ms/op
>> j.t.HashMapCollision.goodDistWithComp            16  avgt       6   585.278        8.103  ms/op
>>
>>
>> Regards, Peter
>>
>>>
>>> On Sat, Jan 10, 2015 at 5:01 AM, Peter Levart
>>> <peter.levart at gmail.com> wrote:
>>>
>>>
>>> On 01/10/2015 01:20 AM, Doug Lea wrote:
>>>> On 01/09/2015 06:29 PM, Martin Buchholz wrote:
>>>>> Given the prevalence of sub-optimal hashcodes, my own
>>>>> intuition is also that
>>>>> raising the treeification threshold from 8 will be a win.
>>>>
>>>> That's what I thought at first. But 8 is a better choice for
>>>> String
>>>> and other Comparable keys, which account for the majority of
>>>> HashMaps
>>>> out there. (For non-comparables, infinity is the best threshold.)
>>>> How much slower should we make the most common cases to make
>>>> the others
>>>> faster? The only way to decide empirically is to take a large
>>>> corpus of programs and vary thresholds. Short of that, speeding up
>>>> comparableClassFor is still the best bet for reducing impact on
>>>> non-comparables.
>>>
>>> Hi Doug,
>>>
>>> comparableClassFor() is already as fast as it can be for keys that
>>> don't implement Comparable at all (the 1st check is instanceof
>>> Comparable). For keys that do implement Comparable (except for String,
>>> which is special-cased), whether or not they turn out to be comparable
>>> to their own class, we could improve the situation by caching the
>>> result.
>>>
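>>> For context, the existing check looks roughly like this (paraphrased
>>> from the JDK 8/9 HashMap source); the reflective getGenericInterfaces()
>>> scan is the expensive part that caching would avoid:
>>>
>>>     // Paraphrase of java.util.HashMap.comparableClassFor(Object): returns
>>>     // x's class if x implements Comparable<C> where C is x's own class,
>>>     // else null. (Uses java.lang.reflect.Type and ParameterizedType.)
>>>     static Class<?> comparableClassFor(Object x) {
>>>         if (x instanceof Comparable) {
>>>             Class<?> c = x.getClass();
>>>             if (c == String.class)
>>>                 return c;                    // common case, bypass the scan
>>>             for (Type t : c.getGenericInterfaces()) {
>>>                 if (t instanceof ParameterizedType) {
>>>                     ParameterizedType p = (ParameterizedType) t;
>>>                     Type[] as = p.getActualTypeArguments();
>>>                     if (p.getRawType() == Comparable.class
>>>                             && as.length == 1 && as[0] == c)
>>>                         return c;            // x is comparable to its own class
>>>                 }
>>>             }
>>>         }
>>>         return null;
>>>     }
>>>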
>>> Here's another attempt at that. This time it uses plain old JDK 1
>>> stuff, so it actually works even in HashMap (using IdentityHashMap, so
>>> there is no danger of circular usage if it is also applied to CHM):
>>>
>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/Class.getGenericDerivative/webrev.01/
>>>
>>> With this patch, the results of Bernd's JMH benchmark do show some
>>> boost for keys that implement Comparable (the badDistWithComp case).
>>>
>>> These are the results with original JDK9 HashMap:
>>>
>>> Benchmark                             (initialSize) Mode Samples     Score  Score error  Units
>>> j.t.HashMapCollision.badDistNoComp               16  avgt       6  3104.047      278.057  ms/op
>>> j.t.HashMapCollision.badDistWithComp             16  avgt       6  2754.499      243.780  ms/op
>>> j.t.HashMapCollision.goodDistNoComp              16  avgt       6  1031.992       26.422  ms/op
>>> j.t.HashMapCollision.goodDistWithComp            16  avgt       6  1082.347       30.981  ms/op
>>>
>>> And this is with patch applied:
>>>
>>> Benchmark                             (initialSize) Mode Samples     Score  Score error  Units
>>> j.t.HashMapCollision.badDistNoComp               16  avgt       6  3081.419      386.125  ms/op
>>> j.t.HashMapCollision.badDistWithComp             16  avgt       6  2116.030      281.160  ms/op
>>> j.t.HashMapCollision.goodDistNoComp              16  avgt       6  1015.224       81.843  ms/op
>>> j.t.HashMapCollision.goodDistWithComp            16  avgt       6  1078.719       38.351  ms/op
>>>
>>>
>>> Caching is performed as part of the Class generic type information
>>> caching (ClassRepository), so there is no overhead for code that
>>> doesn't need generic type information. All the logic is kept inside
>>> (C)HM.
>>>
>>> Regards, Peter
>>>
>>>>
>>>> -Doug
>>>>
>>>
>>>
>>
>