HashMap collision speed (regression 7->8)

Peter Levart peter.levart at gmail.com
Sun Jan 11 11:55:48 UTC 2015


On 01/11/2015 02:27 AM, Martin Buchholz wrote:
> Peter,
>
> You are adding the ability to add "app-specific storage" to Class 
> objects ("Class-local variables"?), which is pretty unusual.

Well, that was my intention, since the logic about what should be cached 
is very specific to the usecase and might change in the future. Anyway, 
this is only internal API. Users have a public alternative in 
ClassValue. That's one reason. The other is space overhead introduced 
when caching with ClassValue and inability to initialize ClassValue so 
very early in the boot-up sequence.

>
> I was thinking instead of a very dumb 1-element cache, remembering 
> Class and comparableClassFor, which will work for typical homogeneous 
> HashMaps.

This seems like a good idea. We would actually need only one field of 
type Class<?> and a boolean flag.

Unfortunately, comparableClassFor is a static method used also from 
various other contexts that don't have access to HashMap instance, for 
example from TreeNode. We would have to extend the internal API with an 
additional HashMap argument to pass the HM instance around. Not to 
mention that this would be tricky because retaining the last used 
comparable Class object in the HM instance could prevent GC from 
releasing a ClassLoader in an app server environment for example. A 
WeakReference<Class<?>> would have to be used and new WeakReference 
object created each time cached value changes. Unless we cache only the 
1st comparableClassFor result and never change it, which has the same 
cache-hit ratio for homogeneous keys.

Right, here's what this looks like:

http://cr.openjdk.java.net/~plevart/jdk9-dev/HM.comparableClassFor/HomogeneousKeysCache/webrev.01/

I modified Bernd's JMH benchmark a little to use ThreadLocalRandom 
insted of Random, so results express more what is going on with HashMap 
and less with Random synchronization:

http://cr.openjdk.java.net/~plevart/jdk9-dev/HM.comparableClassFor/HashMapCollision.java

Results:

Original JDK9 HashMap:

Benchmark                               (initialSize)   Mode 
Samples        Score  Score error    Units
j.t.HashMapCollision.badDistNoComp                 16   avgt 6     
3101.247      435.866    ms/op
j.t.HashMapCollision.badDistWithComp               16   avgt 6     
2410.202      478.247    ms/op
j.t.HashMapCollision.goodDistNoComp                16   avgt 6      
615.100        7.063    ms/op
j.t.HashMapCollision.goodDistWithComp              16   avgt 6      
614.229      159.558    ms/op

Caching of comparableClassFor (in ClassRepository - good for 
heterogeneous keys too):

Benchmark                               (initialSize)   Mode 
Samples        Score  Score error    Units
j.t.HashMapCollision.badDistNoComp                 16   avgt 6     
3305.967      652.791    ms/op
j.t.HashMapCollision.badDistWithComp               16   avgt 6     
2030.965      241.910    ms/op
j.t.HashMapCollision.goodDistNoComp                16   avgt 6      
611.202        6.440    ms/op
j.t.HashMapCollision.goodDistWithComp              16   avgt 6      
582.890        4.896    ms/op

Caching of comparableClassFor (internally - good for homogeneous keys only):

Benchmark                               (initialSize)   Mode 
Samples        Score  Score error    Units
j.t.HashMapCollision.badDistNoComp                 16   avgt 6     
3265.673      660.030    ms/op
j.t.HashMapCollision.badDistWithComp               16   avgt 6     
1875.204      224.682    ms/op
j.t.HashMapCollision.goodDistNoComp                16   avgt 6      
598.949       25.484    ms/op
j.t.HashMapCollision.goodDistWithComp              16   avgt 6      
585.278        8.103    ms/op


Regards, Peter

>
> On Sat, Jan 10, 2015 at 5:01 AM, Peter Levart <peter.levart at gmail.com 
> <mailto:peter.levart at gmail.com>> wrote:
>
>
>     On 01/10/2015 01:20 AM, Doug Lea wrote:
>>     On 01/09/2015 06:29 PM, Martin Buchholz wrote:
>>>     Given the prevalence of sub-optimal hashcodes, my own intuition
>>>     is also that
>>>     raising the treeification threshold from 8 will be a win.
>>
>>     That's what I thought at first. But 8 is a better choice for String
>>     and other Comparable keys, which account for the majority of
>>     HashMaps
>>     out there. (For non-comparables, infinity is the best threshold.)
>>     How much slower should we make the most common cases to make the
>>     others
>>     faster? The only way to decide empirically is to take a large
>>     corpus of programs and vary thresholds. Short of that, speeding up
>>     comparableClassFor is still the best bet for reducing impact on
>>     non-comparables.
>
>     Hi Doug,
>
>     comparableClassFor() for non-comparables that don't implement
>     Comparable is already as fast as it can be (the 1st check is
>     instanceof Comparable). For other comparables (and
>     non-comparables) that implement Comparable (except for String
>     which is special-cased), we could improve the situation by caching
>     the result.
>
>     Here's another attempt at that. This time it uses plain old JDK1
>     stuff, so it actually works even in HashMap (using IdentityHashMap
>     so no danger of circular usage if it is to be applied to CHM also):
>
>     http://cr.openjdk.java.net/~plevart/jdk9-dev/Class.getGenericDerivative/webrev.01/
>     <http://cr.openjdk.java.net/%7Eplevart/jdk9-dev/Class.getGenericDerivative/webrev.01/>
>
>     With this patch, the results of Bernd's JMH benchmark do give some
>     boost to keys that implement Comparable (badDistWithComp case).
>
>     These are the results with original JDK9 HashMap:
>
>     Benchmark (initialSize)   Mode   Samples        Score Score
>     error    Units
>     j.t.HashMapCollision.badDistNoComp 16   avgt         6    
>     3104.047      278.057 ms/op
>     j.t.HashMapCollision.badDistWithComp 16   avgt         6    
>     2754.499      243.780 ms/op
>     j.t.HashMapCollision.goodDistNoComp 16   avgt         6    
>     1031.992       26.422 ms/op
>     j.t.HashMapCollision.goodDistWithComp 16   avgt         6    
>     1082.347       30.981 ms/op
>
>     And this is with patch applied:
>
>     Benchmark (initialSize)   Mode   Samples        Score Score
>     error    Units
>     j.t.HashMapCollision.badDistNoComp 16   avgt         6    
>     3081.419      386.125 ms/op
>     j.t.HashMapCollision.badDistWithComp 16   avgt         6    
>     2116.030      281.160 ms/op
>     j.t.HashMapCollision.goodDistNoComp 16   avgt         6    
>     1015.224       81.843 ms/op
>     j.t.HashMapCollision.goodDistWithComp 16   avgt         6    
>     1078.719       38.351 ms/op
>
>
>     Caching is performed as part of Class generic types information
>     caching (ClassRepository), so there's no overhead for those that
>     don't need generic types information. All logic is kept inside (C)HM.
>
>     Regards, Peter
>
>>
>>     -Doug
>>
>
>




More information about the core-libs-dev mailing list