HashMap collision speed (regression 7->8)

Peter Levart peter.levart at gmail.com
Tue Jan 13 15:51:01 UTC 2015


On 01/13/2015 12:36 PM, Doug Lea wrote:
> On 01/12/2015 05:12 PM, Peter Levart wrote:
>> Hi,
>>
>> I added results obtained with JDK 8 (FCS and u20) - same machine, 
>> same VM
>> options, just different JDKs:
>
> Thanks very much. It remains a mystery why the original report
> by Bernd showed a 40% difference from jdk7. The approx 10% hit
> seen in these more careful measurements is about what we
> expected as the intended tradeoff for avoiding DOS attacks
> and better performance in more common cases. There are
> still some tweaky improvements we should keep exploring,
> but I don't think there's any need for fundamental changes.
>
> -Doug

Perhaps it was because of different HW. I ran the tests on Intel® Core™ 
i7-2600K CPU with 8 MB of Cache. I wonder what Bernd used. The number of 
different keys is 250.000, they are accessed randomly in a loop. Such 
HashMap consists of majority of TreeNode(s). TreeNode(s) have 9 
reference fields + int and boolean. I don't claim that such HashMap fits 
intirely into Cache, but half of it perhaps does. Maybe it is just that 
cache-hit ratio was larger on my HW than on Bernd's and that Bernd's 
cache was thrashing more than mine. With JDK 7 (normal Nodes have just 3 
reference fields + int) such HashMap has roughly half the memory footprint.

Bernd, what did you run your benchmark on?


Peter

>
>>
>> Original JDK 7 HashMap (and JVM):
>>
>> Benchmark                               (initialSize)   Mode Samples
>> Score  Score error    Units
>> j.t.HashMapCollision.badDistNoComp                 16     ss 60
>> 2839.458      157.299       ms
>> j.t.HashMapCollision.badDistWithComp               16     ss 60
>> 2673.924      187.063       ms
>> j.t.HashMapCollision.goodDistNoComp                16     ss 60
>> 686.972       32.928       ms
>> j.t.HashMapCollision.goodDistWithComp              16     ss 60
>> 631.001        6.574       ms
>>
>> Original JDK 8 HashMap (JDK 8 FCS JVM):
>>
>> Benchmark                               (initialSize)   Mode Samples
>> Score  Score error    Units
>> j.t.HashMapCollision.badDistNoComp                 16     ss 60
>> 3186.305       74.890       ms
>> j.t.HashMapCollision.badDistWithComp               16     ss 60
>> 2479.155      136.924       ms
>> j.t.HashMapCollision.goodDistNoComp                16     ss 60
>> 673.819       13.236       ms
>> j.t.HashMapCollision.goodDistWithComp              16     ss 60
>> 673.636        8.676       ms
>>
>> Original JDK 8 HashMap (JDK 8u20 JVM):
>>
>> Benchmark                               (initialSize)   Mode Samples
>> Score  Score error    Units
>> j.t.HashMapCollision.badDistNoComp                 16     ss 60
>> 3107.455       72.524       ms
>> j.t.HashMapCollision.badDistWithComp               16     ss 60
>> 2986.006        9.796       ms
>> j.t.HashMapCollision.goodDistNoComp                16     ss 60
>> 631.295        7.281       ms
>> j.t.HashMapCollision.goodDistWithComp              16     ss 60
>> 641.041       17.139       ms
>>
>> Original JDK 9 HashMap:
>>
>> Benchmark                               (initialSize)   Mode Samples
>> Score  Score error    Units
>> j.t.HashMapCollision.badDistNoComp                 16     ss 60
>> 3011.738       78.249       ms
>> j.t.HashMapCollision.badDistWithComp               16     ss 60
>> 2984.280       48.315       ms
>> j.t.HashMapCollision.goodDistNoComp                16     ss 60
>> 682.060       52.341       ms
>> j.t.HashMapCollision.goodDistWithComp              16     ss 60
>> 685.705       55.183       ms
>>
>> Original JDK 9 HashMap with TREEIFY_THRESHOLD = 1 << 20:
>>
>> Benchmark                               (initialSize)   Mode Samples
>> Score  Score error    Units
>> j.t.HashMapCollision.badDistNoComp                 16     ss 60
>> 2780.771      236.647       ms
>> j.t.HashMapCollision.badDistWithComp               16     ss 60
>> 2541.740      233.429       ms
>> j.t.HashMapCollision.goodDistNoComp                16     ss 60
>> 757.364       67.869       ms
>> j.t.HashMapCollision.goodDistWithComp              16     ss 60
>> 671.617       54.943       ms
>>
>> Caching of comparableClassFor (in ClassRepository - good for 
>> heterogeneous keys
>> too):
>>
>> Benchmark                               (initialSize)   Mode Samples
>> Score  Score error    Units
>> j.t.HashMapCollision.badDistNoComp                 16     ss 60
>> 3014.888       71.778       ms
>> j.t.HashMapCollision.badDistWithComp               16     ss 60
>> 2279.757       54.159       ms
>> j.t.HashMapCollision.goodDistNoComp                16     ss 60
>> 760.743       70.674       ms
>> j.t.HashMapCollision.goodDistWithComp              16     ss 60
>> 725.188       67.853       ms
>>
>> Caching of comparableClassFor (internally - good for homogeneous keys 
>> only):
>>
>> Benchmark                               (initialSize)   Mode Samples
>> Score  Score error    Units
>> j.t.HashMapCollision.badDistNoComp                 16     ss 60
>> 3026.707       84.571       ms
>> j.t.HashMapCollision.badDistWithComp               16     ss 60
>> 2137.296       66.140       ms
>> j.t.HashMapCollision.goodDistNoComp                16     ss 60
>> 635.964        8.213       ms
>> j.t.HashMapCollision.goodDistWithComp              16     ss 60
>> 685.129       46.783       ms
>>
>>
>> Peter
>>
>> On 01/12/2015 12:26 AM, Peter Levart wrote:
>>>
>>> On 01/11/2015 10:00 PM, Doug Lea wrote:
>>>> On 01/11/2015 02:26 PM, Peter Levart wrote:
>>>>
>>>>> Although majority of entries constitute the bins of size 13 or 14, 
>>>>> there's only
>>>>> a single hashCode value per bin.
>>>>>
>>>>> So in this benchmark, treeifying with non-comparable keys is a 
>>>>> waste of effort.
>>>>
>>>> On the other hand, the waste seems to only cost about 10% in your 
>>>> runs.
>>>> I wonder why the original report using jdk7 vs jdk8 seemed larger.
>>>
>>> I don't know. I ran the same benchmark with same VM options on JDK 7 
>>> too. Here
>>> are all results together:
>>>
>>> Original JDK 7 HashMap (and JVM):
>>>
>>> Benchmark                               (initialSize)   Mode Samples
>>> Score  Score error    Units
>>> j.t.HashMapCollision.badDistNoComp                 16 ss        60
>>> 2839.458      157.299       ms
>>> j.t.HashMapCollision.badDistWithComp               16 ss        60
>>> 2673.924      187.063       ms
>>> j.t.HashMapCollision.goodDistNoComp                16 ss        60
>>> 686.972       32.928       ms
>>> j.t.HashMapCollision.goodDistWithComp              16 ss        60
>>> 631.001        6.574       ms
>>>
>>> Original JDK 9 HashMap:
>>>
>>> Benchmark                               (initialSize)   Mode Samples
>>> Score  Score error    Units
>>> j.t.HashMapCollision.badDistNoComp                 16 ss        60
>>> 3011.738       78.249       ms
>>> j.t.HashMapCollision.badDistWithComp               16 ss        60
>>> 2984.280       48.315       ms
>>> j.t.HashMapCollision.goodDistNoComp                16 ss        60
>>> 682.060       52.341       ms
>>> j.t.HashMapCollision.goodDistWithComp              16 ss        60
>>> 685.705       55.183       ms
>>>
>>> Original JDK 9 HashMap with TREEIFY_THRESHOLD = 1 << 20:
>>>
>>> Benchmark                               (initialSize)   Mode Samples
>>> Score  Score error    Units
>>> j.t.HashMapCollision.badDistNoComp                 16 ss        60
>>> 2780.771      236.647       ms
>>> j.t.HashMapCollision.badDistWithComp               16 ss        60
>>> 2541.740      233.429       ms
>>> j.t.HashMapCollision.goodDistNoComp                16 ss        60
>>> 757.364       67.869       ms
>>> j.t.HashMapCollision.goodDistWithComp              16 ss        60
>>> 671.617       54.943       ms
>>>
>>> Caching of comparableClassFor (in ClassRepository - good for 
>>> heterogeneous
>>> keys too):
>>>
>>> Benchmark                               (initialSize)   Mode Samples
>>> Score  Score error    Units
>>> j.t.HashMapCollision.badDistNoComp                 16 ss        60
>>> 3014.888       71.778       ms
>>> j.t.HashMapCollision.badDistWithComp               16 ss        60
>>> 2279.757       54.159       ms
>>> j.t.HashMapCollision.goodDistNoComp                16 ss        60
>>> 760.743       70.674       ms
>>> j.t.HashMapCollision.goodDistWithComp              16 ss        60
>>> 725.188       67.853       ms
>>>
>>> Caching of comparableClassFor (internally - good for homogeneous 
>>> keys only):
>>>
>>> Benchmark                               (initialSize)   Mode Samples
>>> Score  Score error    Units
>>> j.t.HashMapCollision.badDistNoComp                 16 ss        60
>>> 3026.707       84.571       ms
>>> j.t.HashMapCollision.badDistWithComp               16 ss        60
>>> 2137.296       66.140       ms
>>> j.t.HashMapCollision.goodDistNoComp                16 ss        60
>>> 635.964        8.213       ms
>>> j.t.HashMapCollision.goodDistWithComp              16 ss        60
>>> 685.129       46.783       ms
>>>
>>>>
>>>>>
>>>>> Are there (non-forged) sets of non-comparable keys with hashCodes 
>>>>> where
>>>>> treeifying makes sense?
>>>>
>>>> Try using a class like:
>>>>   class FHC { float f; int hashCode() { return 
>>>> Float.floatToIntBits(f); } }
>>>> and populate with instances with integral values for f.
>>>>
>>>> Similarly for doubles.
>>>>
>>>> Pre-jdk8, we devised a bit-smearing function that (among other
>>>> constraints) did OK for float/double keys with integral values,
>>>> that are not all that rare.  With treeification, we don't need to
>>>> penalize classes with decent hashCodes by bit-smearing to still
>>>> get OK performance for these kinds of cases where the tree-based
>>>> hashCode comparison helps more than Comparability per se.
>>>
>>> I see. These keys actually have unique or near unique hashCodes but 
>>> which are
>>> not good for power of two length tables without bit-smearing. With 
>>> tree bins
>>> we don't need heavy bit-smearing to get decent performance in speed, 
>>> but the
>>> table gets quite sparse anyway (although this is the smaller of the 
>>> space
>>> overheads - tree nodes are bigger). For example, for 1M integral 
>>> Floats, we
>>> get the following:
>>>
>>> >>> Float ...
>>>                  Capacity: 2097152
>>>               Load factor: 0.75
>>>                      Size: 1000000
>>>                 Bin sizes: 0*1966080 1*0 2*0 3*24288 4*41248 5*0 6*0 
>>> 7*0 8*0
>>> 9*0 10*4456 11*22963 12*30554 13*7539 14*24 total=1000000
>>>                Empty bins: 93.8 %
>>> Unique hash codes per bin: 0*1966080 1*0 2*0 3*24288 4*41248 5*0 6*0 
>>> 7*0 8*0
>>> 9*0 10*4456 11*22963 12*30554 13*7539 14*24 total=1000000
>>>
>>>
>>>>
>>>> Also...
>>>>
>>>> It looks like the simplest path to a minor improvement is
>>>> just to cache internally (your fourth test below). But I now
>>>> recall not doing this because it adds to footprint and
>>>> the field could prevent class unloading, for only a small
>>>> benefit.
>>>
>>> Footprint, yes (one reference field in HM instance), while class 
>>> unloading is
>>> taken care of using WeakReference:
>>>
>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/HM.comparableClassFor/HomogeneousKeysCache/webrev.01/ 
>>>
>>>
>>>>
>>>> (Every time HashMap has changed, there have been reports of
>>>> performance regressions even though typical performance
>>>> generally improves.)
>>>>
>>>> -Doug
>>>
>>> Regards, Peter
>>>
>>>>
>>>>
>>>>>> Original JDK9 HashMap:
>>>>>>
>>>>>> Benchmark                               (initialSize) Mode Samples
>>>>>> Score  Score error    Units
>>>>>> j.t.HashMapCollision.badDistNoComp                 16 ss        60
>>>>>> 3011.738       78.249       ms
>>>>>> j.t.HashMapCollision.badDistWithComp               16 ss        60
>>>>>> 2984.280       48.315       ms
>>>>>> j.t.HashMapCollision.goodDistNoComp                16 ss        60
>>>>>> 682.060       52.341       ms
>>>>>> j.t.HashMapCollision.goodDistWithComp              16 ss        60
>>>>>> 685.705       55.183       ms
>>>>>>
>>>>>> Original JDK9 HashMap with TREEIFY_THRESHOLD = 1 << 20:
>>>>>>
>>>>>> Benchmark                               (initialSize) Mode Samples
>>>>>> Score  Score error    Units
>>>>>> j.t.HashMapCollision.badDistNoComp                 16 ss        60
>>>>>> 2780.771      236.647       ms
>>>>>> j.t.HashMapCollision.badDistWithComp               16 ss        60
>>>>>> 2541.740      233.429       ms
>>>>>> j.t.HashMapCollision.goodDistNoComp                16 ss        60
>>>>>> 757.364       67.869       ms
>>>>>> j.t.HashMapCollision.goodDistWithComp              16 ss        60
>>>>>> 671.617       54.943       ms
>>>>>>
>>>>>> Caching of comparableClassFor (in ClassRepository - good for 
>>>>>> heterogeneous
>>>>>> keys too):
>>>>>>
>>>>>> Benchmark                               (initialSize) Mode Samples
>>>>>> Score  Score error    Units
>>>>>> j.t.HashMapCollision.badDistNoComp                 16 ss        60
>>>>>> 3014.888       71.778       ms
>>>>>> j.t.HashMapCollision.badDistWithComp               16 ss        60
>>>>>> 2279.757       54.159       ms
>>>>>> j.t.HashMapCollision.goodDistNoComp                16 ss        60
>>>>>> 760.743       70.674       ms
>>>>>> j.t.HashMapCollision.goodDistWithComp              16 ss        60
>>>>>> 725.188       67.853       ms
>>>>>>
>>>>>> Caching of comparableClassFor (internally - good for homogeneous 
>>>>>> keys only):
>>>>>>
>>>>>> Benchmark                               (initialSize) Mode Samples
>>>>>> Score  Score error    Units
>>>>>> j.t.HashMapCollision.badDistNoComp                 16 ss        60
>>>>>> 3026.707       84.571       ms
>>>>>> j.t.HashMapCollision.badDistWithComp               16 ss        60
>>>>>> 2137.296       66.140       ms
>>>>>> j.t.HashMapCollision.goodDistNoComp                16 ss        60
>>>>>> 635.964        8.213       ms
>>>>>> j.t.HashMapCollision.goodDistWithComp              16 ss        60
>>>>>> 685.129       46.783       ms
>>>>>>
>>>>>>
>>>>
>>>
>>
>




More information about the core-libs-dev mailing list