HashMap collision speed (regression 7->8)
Peter Levart
peter.levart at gmail.com
Tue Jan 13 15:51:01 UTC 2015
On 01/13/2015 12:36 PM, Doug Lea wrote:
> On 01/12/2015 05:12 PM, Peter Levart wrote:
>> Hi,
>>
>> I added results obtained with JDK 8 (FCS and u20) - same machine,
>> same VM
>> options, just different JDKs:
>
> Thanks very much. It remains a mystery why the original report
> by Bernd showed a 40% difference from jdk7. The approx 10% hit
> seen in these more careful measurements is about what we
> expected as the intended tradeoff for avoiding DOS attacks
> and better performance in more common cases. There are
> still some tweaky improvements we should keep exploring,
> but I don't think there's any need for fundamental changes.
>
> -Doug
Perhaps it was because of different HW. I ran the tests on Intel® Core™
i7-2600K CPU with 8 MB of Cache. I wonder what Bernd used. The number of
different keys is 250.000, they are accessed randomly in a loop. Such
HashMap consists of majority of TreeNode(s). TreeNode(s) have 9
reference fields + int and boolean. I don't claim that such HashMap fits
intirely into Cache, but half of it perhaps does. Maybe it is just that
cache-hit ratio was larger on my HW than on Bernd's and that Bernd's
cache was thrashing more than mine. With JDK 7 (normal Nodes have just 3
reference fields + int) such HashMap has roughly half the memory footprint.
Bernd, what did you run your benchmark on?
Peter
>
>>
>> Original JDK 7 HashMap (and JVM):
>>
>> Benchmark (initialSize) Mode Samples
>> Score Score error Units
>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>> 2839.458 157.299 ms
>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>> 2673.924 187.063 ms
>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>> 686.972 32.928 ms
>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>> 631.001 6.574 ms
>>
>> Original JDK 8 HashMap (JDK 8 FCS JVM):
>>
>> Benchmark (initialSize) Mode Samples
>> Score Score error Units
>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>> 3186.305 74.890 ms
>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>> 2479.155 136.924 ms
>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>> 673.819 13.236 ms
>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>> 673.636 8.676 ms
>>
>> Original JDK 8 HashMap (JDK 8u20 JVM):
>>
>> Benchmark (initialSize) Mode Samples
>> Score Score error Units
>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>> 3107.455 72.524 ms
>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>> 2986.006 9.796 ms
>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>> 631.295 7.281 ms
>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>> 641.041 17.139 ms
>>
>> Original JDK 9 HashMap:
>>
>> Benchmark (initialSize) Mode Samples
>> Score Score error Units
>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>> 3011.738 78.249 ms
>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>> 2984.280 48.315 ms
>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>> 682.060 52.341 ms
>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>> 685.705 55.183 ms
>>
>> Original JDK 9 HashMap with TREEIFY_THRESHOLD = 1 << 20:
>>
>> Benchmark (initialSize) Mode Samples
>> Score Score error Units
>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>> 2780.771 236.647 ms
>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>> 2541.740 233.429 ms
>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>> 757.364 67.869 ms
>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>> 671.617 54.943 ms
>>
>> Caching of comparableClassFor (in ClassRepository - good for
>> heterogeneous keys
>> too):
>>
>> Benchmark (initialSize) Mode Samples
>> Score Score error Units
>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>> 3014.888 71.778 ms
>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>> 2279.757 54.159 ms
>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>> 760.743 70.674 ms
>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>> 725.188 67.853 ms
>>
>> Caching of comparableClassFor (internally - good for homogeneous keys
>> only):
>>
>> Benchmark (initialSize) Mode Samples
>> Score Score error Units
>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>> 3026.707 84.571 ms
>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>> 2137.296 66.140 ms
>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>> 635.964 8.213 ms
>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>> 685.129 46.783 ms
>>
>>
>> Peter
>>
>> On 01/12/2015 12:26 AM, Peter Levart wrote:
>>>
>>> On 01/11/2015 10:00 PM, Doug Lea wrote:
>>>> On 01/11/2015 02:26 PM, Peter Levart wrote:
>>>>
>>>>> Although majority of entries constitute the bins of size 13 or 14,
>>>>> there's only
>>>>> a single hashCode value per bin.
>>>>>
>>>>> So in this benchmark, treeifying with non-comparable keys is a
>>>>> waste of effort.
>>>>
>>>> On the other hand, the waste seems to only cost about 10% in your
>>>> runs.
>>>> I wonder why the original report using jdk7 vs jdk8 seemed larger.
>>>
>>> I don't know. I ran the same benchmark with same VM options on JDK 7
>>> too. Here
>>> are all results together:
>>>
>>> Original JDK 7 HashMap (and JVM):
>>>
>>> Benchmark (initialSize) Mode Samples
>>> Score Score error Units
>>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>>> 2839.458 157.299 ms
>>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>>> 2673.924 187.063 ms
>>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>>> 686.972 32.928 ms
>>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>>> 631.001 6.574 ms
>>>
>>> Original JDK 9 HashMap:
>>>
>>> Benchmark (initialSize) Mode Samples
>>> Score Score error Units
>>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>>> 3011.738 78.249 ms
>>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>>> 2984.280 48.315 ms
>>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>>> 682.060 52.341 ms
>>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>>> 685.705 55.183 ms
>>>
>>> Original JDK 9 HashMap with TREEIFY_THRESHOLD = 1 << 20:
>>>
>>> Benchmark (initialSize) Mode Samples
>>> Score Score error Units
>>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>>> 2780.771 236.647 ms
>>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>>> 2541.740 233.429 ms
>>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>>> 757.364 67.869 ms
>>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>>> 671.617 54.943 ms
>>>
>>> Caching of comparableClassFor (in ClassRepository - good for
>>> heterogeneous
>>> keys too):
>>>
>>> Benchmark (initialSize) Mode Samples
>>> Score Score error Units
>>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>>> 3014.888 71.778 ms
>>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>>> 2279.757 54.159 ms
>>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>>> 760.743 70.674 ms
>>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>>> 725.188 67.853 ms
>>>
>>> Caching of comparableClassFor (internally - good for homogeneous
>>> keys only):
>>>
>>> Benchmark (initialSize) Mode Samples
>>> Score Score error Units
>>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>>> 3026.707 84.571 ms
>>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>>> 2137.296 66.140 ms
>>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>>> 635.964 8.213 ms
>>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>>> 685.129 46.783 ms
>>>
>>>>
>>>>>
>>>>> Are there (non-forged) sets of non-comparable keys with hashCodes
>>>>> where
>>>>> treeifying makes sense?
>>>>
>>>> Try using a class like:
>>>> class FHC { float f; int hashCode() { return
>>>> Float.floatToIntBits(f); } }
>>>> and populate with instances with integral values for f.
>>>>
>>>> Similarly for doubles.
>>>>
>>>> Pre-jdk8, we devised a bit-smearing function that (among other
>>>> constraints) did OK for float/double keys with integral values,
>>>> that are not all that rare. With treeification, we don't need to
>>>> penalize classes with decent hashCodes by bit-smearing to still
>>>> get OK performance for these kinds of cases where the tree-based
>>>> hashCode comparison helps more than Comparability per se.
>>>
>>> I see. These keys actually have unique or near unique hashCodes but
>>> which are
>>> not good for power of two length tables without bit-smearing. With
>>> tree bins
>>> we don't need heavy bit-smearing to get decent performance in speed,
>>> but the
>>> table gets quite sparse anyway (although this is the smaller of the
>>> space
>>> overheads - tree nodes are bigger). For example, for 1M integral
>>> Floats, we
>>> get the following:
>>>
>>> >>> Float ...
>>> Capacity: 2097152
>>> Load factor: 0.75
>>> Size: 1000000
>>> Bin sizes: 0*1966080 1*0 2*0 3*24288 4*41248 5*0 6*0
>>> 7*0 8*0
>>> 9*0 10*4456 11*22963 12*30554 13*7539 14*24 total=1000000
>>> Empty bins: 93.8 %
>>> Unique hash codes per bin: 0*1966080 1*0 2*0 3*24288 4*41248 5*0 6*0
>>> 7*0 8*0
>>> 9*0 10*4456 11*22963 12*30554 13*7539 14*24 total=1000000
>>>
>>>
>>>>
>>>> Also...
>>>>
>>>> It looks like the simplest path to a minor improvement is
>>>> just to cache internally (your fourth test below). But I now
>>>> recall not doing this because it adds to footprint and
>>>> the field could prevent class unloading, for only a small
>>>> benefit.
>>>
>>> Footprint, yes (one reference field in HM instance), while class
>>> unloading is
>>> taken care of using WeakReference:
>>>
>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/HM.comparableClassFor/HomogeneousKeysCache/webrev.01/
>>>
>>>
>>>>
>>>> (Every time HashMap has changed, there have been reports of
>>>> performance regressions even though typical performance
>>>> generally improves.)
>>>>
>>>> -Doug
>>>
>>> Regards, Peter
>>>
>>>>
>>>>
>>>>>> Original JDK9 HashMap:
>>>>>>
>>>>>> Benchmark (initialSize) Mode Samples
>>>>>> Score Score error Units
>>>>>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>>>>>> 3011.738 78.249 ms
>>>>>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>>>>>> 2984.280 48.315 ms
>>>>>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>>>>>> 682.060 52.341 ms
>>>>>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>>>>>> 685.705 55.183 ms
>>>>>>
>>>>>> Original JDK9 HashMap with TREEIFY_THRESHOLD = 1 << 20:
>>>>>>
>>>>>> Benchmark (initialSize) Mode Samples
>>>>>> Score Score error Units
>>>>>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>>>>>> 2780.771 236.647 ms
>>>>>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>>>>>> 2541.740 233.429 ms
>>>>>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>>>>>> 757.364 67.869 ms
>>>>>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>>>>>> 671.617 54.943 ms
>>>>>>
>>>>>> Caching of comparableClassFor (in ClassRepository - good for
>>>>>> heterogeneous
>>>>>> keys too):
>>>>>>
>>>>>> Benchmark (initialSize) Mode Samples
>>>>>> Score Score error Units
>>>>>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>>>>>> 3014.888 71.778 ms
>>>>>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>>>>>> 2279.757 54.159 ms
>>>>>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>>>>>> 760.743 70.674 ms
>>>>>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>>>>>> 725.188 67.853 ms
>>>>>>
>>>>>> Caching of comparableClassFor (internally - good for homogeneous
>>>>>> keys only):
>>>>>>
>>>>>> Benchmark (initialSize) Mode Samples
>>>>>> Score Score error Units
>>>>>> j.t.HashMapCollision.badDistNoComp 16 ss 60
>>>>>> 3026.707 84.571 ms
>>>>>> j.t.HashMapCollision.badDistWithComp 16 ss 60
>>>>>> 2137.296 66.140 ms
>>>>>> j.t.HashMapCollision.goodDistNoComp 16 ss 60
>>>>>> 635.964 8.213 ms
>>>>>> j.t.HashMapCollision.goodDistWithComp 16 ss 60
>>>>>> 685.129 46.783 ms
>>>>>>
>>>>>>
>>>>
>>>
>>
>
More information about the core-libs-dev
mailing list