HashMap collision speed (regression 7->8)

Fri Jan 9 18:31:52 UTC 2015

Hi, Bernd

The initial discussion of the change that set TREEIFY_THRESHOLD to 8 (it 
was initially 16) can be read here:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-July/018685.html

...continuing here...

http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-August/019853.html

The code review discussion is archived here:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-August/020131.html

-Brent

On 1/8/15 4:10 PM, Bernd Eckenfels wrote:
> Hello Vitaly,
>
> the TREEIFY_THRESHOLD in HashMap is 8 (6 for UNTREEIFY). Not sure if
> there have been benchmarks for this, I dont see an justification in the
> source comments. There is a comment, that it is expected to waste a
> factor of two time and space when not compareable.
>
> I would also expect that larger thresholds make sense (especially when key is not
> compareable).
>
> The JEP 180 also does not mention results from the benchmarks, maybe
> somebody knows details?
>
> http://openjdk.java.net/jeps/180
>
> Gruss
> Bernd
>
>
> Am Thu, 8 Jan 2015 18:38:53 -0500
> schrieb Vitaly Davidovich <vitalyd at gmail.com>:
>
>> Hmmm, 15 entries doesn't seem like a large enough number to drop
>> linear search.  I'd have expected something like 60-80 entries (in
>> some binary vs linear search benchmarks I've come across, that seems
>> to be crossover point).  It's hard to beat linear search for small
>> sets when the comparison function is dirt cheap.
>>
>> Sent from my phone
>> On Jan 8, 2015 6:25 PM, "Bernd Eckenfels" <ecki at zusammenkunft.net>
>> wrote:
>>
>>> Hello Peter,
>>>
>>> hm not sure what you mean, i was not suggesting the regression is
>>> caused by simpler hashCode bits. (do you mean my comment about the
>>> removed randomizer? that is not a problem for the benchmark, but it
>>> might be an opportunity for DOS (again)).
>>>
>>> I think the regression itself is caused by the fact that a tree is
>>> used instead of the linear search. The benchmark does colide
>>> heavily, but only places 15 or so entries in one bucket. This is
>>> enough to trigger the migration from linear search to tree, but not
>>> enough to hurt for linear search performance (at least I think this
>>> is the case).
>>>
>>> The way the nodes are constructed they actually do sort pretty good
>>> if compareable is implemented. It would most likely not help if
>>> Compareable cannot distinguish more than hashCode would.
>>>
>>> Gruss
>>> Bernd
>>>
>>>
>>>
>>> Am Thu, 08 Jan 2015 23:10:10 +0100
>>> schrieb Peter Levart <peter.levart at gmail.com>:
>>>
>>>> Bernd,
>>>>
>>>> I tried to change the "comparableClassFor" myself and it didn't
>>>> work (HashMap is used very early in boot-up sequence and
>>>> initializing ClassValue at that time triggers a NPE).
>>>>
>>>> Anyway, caching of "comparableClassFor" result would only
>>>> potentially improve the  "badDistWithComp" result. But can not
>>>> improve "badDistNoComp" which is the one with speed regression as
>>>> you're benchmark suggests.
>>>>
>>>> But your feeling that this is caused by "simpler" hashCode bits
>>>> spreading function is not correct. I tried to replace the hash()
>>>> method with the one that was in HM before and I get comparable
>>>> results. This is the JDK8 HashMap.hash() method:
>>>>
>>>>       static final int hash(Object key) {
>>>>           int h;
>>>>           return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>>
>>>> 16); }
>>>>
>>>> Benchmark                               (initialSize)   Mode
>>>> Samples        Score  Score error    Units
>>>> j.t.HashMapCollision.badDistNoComp                 16   avgt 4
>>>> 3171.264     1152.995    ms/op
>>>> j.t.HashMapCollision.badDistWithComp               16   avgt 4
>>>> 2819.342      422.861    ms/op
>>>> j.t.HashMapCollision.goodDistNoComp                16   avgt 4
>>>> 1026.064       72.049    ms/op
>>>> j.t.HashMapCollision.goodDistWithComp              16   avgt 4
>>>> 1025.312       39.858    ms/op
>>>>
>>>>
>>>> ...and this is my re-interpretation of pre JDK8 HashMap.hash():
>>>>
>>>>
>>>>       static final int randomHash =
>>>> mix32(System.currentTimeMillis() ^ System.nanoTime());
>>>>
>>>>       private static int mix32(long z) {
>>>>           z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
>>>>           return (int)(((z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L) >>>
>>>> 32); }
>>>>
>>>>       static final int hash(Object k) {
>>>>           int h = k == null ? randomHash : randomHash ^
>>>> k.hashCode();
>>>>
>>>>           // This function ensures that hashCodes that differ only
>>>> by // constant multiples at each bit position have a bounded
>>>>           // number of collisions (approximately 8 at default load
>>>> factor). h ^= (h >>> 20) ^ (h >>> 12);
>>>>           return h ^ (h >>> 7) ^ (h >>> 4);
>>>>       }
>>>>
>>>> Benchmark                               (initialSize)   Mode
>>>> Samples        Score  Score error    Units
>>>> j.t.HashMapCollision.badDistNoComp                 16   avgt 4
>>>> 3257.348     1079.088    ms/op
>>>> j.t.HashMapCollision.badDistWithComp               16   avgt 4
>>>> 2866.740      414.687    ms/op
>>>> j.t.HashMapCollision.goodDistNoComp                16   avgt 4
>>>> 1041.068       99.370    ms/op
>>>> j.t.HashMapCollision.goodDistWithComp              16   avgt 4
>>>> 1041.653       53.925    ms/op
>>>>
>>>>
>>>> Your benchmark does not show much difference. Perhaps the
>>>> regression for "badDistNoComp" case could be caused by the fact
>>>> that with really bad hashCode and no Comparable interface, the
>>>> red-black tree becomes less performant to search than a simple
>>>> linked list of Nodes...
>>>>
>>>>
>>>> Regards, Peter
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 01/08/2015 08:41 PM, Bernd Eckenfels wrote:
>>>>> Hello Peter,
>>>>>
>>>>> yes it is only keys without an Compareable interface, but they
>>>>> are quite common. I think the main problem with the internal
>>>>> comparator (based on instance identity) is, that it would work
>>>>> for looking up the same instance again, but not for the case
>>>>> where the actual instance is re-created (as in my example code).
>>>>>
>>>>> I would love to test your modified code, but I don't have a
>>>>> OpenJDK build environment handy. Or actually I can try to get
>>>>> one, is there somewhere a Virtulisation or Cloud Image
>>>>> available which is pre-installed?
>>>>>
>>>>> I have (1.6) a compiled benchmark.jar here, in case anyone
>>>>> wants to try it:
>>>>>
>>>>>
>>> https://onedrive.live.com/redir?resid=A98B6F4E09966AFD!20440&authkey=!AFFk03-5jq21Xz0&ithint=file%2cjar
>>>>>
>>>>> BTW: I am (as usual) not expecting commoents on that, but just
>>>>> to mention it: the expected good behavior of degenerated
>>>>> hashmaps (due to the tree) was reason for removing the hashcode
>>>>> secret randomization. I wonder if that was such a good idea if
>>>>> colliding lookups (with more than a handfull of entries in a
>>>>> bucket) have this 50% penalty.
>>>>>
>>>>> Greetings
>>>>> Bernd
>>>>>
>>>>>
>>>>>
>>>>> Am Thu, 08 Jan 2015 20:22:13 +0100
>>>>> schrieb Peter Levart <peter.levart at gmail.com>:
>>>>>
>>>>>> Hi Bernd,
>>>>>>
>>>>>> It seems that only bad hash codes (without comparable keys) in
>>>>>> JDK8 HM are worse than JDK7 HM.
>>>>>>
>>>>>>
>>>>>> Since you have already taken time to measure JDK7 vs JDK8 HM,
>>>>>> could you try to take the JDK8 source and just replace the
>>>>>> internal "comparableClassFor" method with the following
>>>>>> implementation:
>>>>>>
>>>>>>        static final ClassValue<Boolean> selfComparable = new
>>>>>> ClassValue<Boolean>() {
>>>>>>            @Override
>>>>>>            protected Boolean computeValue(Class<?> c) {
>>>>>>                Type[] as; ParameterizedType p;
>>>>>>                for (Type t : c.getGenericInterfaces()) {
>>>>>>                    if (t instanceof ParameterizedType &&
>>>>>>                        (p = (ParameterizedType) t).getRawType()
>>>>>> == Comparable.class &&
>>>>>>                        (as = p.getActualTypeArguments()).length
>>>>>> == 1 && as[0] == c) // type arg is c
>>>>>>                        return true;
>>>>>>                }
>>>>>>                return false;
>>>>>>            }
>>>>>>        };
>>>>>>
>>>>>>        static Class<?> comparableClassFor(Object x) {
>>>>>>            if (x instanceof Comparable) {
>>>>>>                Class<?> c = x.getClass();
>>>>>>                if (c == String.class || selfComparable.get(c)) {
>>>>>>                    return c;
>>>>>>                }
>>>>>>            }
>>>>>>            return null;
>>>>>>        }
>>>>>>
>>>>>>
>>>>>> ...and retry your measurements. I just want to see if it has
>>>>>> any impact.
>>>>>>
>>>>>>
>>>>>> Thanks, Peter
>>>>>>
>>>>>>
>>>>>> On 01/08/2015 05:38 PM, Bernd Eckenfels wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I think it was topic before, but I just wanted to point out,
>>>>>>> that it is still an topic on the internetz. :)
>>>>>>>
>>>>>>> Motivated by a StackOverflow question regarding HashMap
>>>>>>> performance regression in Java 8
>>>>>>>
>>>>>>>
>>> http://stackoverflow.com/questions/27759527/using-java-7-hashmap-in-java-8/27760442
>>>>>>>
>>>>>>> I made a JMH test and compared 7 and 8 speed. (the test is not
>>>>>>> very scientific as I dont really know how to squeeze the
>>>>>>> longrunning loop which alters state into the harness, but the
>>>>>>> results seem to be consitent wth theory and stopwatch testing)
>>>>>>>
>>>>>>> https://gist.github.com/ecki/9f69773eb29428a36077
>>>>>>>
>>>>>>> What can be seen is, that with a good distribution of hash
>>>>>>> keys 8 looks faster than 7, and with a bad distribution of
>>>>>>> hash keys Java 7 is faster (unless you supply a Comparator
>>>>>>> for the key). (and a good distributed hashkey with comparable
>>>>>>> seems to be a bit slower)
>>>>>>>
>>>>>>> I think the regression is somewhat expected, but I guess its
>>>>>>> not widely known.
>>>>>>>
>>>>>>> (I do not use a cached hashcode, but it has a nearly trivial
>>>>>>> implementation just to make it more life like. the tests also
>>>>>>> compares different initial sizes, but they do not have an
>>>>>>> measurable effect on the performance, I show only default size
>>>>>>> below:)
>>>>>>>
>>>>>>> java version "1.7.0_72"
>>>>>>>
>>>>>>> Benchmark                      (initialSize) Mode Samp Score
>>>>>>> Error Units n.e.j.h.HashMapCollision.badDistNoComp 16    avgt
>>>>>>> 4 10847,318 ± 5596,690 ms/op
>>>>>>> n.e.j.h.HashMapCollision.badDistWithComp 16  avgt 4
>>>>>>> 10761,430 ± 5376,975 ms/op
>>>>>>> n.e.j.h.HashMapCollision.goodDistNoComp 16   avgt 4
>>>>>>> 3613,923 ± 254,823 ms/op
>>>>>>> n.e.j.h.HashMapCollision.goodDistWithComp 16 avgt 4
>>>>>>> 3656,229 ± 274,350 ms/op java version "1.8.0_25"
>>>>>>>
>>>>>>> Benchmark                      (initialSize) Mode Samp Score
>>>>>>> Error Units n.e.j.h.HashMapCollision.badDistNoComp 16    avgt
>>>>>>> 4 14309,880 ± 1811,709 ms/op <-slower
>>>>>>> n.e.j.h.HashMapCollision.badDistWithComp 16  avgt 4
>>>>>>> 8232,037 ± 5974,530 ms/op
>>>>>>> n.e.j.h.HashMapCollision.goodDistNoComp 16   avgt 4
>>>>>>> 3304,698 ± 116,866 ms/op
>>>>>>> n.e.j.h.HashMapCollision.goodDistWithComp 16 avgt 4
>>>>>>> 3425,762 ± 107,659 ms/op
>>>>>>>
>>>>>>>
>>>>>>> Greetings
>>>>>>> Bernd
>>>>
>>>>
>>>
>>
>