HashMap collision speed (regression 7->8)

Thu Jan 8 19:49:22 UTC 2015

On Thursday 08 January 2015 20:41:20 Bernd Eckenfels wrote:
> Hello Peter,
> 
> yes it is only keys without an Compareable interface, but they are
> quite common. I think the main problem with the internal comparator
> (based on instance identity) is, that it would work for looking up the
> same instance again, but not for the case where the actual instance is
> re-created (as in my example code).
> 
> I would love to test your modified code, but I don't have a
> OpenJDK build environment handy. Or actually I can try to get one, is
> there somewhere a Virtulisation or Cloud Image available which is
> pre-installed?

It should not be a problem. Create a custom class names HashMap in custom 
package named java.util, and copy all the source from original HashMap into 
your implementation. Custom one should be earlier in the classpath than the 
one from rt.jar, but if not - just repack rt.jar with your implementation. 
That should do the trick.

> I have (1.6) a compiled benchmark.jar here, in case anyone wants to try
> it:
> 
> https://onedrive.live.com/redir?resid=A98B6F4E09966AFD!20440&authkey=!AFFk03
> -5jq21Xz0&ithint=file%2cjar
> 
> BTW: I am (as usual) not expecting commoents on that, but just to
> mention it: the expected good behavior of degenerated hashmaps (due to
> the tree) was reason for removing the hashcode secret randomization. I
> wonder if that was such a good idea if colliding lookups (with more
> than a handfull of entries in a bucket) have this 50% penalty.
> 
> Greetings
> Bernd
> 
> 
> 
> Am Thu, 08 Jan 2015 20:22:13 +0100
> 
> schrieb Peter Levart <peter.levart at gmail.com>:
> > Hi Bernd,
> > 
> > It seems that only bad hash codes (without comparable keys) in JDK8
> > HM are worse than JDK7 HM.
> > 
> > 
> > Since you have already taken time to measure JDK7 vs JDK8 HM, could
> > you try to take the JDK8 source and just replace the internal
> > 
> > "comparableClassFor" method with the following implementation:
> >      static final ClassValue<Boolean> selfComparable = new
> > 
> > ClassValue<Boolean>() {
> > 
> >          @Override
> >          protected Boolean computeValue(Class<?> c) {
> >          
> >              Type[] as; ParameterizedType p;
> >              for (Type t : c.getGenericInterfaces()) {
> >              
> >                  if (t instanceof ParameterizedType &&
> >                  
> >                      (p = (ParameterizedType) t).getRawType() ==
> > 
> > Comparable.class &&
> > 
> >                      (as = p.getActualTypeArguments()).length == 1 &&
> >                      as[0] == c) // type arg is c
> >                      return true;
> >              
> >              }
> >              return false;
> >          
> >          }
> >      
> >      };
> >      
> >      static Class<?> comparableClassFor(Object x) {
> >      
> >          if (x instanceof Comparable) {
> >          
> >              Class<?> c = x.getClass();
> >              if (c == String.class || selfComparable.get(c)) {
> >              
> >                  return c;
> >              
> >              }
> >          
> >          }
> >          return null;
> >      
> >      }
> > 
> > ...and retry your measurements. I just want to see if it has any
> > impact.
> > 
> > 
> > Thanks, Peter
> > 
> > On 01/08/2015 05:38 PM, Bernd Eckenfels wrote:
> > > Hello,
> > > 
> > > I think it was topic before, but I just wanted to point out, that
> > > it is still an topic on the internetz. :)
> > > 
> > > Motivated by a StackOverflow question regarding HashMap performance
> > > regression in Java 8
> > > 
> > > http://stackoverflow.com/questions/27759527/using-java-7-hashmap-in-java
> > > -8/27760442
> > > 
> > > I made a JMH test and compared 7 and 8 speed. (the test is not very
> > > scientific as I dont really know how to squeeze the longrunning
> > > loop which alters state into the harness, but the results seem to
> > > be consitent wth theory and stopwatch testing)
> > > 
> > > https://gist.github.com/ecki/9f69773eb29428a36077
> > > 
> > > What can be seen is, that with a good distribution of hash keys 8
> > > looks faster than 7, and with a bad distribution of hash keys Java
> > > 7 is faster (unless you supply a Comparator for the key). (and a
> > > good distributed hashkey with comparable seems to be a bit slower)
> > > 
> > > I think the regression is somewhat expected, but I guess its not
> > > widely known.
> > > 
> > > (I do not use a cached hashcode, but it has a nearly trivial
> > > implementation just to make it more life like. the tests also
> > > compares different initial sizes, but they do not have an
> > > measurable effect on the performance, I show only default size
> > > below:)
> > > 
> > > java version "1.7.0_72"
> > > 
> > > Benchmark                      (initialSize) Mode Samp Score
> > > Error Units n.e.j.h.HashMapCollision.badDistNoComp 16    avgt 4
> > > 10847,318 ± 5596,690 ms/op n.e.j.h.HashMapCollision.badDistWithComp
> > > 16  avgt 4   10761,430 ± 5376,975 ms/op
> > > n.e.j.h.HashMapCollision.goodDistNoComp 16   avgt 4    3613,923 ±
> > > 254,823 ms/op n.e.j.h.HashMapCollision.goodDistWithComp 16 avgt
> > > 4    3656,229 ± 274,350 ms/op
> > > java version "1.8.0_25"
> > > 
> > > Benchmark                      (initialSize) Mode Samp Score
> > > Error Units n.e.j.h.HashMapCollision.badDistNoComp 16    avgt 4
> > > 14309,880 ± 1811,709 ms/op <-slower
> > > n.e.j.h.HashMapCollision.badDistWithComp 16  avgt 4     8232,037 ±
> > > 5974,530 ms/op n.e.j.h.HashMapCollision.goodDistNoComp 16   avgt
> > > 4     3304,698 ± 116,866 ms/op
> > > n.e.j.h.HashMapCollision.goodDistWithComp 16 avgt 4     3425,762 ±
> > > 107,659 ms/op
> > > 
> > > 
> > > Greetings
> > > Bernd

-- 
Regards,
    Stas