performance surprise with Object.hashCode()

Mon May 13 10:45:38 PDT 2013

You should use System.nanoTime instead of currentTimeMillis.
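
A minimal sketch of that timing change, assuming the same loop shape as
the benchmark quoted below. The class name NanoTimeSketch, the helper
nanosPerCall, and the SINK field are hypothetical names used only for
illustration. System.nanoTime() is meant for measuring elapsed time with
nanosecond precision (though not necessarily nanosecond resolution),
while currentTimeMillis() reports wall-clock time in milliseconds.

```java
// Hypothetical sketch: class, helper, and field names are illustrative only.
final class NanoTimeSketch {

    // Written on every measurement so the JIT cannot discard the loop result.
    static volatile int SINK;

    // Same loop shape as sum(Object[], long, int) in the quoted benchmark.
    static int sum(Object[] ar, long cnt, int mask) {
        int sum = 0;
        for (long i = 0; i < cnt; i++)
            sum += ar[(int) i & mask].hashCode();
        return sum;
    }

    // Average nanoseconds per loop iteration, timed with System.nanoTime().
    static double nanosPerCall(Object[] ar, long cnt, int mask) {
        long t0 = System.nanoTime();
        SINK = sum(ar, cnt, mask);
        long t1 = System.nanoTime();
        return (double) (t1 - t0) / cnt;
    }

    public static void main(String[] args) {
        Object[] ar = new Object[64];
        for (int i = 0; i < ar.length; i++)
            ar[i] = new Object();

        // Warm up so the timed call is likely to run JIT-compiled code.
        for (int i = 0; i < 10000; i++)
            SINK = sum(ar, 10000, ar.length - 1);

        System.out.println("Object.hashCode() ns/call: "
                + nanosPerCall(ar, 100000000L, ar.length - 1));
    }
}
```

Reporting nanoseconds per iteration rather than total milliseconds also
makes the numbers directly comparable to the 0.5 / 1.5 / 4 nanos figures
in the original post.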

In your 2nd run, you will either get a branch mispredict when the
receiver switches from Integer to Object (this shouldn't be a big
contributor, though), or you are causing the JIT to do a type check on
each iteration to decide which hashCode to call (assuming there's an
inline cache installed).

Why don't you make the array typed Object but fill it only with Integers?
Why are you mixing in two types?
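
Roughly what I mean, as a sketch rather than a drop-in replacement for
your test: the array is declared Object[] so the compiler has no static
knowledge of the receiver, but every element is an Integer. The names
MonomorphicSketch and integersOnly are made up for illustration. I am
also assuming the type profile belongs to the call site inside sum(), so
this should be run in a fresh JVM, not after a pass over plain Objects
through the same method.

```java
// Hypothetical sketch: class and method names are illustrative only.
final class MonomorphicSketch {

    static final int LEN = 64;

    // Declared as Object[], but filled only with Integer instances.
    static Object[] integersOnly() {
        Object[] ar = new Object[LEN];
        for (int i = 0; i < LEN; i++)
            ar[i] = Integer.valueOf(i);
        return ar;
    }

    // Same loop shape as the quoted benchmark; the receiver type is only
    // known at run time here.
    static int sum(Object[] ar, long cnt, int mask) {
        int sum = 0;
        for (long i = 0; i < cnt; i++)
            sum += ar[(int) i & mask].hashCode();
        return sum;
    }

    public static void main(String[] args) {
        Object[] ar = integersOnly();
        int total = 0;

        // Warm-up pass: the call site sees Integer receivers only.
        for (int i = 0; i < 10000; i++)
            total += sum(ar, 10000, LEN - 1);

        long cnt = 100000000L;
        long t0 = System.nanoTime();
        total += sum(ar, cnt, LEN - 1);
        long t1 = System.nanoTime();

        System.out.println("Object[] of Integers, ns/call: "
                + (double) (t1 - t0) / cnt);
        System.out.println("just to make sure everything executes " + total);
    }
}
```

If this case comes out close to your Integer[] case, the extra cost in
your 2nd run is more likely down to the mixed receiver types than to
hashCode being an intrinsic.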

On May 13, 2013 1:30 PM, "Andy Nuss" <andrew_nuss at yahoo.com> wrote:

> Here's my benchmarking code.  I don't see at all why a virtual call to
> Integer.hashCode is so much slower than the baseline loop skeleton +
> inline access to the member value of Integer (3rd case) + 0.5 nanos of
> vtable call (measured in a separate benchmark).
>
> To me, this indicates a serious design flaw in the so-called intrinsic
> call on modern fast machines that could be corrected (were it possible)
> by making hashCode() a normal non-native function that forwards to
> System.identityHashCode().
>
> ...
>
> package test;
>
> public class Test {
>
>     private static final int        LEN = 64;
>     private static final int        HALFLEN = LEN/2;
>     private static final int        MASK = HALFLEN-1;
>     private static final long        TENGIG = 10000000000L;
>
>     private static int sum (Object[] ar, long cnt, int mask)
>     {
>         int sum = 0;
>         for (long i = 0; i < cnt; i++)
>             sum += ar[(int)i & mask].hashCode();
>         return sum;
>     }
>
>     private static int sum (Integer[] ar, long cnt, int mask)
>     {
>         int sum = 0;
>         for (long i = 0; i < cnt; i++)
>             sum += ar[(int)i & mask].hashCode();
>         return sum;
>     }
>
>     public static void main (String[] args)
>     {
>         Object[] ar1 = new Object[LEN];
>         for (int i = 0; i < LEN; i++)
>             ar1[i] = new Object();
>         Object[] ar2 = new Object[LEN];
>         for (int i = 0; i < LEN; i++)
>             ar2[i] = i < HALFLEN ? new Integer(i) : new Object();
>         Integer[] ar3 = new Integer[LEN];
>         for (int i = 0; i < LEN; i++)
>             ar3[i] = new Integer(i);
>
>         long m1, m2;
>         int sum = 0;
>
>         for (int i = 0; i < 10000; i++)
>             sum += sum(ar1, 10000, MASK);
>         m1 = System.currentTimeMillis();
>         sum += sum(ar1, TENGIG, MASK);
>         m2 = System.currentTimeMillis();
>         System.out.println("Object.hashCode() " + (m2-m1));
>
>         for (int i = 0; i < 10000; i++)
>             sum += sum(ar2, 10000, MASK);
>         m1 = System.currentTimeMillis();
>         sum += sum(ar2, TENGIG, MASK);
>         m2 = System.currentTimeMillis();
>         System.out.println("vtable using Integer.hashCode() " + (m2-m1));
>
>         for (int i = 0; i < 10000; i++)
>             sum += sum(ar3, 10000, MASK);
>         m1 = System.currentTimeMillis();
>         sum += sum(ar3, TENGIG, MASK);
>         m2 = System.currentTimeMillis();
>         System.out.println("inline using Integer.hashCode() " + (m2-m1));
>         System.out.println("just to make sure everything executes " + sum);
>     }
> }
>
>
>   ------------------------------
>  *From:* Vitaly Davidovich <vitalyd at gmail.com>
> *To:* Andy Nuss <andrew_nuss at yahoo.com>
> *Cc:* hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> *Sent:* Monday, May 13, 2013 10:20 AM
> *Subject:* Re: performance surprise with Object.hashCode()
>
> Object.hashCode is an intrinsic (see libraryCall.cpp), so it makes sense
> you don't see any difference with your changes.
> Looking at the assembly, when it knows statically that Integer is receiver
> it simply reads the value field inline.  When it doesn't know, it loads the
> address of the receiver's hashCode method and compares against
> Object.hashCode.  If equal, then it proceeds with pulling hash out of
> header.  If it's not, it jumps to a vcall and then jumps again to function
> prologue.  I don't really see why you'd get such a large discrepancy.  The
> only thing is that if hashCode is overridden it jumps around a bit and may
> get icache miss, but that would happen with plain vcall too.  So basically
> only thing "special" (in terms of vcall) I see for hashCode is a check
> against an immediate (for seeing if hashCode is overridden or not), a
> forward (short) jump to skip native object.hashCode impl and that's it.
>
> On May 13, 2013 11:45 AM, "Andy Nuss" <andrew_nuss at yahoo.com> wrote:
>
> Hi,
>
> I was profiling various aspects of the JVM and hit a big surprise.
>
> * on my corei7, virtual calls are about .5 nanos
> * when a class has not overridden hashCode(), the hashCode call is
> 1.5 nanos because it is native
> * for java.lang.Integer, which just returns the intValue(), hashCode is
> zero time when hotspot can inline
> (that is about one clock cycle when testing hits the same Integer
> instances keeping them in L1 cache)
> * but when you force HotSpot to go thru the vtable for Integer.hashCode,
> the call grows to 4 nanos!
>
> The last case was a big surprise, as I thought for Integer, a vcall to
> hashCode would only cost the 0.5 nanos of the vtable.
>
> Somehow, native code is involved even when hashCode() has been overridden
> by a non-native method.
>
> ...
>
> Then I tried mucking with the code in openjdk.  I compiled the sources.  I
> edited Object.java to be this:
>
> public class Object {
>     public int hashCode ()
>     {
>          return System.identityHashCode(this);
>     }
> }
>
> To me, this seems like an ideal fix to this serious performance bug,
> making the entry point NON-native, but having the same effect by default.
> So that if you subclass, you are sure not to pay a doubled(!) native cost.
>
> But changing the source code had no effect on the results.  Nor did it
> have any effect on /share/native/java/lang/Object.c.
>
> In both cases, with and without my change to the definition of
> Object.java, Object.c has NO native function definition for the hashCode
> function.
>
> This leads me to believe that this performance defect is endemic to the
> hotspot compiler code itself, in that it special cases the
> Object.hashCode() function.
>
> It seems that if somehow this performance defect (as I see it) were
> fixed, String hashing and Integer hashing and the like for classes which
> cache their hash value would be greatly improved.
>
> ???
>
> Andy
>
>
>
>