performance surprise with Object.hashCode()
Andy Nuss
andrew_nuss at yahoo.com
Mon May 13 10:29:07 PDT 2013
Here's my benchmarking code. I don't see at all why a virtual call to Integer.hashCode is so much slower than the baseline loop skeleton + inline access to the value member of Integer (3rd case) + 0.5 nanos of vtable call (measured in a separate benchmark).
To me, this indicates a serious design flaw in the so-called intrinsic call on modern fast machines, one that could be corrected (were it possible) by making hashCode() a normal non-native method that forwards to System.identityHashCode().
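(For reference, Integer.hashCode() is specified to return the boxed int value itself, which is why the statically-typed third case below can collapse to a plain inline field read. A quick sanity check:)

```java
public class IntHashCheck {
    public static void main(String[] args) {
        // Integer.hashCode() is specified to return the underlying int value
        Integer boxed = Integer.valueOf(42);
        System.out.println(boxed.hashCode()); // prints 42
    }
}
```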
...
package test;

public class Test {
    private static final int LEN = 64;
    private static final int HALFLEN = LEN / 2;
    private static final int MASK = HALFLEN - 1;
    private static final long TENGIG = 10000000000L;

    // Receiver is statically Object, so hashCode() dispatch depends
    // on the runtime type of each element
    private static int sum (Object[] ar, long cnt, int mask)
    {
        int sum = 0;
        for (long i = 0; i < cnt; i++)
            sum += ar[(int)i & mask].hashCode();
        return sum;
    }

    // Same loop, but the receiver is statically Integer, so the JIT
    // can inline Integer.hashCode() down to a field read
    private static int sum (Integer[] ar, long cnt, int mask)
    {
        int sum = 0;
        for (long i = 0; i < cnt; i++)
            sum += ar[(int)i & mask].hashCode();
        return sum;
    }

    public static void main (String[] args)
    {
        Object[] ar1 = new Object[LEN];
        for (int i = 0; i < LEN; i++)
            ar1[i] = new Object();
        // Mixed array: half Integers, half plain Objects, to defeat
        // any "receiver is always one type" optimization
        Object[] ar2 = new Object[LEN];
        for (int i = 0; i < LEN; i++)
            ar2[i] = i < HALFLEN ? new Integer(i) : new Object();
        Integer[] ar3 = new Integer[LEN];
        for (int i = 0; i < LEN; i++)
            ar3[i] = new Integer(i);

        long m1, m2;
        int sum = 0;

        // warm up to trigger JIT compilation, then time 10 billion calls
        for (int i = 0; i < 10000; i++)
            sum += sum(ar1, 10000, MASK);
        m1 = System.currentTimeMillis();
        sum += sum(ar1, TENGIG, MASK);
        m2 = System.currentTimeMillis();
        System.out.println("Object.hashCode() " + (m2 - m1));

        for (int i = 0; i < 10000; i++)
            sum += sum(ar2, 10000, MASK);
        m1 = System.currentTimeMillis();
        sum += sum(ar2, TENGIG, MASK);
        m2 = System.currentTimeMillis();
        System.out.println("vtable using Integer.hashCode() " + (m2 - m1));

        for (int i = 0; i < 10000; i++)
            sum += sum(ar3, 10000, MASK);
        m1 = System.currentTimeMillis();
        sum += sum(ar3, TENGIG, MASK);
        m2 = System.currentTimeMillis();
        System.out.println("inline using Integer.hashCode() " + (m2 - m1));

        // consume the sums so the JIT cannot dead-code-eliminate the loops
        System.out.println("just to make sure everything executes " + sum);
    }
}
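(The printed millisecond totals convert directly to nanoseconds per call, since each timed run makes TENGIG = 1e10 calls. A small helper, shown with a hypothetical 40-second reading as input:)

```java
public class NsPerOp {
    // Convert a total elapsed time in milliseconds into nanoseconds
    // per call, given the number of calls made in that interval.
    static double nsPerOp(long elapsedMillis, long calls) {
        return elapsedMillis * 1_000_000.0 / calls;
    }

    public static void main(String[] args) {
        // hypothetical reading: 40,000 ms for 10 billion hashCode() calls
        System.out.println(nsPerOp(40_000L, 10_000_000_000L) + " ns/call"); // prints "4.0 ns/call"
    }
}
```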
________________________________
From: Vitaly Davidovich <vitalyd at gmail.com>
To: Andy Nuss <andrew_nuss at yahoo.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Sent: Monday, May 13, 2013 10:20 AM
Subject: Re: performance surprise with Object.hashCode()
Object.hashCode is an intrinsic (see library_call.cpp), so it makes sense that you don't see any difference with your changes.
Looking at the assembly: when it knows statically that Integer is the receiver, it simply reads the value field inline. When it doesn't know, it loads the address of the receiver's hashCode method and compares it against Object.hashCode. If they're equal, it proceeds to pull the hash out of the object header. If not, it jumps to a vcall and then jumps again to the function prologue. I don't really see why you'd get such a large discrepancy. The only thing is that if hashCode is overridden it jumps around a bit and may take an icache miss, but that would happen with a plain vcall too. So basically the only thing "special" (in terms of vcall) I see for hashCode is a check against an immediate (to see whether hashCode is overridden or not), plus a forward (short) jump to skip the native Object.hashCode implementation, and that's it.
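(The check described above — is the receiver still using Object.hashCode, or has it overridden it? — happens in JIT-compiled code, but the same question can be sketched at the Java level with reflection. HashDispatchSketch is a hypothetical illustration, not HotSpot code:)

```java
import java.lang.reflect.Method;

public class HashDispatchSketch {
    // Rough analogue of the JIT's check: does this receiver still use
    // Object.hashCode (identity hash), or has it overridden hashCode?
    static boolean usesIdentityHash(Object receiver) throws Exception {
        Method m = receiver.getClass().getMethod("hashCode");
        return m.getDeclaringClass() == Object.class;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(usesIdentityHash(new Object()));       // prints true
        System.out.println(usesIdentityHash(Integer.valueOf(1))); // prints false
    }
}
```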
Sent from my phone
On May 13, 2013 11:45 AM, "Andy Nuss" <andrew_nuss at yahoo.com> wrote:
Hi,
>
>
>I was profiling various aspects of the JVM and hit a big surprise.
>
>
>* on my Core i7, virtual calls cost about 0.5 nanos
>* when a class has not overridden hashCode(), the hashCode() call costs 1.5 nanos because it is native
>* for java.lang.Integer, which just returns intValue(), hashCode() takes essentially zero time when HotSpot can inline it
>(that is about one clock cycle when the test hits the same Integer instances, keeping them in L1 cache)
>
>* but when you force HotSpot to go through the vtable for Integer.hashCode, the call grows to 4 nanos!
>
>
>The last case was a big surprise, as I thought that for Integer, a vcall to hashCode() would only cost the 0.5 nanos of the vtable dispatch.
>
>
>Somehow, native code is involved even when hashCode() has been overridden to not be native.
>
>
>
>...
>
>
>Then I tried mucking with the code in openjdk. I compiled the sources. I edited Object.java to be this:
>
>
>public class Object {
> public int hashCode ()
> {
> return System.identityHashCode(this);
> }
>}
>
>
>To me, this seems like an ideal fix to this serious performance bug: making the entry point non-native, but having the same effect by default, so that if you subclass, you are sure not to pay the doubled(!) native cost.
>
>
>But changing the source code had no effect on the results. Nor did it have any effect on /share/native/java/lang/Object.c.
>
>
>In both cases, with and without my change to the definition of Object.java, Object.c has NO native function definition for the hashCode function.
>
>
>This leads me to believe that this performance defect is endemic to the hotspot compiler code itself, in that it special cases the Object.hashCode() function.
>
>
>It seems that if this performance defect (as I see it) were somehow fixed, String hashing, Integer hashing, and the like for classes which cache their hash value would be greatly improved.
>
>
>???
>
>
>Andy