performance surprise with Object.hashCode()

John Rose john.r.rose at oracle.com
Mon May 13 14:31:33 PDT 2013


On May 13, 2013, at 11:46 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:

> On 05/13/2013 10:35 PM, Aleksey Shipilev wrote:
>> Note that the code generated in o_o and o_i cases are structurally
>> indistinguishable, but o_i naturally goes through the slow path. I
>> wonder why we are losing the information about receiver type being the
>> integer in o_i case, and skip the proper devirtualization...
> 
> In fact, I *do* think the Object.hashCode intrinsic plays a trick on us
> here.
> 
> Linux x86_64, JDK 8b88:
> 
> OOB:
>  o.s.g.a.AndyBench.eee:  0.997 +- 0.020 nsec/op
>  o.s.g.a.AndyBench.i_i:  1.288 +- 0.035 nsec/op
>  o.s.g.a.AndyBench.o_o:  2.709 +- 0.129 nsec/op
>  o.s.g.a.AndyBench.o_i:  4.925 +- 0.098 nsec/op
> 
> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_hashCode:
>  o.s.g.a.AndyBench.eee:  0.992 +- 0.012 nsec/op
>  o.s.g.a.AndyBench.i_i:  1.288 +- 0.024 nsec/op
>  o.s.g.a.AndyBench.o_o: 27.888 +- 1.326 nsec/op
>  o.s.g.a.AndyBench.o_i:  1.623 +- 0.033 nsec/op
> 
> See how o_i case gets to perform much better. o_o naturally takes the
> hit with the intrinsic disabled.

Nice use of DisableIntrinsic!

The intrinsic's path for the "I must dispatch" case of o_i is slower than a monomorphic inline cache (MIC) to Integer.hashCode.

However, this should not matter.  If a MIC would save the day, then pre-compilation profiling should do even better (allowing inlining).

Something about this code is overcoming the intended effect of UseTypeProfile.  Often the cause is a lack of warmup.  In this case it may be a bug in C2, if find_intrinsic etc. is used to generate the graph before the type profile is consulted (TypeProfileMajorReceiverPercent etc.).
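
For readers without the benchmark at hand, here is a minimal sketch of the three call shapes being compared. The AndyBench source is not shown in this thread, so the class name, method bodies, and loop below are assumptions chosen only to illustrate the cases; this is not a measurement harness.

```java
// Hypothetical sketch of the call shapes discussed above; NOT the AndyBench source.
final class HashCodeShapes {

    // o_o: static type Object, runtime type Object.
    // The call resolves to Object.hashCode (the identity hash).
    static int o_o(Object o) {
        return o.hashCode();
    }

    // o_i: static type Object, runtime type Integer.
    // Same bytecode as o_o; only the receiver type profile gathered at
    // this call site can tell the compiler that the target is Integer.hashCode.
    static int o_i(Object o) {
        return o.hashCode();
    }

    // i_i: static type Integer, so the target is known without any profile.
    static int i_i(Integer i) {
        return i.hashCode();
    }

    public static void main(String[] args) {
        Object plain = new Object();
        Object boxed = Integer.valueOf(42);
        Integer typed = Integer.valueOf(42);

        // Call each shape repeatedly so the call sites collect a profile.
        // The iteration count is arbitrary.
        int sink = 0;
        for (int n = 0; n < 1_000_000; n++) {
            sink += o_o(plain) + o_i(boxed) + i_i(typed);
        }
        System.out.println(sink); // keep the results observably live
    }
}
```

Since o_o and o_i compile to identical bytecode, the receiver profile is the only thing that separates them, which is why the order of intrinsic selection and profile use matters here.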

— John


More information about the hotspot-compiler-dev mailing list