RFR (S): CR 8014447: Object.hashCode intrinsic breaks inline caches
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Sep 25 15:16:32 PDT 2013
Aleksey,
I was thinking that we may use your check from the first version:
(receiver_count > 0 && profile.morphism() == 1 &&
profile.receiver(0)->as_klass()->is_java_lang_Object())
to use the hashCode intrinsic without delay, because profiling shows only
one receiver type. But, on the other hand, Object::hashCode() is a native
method which we can't inline (that is why we have the intrinsic), so the
current (02) code should work. But, please, confirm.
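As a quick confirmation of the "native method" point: the native modifier on Object.hashCode() can be checked with reflection. This is a standalone snippet with a made-up class name, not part of the webrev:

```java
import java.lang.reflect.Modifier;

// Standalone check (not from the webrev): Object.hashCode() is declared
// native in java.lang.Object, so C2 cannot bytecode-inline it and has to
// rely on the intrinsic instead.
public class HashCodeIsNative {
    public static void main(String[] args) throws Exception {
        int mods = Object.class.getDeclaredMethod("hashCode").getModifiers();
        System.out.println(Modifier.isNative(mods)); // prints true
    }
}
```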
The problem is with bimorphic and polymorphic call sites where one of the
recorded types is j.l.O and its percentage is significant. You need to use
the intrinsic on the corresponding generated branch where it is used. And
it could be tricky, because call_generator() is called again recursively
for each branch, and we should return the intrinsic without delay. I think
you need to add the following check:
if (!call_does_dispatch && cg->does_virtual_dispatch()) {
  cg_intrinsic = cg;
}
assuming that call_does_dispatch is always true when call_generator() is
first called from do_call(). You need to check both the virtual and static
cases.
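A simplified sketch of that control flow, with made-up stand-ins for cg, cg_intrinsic, and call_does_dispatch. This is only a model of the idea, not the actual HotSpot code:

```java
// Model (not HotSpot code): the top-level call from do_call() has
// call_does_dispatch == true, so a virtual-dispatch intrinsic is cached in
// cg_intrinsic and normal inlining is tried first; the recursive per-branch
// calls have call_does_dispatch == false and return the intrinsic directly.
public class CallGeneratorModel {
    static String cgIntrinsic = null; // models the cached cg_intrinsic

    static String callGenerator(boolean callDoesDispatch, String cg) {
        boolean doesVirtualDispatch = cg.endsWith("_intrinsic");
        if (!callDoesDispatch && doesVirtualDispatch) {
            return cg;               // per-branch call: use intrinsic now
        }
        if (callDoesDispatch && doesVirtualDispatch) {
            cgIntrinsic = cg;        // delay: cache it, try inlining first
            return "inline_cg";
        }
        return cg;
    }

    public static void main(String[] args) {
        System.out.println(callGenerator(true,  "hashCode_intrinsic")); // inline_cg
        System.out.println(callGenerator(false, "hashCode_intrinsic")); // hashCode_intrinsic
    }
}
```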
Thanks,
Vladimir
On 9/25/13 1:55 PM, Aleksey Shipilev wrote:
> On 09/25/2013 02:25 AM, Vladimir Kozlov wrote:
>> I would go with your second solution, but rename is_low_priority to
>> does_virtual_dispatch. is_predicted() does not do virtual dispatch, so
>> you don't need to include it in the factored method (add an assert for
>> protection). As a result, you don't need the new inline_intrinsic().
>> Just cache the result in a local var:
>>
>> if (cg->does_virtual_dispatch()) {
>>   cg_intrinsic = cg;
>>   cg = NULL;
>> }
>
> Thank you, Vladimir. The updated webrev is here:
> http://cr.openjdk.java.net/~shade/8014447/webrev.02/
>
> It passes JPRT (almost all testing is done; some jobs are stuck in the
> queues), and still does the right thing for Object.hashCode():
>
> baseline:
> HashCodeBench.stat_i_i: 3.7 +- 0.1 ns/op
> HashCodeBench.stat_o_i: 3.7 +- 0.1 ns/op
> HashCodeBench.stat_o_o: 3.7 +- 0.1 ns/op
> HashCodeBench.virt_i_i: 1.5 +- 0.1 ns/op
> HashCodeBench.virt_o_i: 8.6 +- 0.1 ns/op // <--- !!!
> HashCodeBench.virt_o_o: 4.2 +- 0.1 ns/op
>
> patched:
> HashCodeBench.stat_i_i: 3.6 +- 0.1 ns/op
> HashCodeBench.stat_o_i: 3.6 +- 0.1 ns/op
> HashCodeBench.stat_o_o: 3.6 +- 0.1 ns/op
> HashCodeBench.virt_i_i: 1.5 +- 0.1 ns/op
> HashCodeBench.virt_o_i: 2.0 +- 0.1 ns/op // improvement
> HashCodeBench.virt_o_o: 3.8 +- 0.1 ns/op
>
> I set does_virtual_dispatch for the clone() intrinsic as well, but
> the effect is nil, since it is hard to call Object.clone() with Object
> as the formal receiver while the actual receiver is a subclass. It does
> not degrade the clone() performance, though, so I'm inclined to keep it
> for symmetry, in case anybody finds a trick for invoking it.
>
> baseline:
> CloneBench.cln: 18.0 +- 0.3 ns/op
> CloneBench.cln_cln: 23.3 +- 0.5 ns/op
> CloneBench.obj_cln: 23.0 +- 0.6 ns/op
> CloneBench.obj_obj: 23.6 +- 0.7 ns/op
>
> patched:
> CloneBench.cln: 18.0 +- 0.3 ns/op
> CloneBench.cln_cln: 23.7 +- 0.7 ns/op
> CloneBench.obj_cln: 23.5 +- 0.7 ns/op
> CloneBench.obj_obj: 23.6 +- 0.7 ns/op
>
> Thanks,
> -Aleksey.
>
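The clone() observation above hinges on clone() being protected in java.lang.Object: outside java.lang, a call through a receiver of static type Object does not even compile, so the formal receiver is normally the cloneable subclass. A standalone reflection check, not part of the webrev:

```java
import java.lang.reflect.Modifier;

// Standalone check (not from the webrev): Object.clone() is protected, so
// `Object o = ...; o.clone();` does not compile outside java.lang, which is
// why a call site with Object as the formal receiver is hard to produce.
public class CloneIsProtected {
    public static void main(String[] args) throws Exception {
        int mods = Object.class.getDeclaredMethod("clone").getModifiers();
        System.out.println(Modifier.isProtected(mods)); // prints true
    }
}
```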
More information about the hotspot-compiler-dev mailing list