RFR (S): CR 8014447: Object.hashCode intrinsic breaks inline caches
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Sep 25 15:16:32 PDT 2013
Aleksey,
I was thinking that we may use your check from the first version:
(receiver_count > 0 && profile.morphism() == 1 &&
profile.receiver(0)->as_klass()->is_java_lang_Object())
to use the hashCode intrinsic without delay, because profiling shows only
one receiver type. But, on the other hand, Object::hashCode() is a native
method which we can't inline (that is why we have the intrinsic), so the
current (02) code should work. But, please, confirm.
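As a quick confirmation of the "native method" point: the native modifier on Object.hashCode() can be checked with reflection. This is a standalone snippet with a made-up class name, not part of the webrev:

```java
import java.lang.reflect.Modifier;

// Standalone check (not from the webrev): Object.hashCode() is declared
// native in java.lang.Object, so C2 cannot bytecode-inline it and has to
// rely on the intrinsic instead.
public class HashCodeIsNative {
    public static void main(String[] args) throws Exception {
        int mods = Object.class.getDeclaredMethod("hashCode").getModifiers();
        System.out.println(Modifier.isNative(mods)); // prints true
    }
}
```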
The problem is with bimorphic and polymorphic call sites where one of the
recorded types is j.l.O and its percentage is significant. You need to use
the intrinsic on the corresponding generated branch where it is used. And
it could be tricky, because call_generator() is called again recursively
for each branch, and we should return the intrinsic without delay. I think
you need to add the following check:
if (!call_does_dispatch && cg->does_virtual_dispatch()) {
  cg_intrinsic = cg;
}
assuming that call_does_dispatch is always true when call_generator() is
first called from do_call(). You need to check both the virtual and static
cases.
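A simplified sketch of that control flow, with made-up stand-ins for cg, cg_intrinsic, and call_does_dispatch. This is only a model of the idea, not the actual HotSpot code:

```java
// Model (not HotSpot code): the top-level call from do_call() has
// call_does_dispatch == true, so a virtual-dispatch intrinsic is cached in
// cg_intrinsic and normal inlining is tried first; the recursive per-branch
// calls have call_does_dispatch == false and return the intrinsic directly.
public class CallGeneratorModel {
    static String cgIntrinsic = null; // models the cached cg_intrinsic

    static String callGenerator(boolean callDoesDispatch, String cg) {
        boolean doesVirtualDispatch = cg.endsWith("_intrinsic");
        if (!callDoesDispatch && doesVirtualDispatch) {
            return cg;               // per-branch call: use intrinsic now
        }
        if (callDoesDispatch && doesVirtualDispatch) {
            cgIntrinsic = cg;        // delay: cache it, try inlining first
            return "inline_cg";
        }
        return cg;
    }

    public static void main(String[] args) {
        System.out.println(callGenerator(true,  "hashCode_intrinsic")); // inline_cg
        System.out.println(callGenerator(false, "hashCode_intrinsic")); // hashCode_intrinsic
    }
}
```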
Thanks,
Vladimir
On 9/25/13 1:55 PM, Aleksey Shipilev wrote:
> On 09/25/2013 02:25 AM, Vladimir Kozlov wrote:
>> I would go with your second solution, but rename is_low_priority to
>> does_virtual_dispatch. is_predicted() does not do virtual dispatch, so
>> you don't need to include it in the factored method (add an assert for
>> protection). As a result, you don't need the new inline_intrinsic().
>> Just cache the result in a local var:
>>
>> if (cg->does_virtual_dispatch()) {
>>   cg_intrinsic = cg;
>>   cg = NULL;
>> }
>
> Thank you, Vladimir. The updated webrev is here:
> http://cr.openjdk.java.net/~shade/8014447/webrev.02/
>
> It passes JPRT (almost all testing is done; some jobs are stuck in the
> queues), and still does the right thing for Object.hashCode():
>
> baseline:
> HashCodeBench.stat_i_i: 3.7 +- 0.1 ns/op
> HashCodeBench.stat_o_i: 3.7 +- 0.1 ns/op
> HashCodeBench.stat_o_o: 3.7 +- 0.1 ns/op
> HashCodeBench.virt_i_i: 1.5 +- 0.1 ns/op
> HashCodeBench.virt_o_i: 8.6 +- 0.1 ns/op // <--- !!!
> HashCodeBench.virt_o_o: 4.2 +- 0.1 ns/op
>
> patched:
> HashCodeBench.stat_i_i: 3.6 +- 0.1 ns/op
> HashCodeBench.stat_o_i: 3.6 +- 0.1 ns/op
> HashCodeBench.stat_o_o: 3.6 +- 0.1 ns/op
> HashCodeBench.virt_i_i: 1.5 +- 0.1 ns/op
> HashCodeBench.virt_o_i: 2.0 +- 0.1 ns/op // improvement
> HashCodeBench.virt_o_o: 3.8 +- 0.1 ns/op
>
> I set does_virtual_dispatch for the clone() intrinsic as well, but
> the effect is nil, since it is hard to call Object.clone() with Object
> as the formal receiver while the actual receiver is a subclass. It does
> not degrade the clone() performance, though, so I'm inclined to keep it
> for symmetry, in case anybody finds a trick for invoking it.
>
> baseline:
> CloneBench.cln: 18.0 +- 0.3 ns/op
> CloneBench.cln_cln: 23.3 +- 0.5 ns/op
> CloneBench.obj_cln: 23.0 +- 0.6 ns/op
> CloneBench.obj_obj: 23.6 +- 0.7 ns/op
>
> patched:
> CloneBench.cln: 18.0 +- 0.3 ns/op
> CloneBench.cln_cln: 23.7 +- 0.7 ns/op
> CloneBench.obj_cln: 23.5 +- 0.7 ns/op
> CloneBench.obj_obj: 23.6 +- 0.7 ns/op
>
> Thanks,
> -Aleksey.
>
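The clone() observation above hinges on clone() being protected in java.lang.Object: outside java.lang, a call through a receiver of static type Object does not even compile, so the formal receiver is normally the cloneable subclass. A standalone reflection check, not part of the webrev:

```java
import java.lang.reflect.Modifier;

// Standalone check (not from the webrev): Object.clone() is protected, so
// `Object o = ...; o.clone();` does not compile outside java.lang, which is
// why a call site with Object as the formal receiver is hard to produce.
public class CloneIsProtected {
    public static void main(String[] args) throws Exception {
        int mods = Object.class.getDeclaredMethod("clone").getModifiers();
        System.out.println(Modifier.isProtected(mods)); // prints true
    }
}
```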
More information about the hotspot-compiler-dev mailing list