Preliminary RFR (S): CR 8014447: Object.hashCode intrinsic breaks inline caches
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Sep 24 15:25:36 PDT 2013
Hi Aleksey,
Inline caches have nothing to do with this. You are comparing the performance
of an inlined call (from type profiling) vs. a non-inlined call.
We have only 2 intrinsics with virtual dispatch: hashCode() and clone().
clone() may or may not have the problem you described, depending on the size
of the cloned object (so it is out of scope for these changes).
You are correct that for these methods we should try to use type
profiling information first, which may allow inlining.
I would go with your second solution, but rename is_low_priority to
does_virtual_dispatch. is_predicted() does not do virtual dispatch, so
you don't need to include it in the factored method (add an assert for
protection). As a result, you don't need the new inline_intrinsic(). Just
cache the result in a local var:
  if (cg->does_virtual_dispatch()) {
    cg_intrinsic = cg;
    cg = NULL;
  }
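
As a rough illustration of the ordering suggested above, here is a toy model in Java. It is not HotSpot code: the class IntrinsicOrderSketch, the select() method, and the "profiled" argument are hypothetical names invented for this sketch, and only doesVirtualDispatch mirrors the proposed does_virtual_dispatch(). The assumption is that an intrinsic which dispatches virtually is set aside, the type-profile path gets the first try, and the saved intrinsic is used only if nothing else claims the call.

```java
public class IntrinsicOrderSketch {
    /** Hypothetical stand-in for a compiler call generator; not the HotSpot class. */
    public static final class CallGenerator {
        public final String name;
        public final boolean doesVirtualDispatch;

        public CallGenerator(String name, boolean doesVirtualDispatch) {
            this.name = name;
            this.doesVirtualDispatch = doesVirtualDispatch;
        }
    }

    /**
     * Picks a generator for a call site.
     *
     * @param intrinsic intrinsic found for the callee, or null if there is none
     * @param profiled  generator built from the type profile, or null if the
     *                  profile does not allow inlining (assumption of this sketch)
     * @return the chosen generator, or null if the caller must emit a plain call
     */
    public static CallGenerator select(CallGenerator intrinsic, CallGenerator profiled) {
        CallGenerator cg = intrinsic;
        CallGenerator cgIntrinsic = null;
        if (cg != null && cg.doesVirtualDispatch) {
            cgIntrinsic = cg; // remember the intrinsic for later
            cg = null;        // let the profile-based path try first
        }
        if (cg != null) {
            return cg;        // intrinsics without virtual dispatch win immediately
        }
        if (profiled != null) {
            return profiled;  // type profile allowed inlining
        }
        return cgIntrinsic;   // fall back to the deferred intrinsic (may be null)
    }

    public static void main(String[] args) {
        CallGenerator hash = new CallGenerator("hashCode intrinsic", true);
        CallGenerator prof = new CallGenerator("inline via type profile", false);
        System.out.println(select(hash, prof).name); // inline via type profile
        System.out.println(select(hash, null).name); // hashCode intrinsic
    }
}
```

The point of keeping the intrinsic in a local variable is that the fallback needs no second lookup: if the profile is unusable, the original intrinsic is still at hand.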
thanks,
Vladimir
On 9/24/13 7:27 AM, Aleksey Shipilev wrote:
> Anyone not at JavaOne this week? :)
>
> -Aleksey.
>
> On 09/18/2013 09:14 PM, Aleksey Shipilev wrote:
>> Hi,
>>
>> This is the preliminary review for the issue in HS intrinsic handling:
>> https://bugs.openjdk.java.net/browse/JDK-8014447
>>
>> In short, if the compiler encounters an expression like this:
>>
>> class C {
>>   Object o = new Integer();
>>   int m() {
>>     return o.hashCode();
>>   }
>> }
>>
>> ...then the Object.hashCode() intrinsic takes precedence. The current
>> intrinsic emits a direct Java call to .hashCode() on the slow path,
>> taken when a runtime check finds the receiver is not exactly Object.
>> In this example, it breaks the inline caches for Integer.
>>
>> The benchmarks clearly showcase the difference between these cases:
>> o_o: Object o = new Object();
>> o_i: Object o = new Integer();
>> i_i: Integer o = new Integer();
>>
>> stat_* call System.identityHashCode(o)
>> virt_* call o.hashCode()
>>
>> Running on Linux x86_64/fastdebug:
>> stat_i_i: 3.75 +- 0.09 ns/op
>> stat_o_i: 3.70 +- 0.05 ns/op
>> stat_o_o: 3.65 +- 0.04 ns/op
>> virt_i_i: 1.58 +- 0.06 ns/op
>> virt_o_i: 8.63 +- 0.04 ns/op // <--- !!!
>> virt_o_o: 4.25 +- 0.03 ns/op
>>
>>
>> Unfortunately, intrinsics already emit the call Node, and it seems too
>> late to make the inline cache for it. So, I have two solutions, both
>> arguably ugly:
>>
>> a) Special-case the hashCode intrinsic, and see if the type profile thinks
>> the receiver is exactly j.l.Object; otherwise let the usual inlining code
>> produce the inline cache. The sample webrev:
>> http://cr.openjdk.java.net/~shade/8014447/webrev.00/
>>
>> Running on Linux x86_64/fastdebug:
>> stat_i_i: 3.75 +- 0.07 ns/op
>> stat_o_i: 3.72 +- 0.07 ns/op
>> stat_o_o: 3.72 +- 0.09 ns/op
>> virt_i_i: 1.53 +- 0.02 ns/op
>> virt_o_i: 1.88 +- 0.02 ns/op (3.5x improvement)
>> virt_o_o: 4.24 +- 0.03 ns/op
>>
>>
>> b) Mark the hashCode intrinsic as low-priority, asking the compiler to
>> produce the inline caches based on the type profile first. If nothing
>> claims the method, we retry the intrinsic. The sample webrev:
>> http://cr.openjdk.java.net/~shade/8014447/webrev.01/
>>
>> Running on Linux x86_64/fastdebug:
>> stat_i_i: 3.88 +- 0.04 ns/op
>> stat_o_i: 3.89 +- 0.04 ns/op
>> stat_o_o: 3.86 +- 0.04 ns/op
>> virt_i_i: 1.56 +- 0.05 ns/op
>> virt_o_i: 1.87 +- 0.02 ns/op (3.5x improvement)
>> virt_o_o: 3.90 +- 0.04 ns/op
>>
>> Questions to those familiar with the codebase:
>> 1. Which solution is better?
>> 2. Is there a cleaner solution I'm overlooking?
>> 3. "low_priority" -- is there a better name ("late" and "deferred" are
>> already taken, and they are not exactly fitting)?
>>
>> Thanks,
>> -Aleksey.
>>
>
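
For readers who want to see the three receiver shapes from the quoted benchmark side by side, here is a minimal sketch. It is not the original benchmark and measures nothing; the class and method names are made up for illustration, and Integer.valueOf(42) is an assumed stand-in for the "new Integer()" shorthand in the quoted message.

```java
public class HashCodeShapes {
    // stat_*: a static call; always the identity hash, no dispatch on the receiver.
    static int stat(Object o) {
        return System.identityHashCode(o);
    }

    // virt_*: a virtual call through a reference of static type Object;
    // the target depends on the runtime class of the receiver.
    static int virt(Object o) {
        return o.hashCode();
    }

    public static void main(String[] args) {
        Object oO = new Object();         // o_o: static type Object, runtime type Object
        Object oI = Integer.valueOf(42);  // o_i: static type Object, runtime type Integer
        Integer iI = Integer.valueOf(42); // i_i: static type Integer, runtime type Integer

        // For a plain Object, hashCode() is the identity hash.
        System.out.println(virt(oO) == stat(oO)); // true

        // Through an Object reference, dispatch reaches Integer.hashCode().
        System.out.println(virt(oI));             // 42

        // Integer is final, so with static type Integer the target is known statically.
        System.out.println(iI.hashCode());        // 42
    }
}
```

The o_i shape is the interesting one: the static type says Object, so the intrinsic applies, while the runtime type is Integer, so the intrinsic's fast path never matches.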