Preliminary RFR (S): CR 8014447: Object.hashCode intrinsic breaks inline caches
Aleksey Shipilev
aleksey.shipilev at oracle.com
Wed Sep 18 10:14:42 PDT 2013
Hi,
This is the preliminary review for the issue in HS intrinsic handling:
https://bugs.openjdk.java.net/browse/JDK-8014447
In short, if the compiler encounters an expression like this:
class C {
    Object o = new Integer(42);
    int m() {
        return o.hashCode();
    }
}
...then the Object.hashCode() intrinsic takes precedence. The current
intrinsic emits a direct Java call to .hashCode() on the slow path,
after a runtime check that the receiver is not exactly Object. In this
example, that breaks the inline caches for Integer.
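To make this concrete, here is a rough Java rendering of what the
intrinsic effectively expands o.hashCode() into (a sketch only: the
real expansion happens in C2 IR, and the fast path reads the identity
hash from the object header rather than calling identityHashCode):

class HashCodeIntrinsicSketch {
    static int hash(Object o) {
        if (o.getClass() == Object.class) {
            // fast path: the receiver is exactly java.lang.Object,
            // so the identity hash is the answer
            return System.identityHashCode(o);
        }
        // slow path: the intrinsic emits a direct virtual call here,
        // bypassing the inline cache a normal invokevirtual would get;
        // this is what hurts the "Object o = new Integer(42)" case
        return o.hashCode();
    }
}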
The benchmarks clearly show the difference between these cases (a rough
sketch of the harness follows the list):
o_o: Object o = new Object();
o_i: Object o = new Integer(42);
i_i: Integer o = new Integer(42);
stat_* calls System.identityHashCode(o)
virt_* calls o.hashCode()
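For reference, a minimal JMH-style sketch of these six cases (this is
an assumption about the harness shape, not the actual benchmark source;
the class name and the constant 42 are made up):

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class HashCodeBench {

    // static type / runtime type combinations from the list above
    Object  o_o = new Object();
    Object  o_i = new Integer(42);
    Integer i_i = new Integer(42);

    @Benchmark public int stat_o_o() { return System.identityHashCode(o_o); }
    @Benchmark public int stat_o_i() { return System.identityHashCode(o_i); }
    @Benchmark public int stat_i_i() { return System.identityHashCode(i_i); }

    @Benchmark public int virt_o_o() { return o_o.hashCode(); }
    @Benchmark public int virt_o_i() { return o_i.hashCode(); }
    @Benchmark public int virt_i_i() { return i_i.hashCode(); }
}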
Running on Linux x86_64/fastdebug:
stat_i_i: 3.75 +- 0.09 ns/op
stat_o_i: 3.70 +- 0.05 ns/op
stat_o_o: 3.65 +- 0.04 ns/op
virt_i_i: 1.58 +- 0.06 ns/op
virt_o_i: 8.63 +- 0.04 ns/op // <--- !!!
virt_o_o: 4.25 +- 0.03 ns/op
Unfortunately, the intrinsic already emits the call Node, and at that
point it seems too late to build the inline cache for it. So, I have
two solutions, both arguably ugly:
a) Special-case the hashCode intrinsic: check whether the type profile
thinks the receiver is exactly j.l.Object, and otherwise let the usual
inlining code produce the inline cache. The sample webrev:
http://cr.openjdk.java.net/~shade/8014447/webrev.00/
Running on Linux x86_64/fastdebug:
stat_i_i: 3.75 +- 0.07 ns/op
stat_o_i: 3.72 +- 0.07 ns/op
stat_o_o: 3.72 +- 0.09 ns/op
virt_i_i: 1.53 +- 0.02 ns/op
virt_o_i: 1.88 +- 0.02 ns/op (3.5x improvement)
virt_o_o: 4.24 +- 0.03 ns/op
b) Mark the hashCode intrinsic as low-priority, asking the usual
inlining to produce the inline caches based on the type profile first.
If no one claims the method, we retry the intrinsic. The sample webrev:
http://cr.openjdk.java.net/~shade/8014447/webrev.01/
Running on Linux x86_64/fastdebug:
stat_i_i: 3.88 +- 0.04 ns/op
stat_o_i: 3.89 +- 0.04 ns/op
stat_o_o: 3.86 +- 0.04 ns/op
virt_i_i: 1.56 +- 0.05 ns/op
virt_o_i: 1.87 +- 0.02 ns/op (3.5x improvement)
virt_o_o: 3.90 +- 0.04 ns/op
Questions to those familiar with the codebase:
1. Which solution is better?
2. Is there a cleaner solution I'm overlooking?
3. "low_priority" -- is there a better name ("late" and "deferred" are
already taken, and they are not exactly fitting)?
Thanks,
-Aleksey.