Preliminary RFR (S): CR 8014447: Object.hashCode intrinsic breaks inline caches
Aleksey Shipilev
aleksey.shipilev at oracle.com
Tue Sep 24 07:27:18 PDT 2013
Anyone not at JavaOne this week? :)
-Aleksey.
On 09/18/2013 09:14 PM, Aleksey Shipilev wrote:
> Hi,
>
> This is the preliminary review for the issue in HS intrinsic handling:
> https://bugs.openjdk.java.net/browse/JDK-8014447
>
> In short, if the compiler encounters code like this:
>
>   class C {
>     Object o = new Integer(0);  // declared as Object, actually an Integer
>     int m() {
>       return o.hashCode();
>     }
>   }
>
> ...then the Object.hashCode() intrinsic takes precedence. The current
> intrinsic emits a direct Java call to .hashCode() on the slow path,
> after a runtime check shows the receiver is not exactly Object. In this
> example, that breaks the inline cache for Integer.
>
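For readers following along, a rough Java-level model of the two code shapes may help. This is only a sketch: the real code is compiler IR, not Java, and the class and method names below are made up for illustration. intrinsicShape() mirrors the description above (exact-Object check, then a plain call on the slow path), while inlineCacheShape() shows what a profile-driven guard for Integer would look like.

```java
public class HashCodeShapes {
    // Model of the current intrinsic as described in the mail:
    // fast path for exact Object, plain call otherwise.
    static int intrinsicShape(Object o) {
        if (o.getClass() == Object.class) {
            return System.identityHashCode(o); // fast path: identity hash
        }
        return o.hashCode();                   // slow path: ordinary call, no type guard
    }

    // Model of an inline cache for a call site whose profile saw only Integer.
    static int inlineCacheShape(Object o) {
        if (o.getClass() == Integer.class) {
            return Integer.hashCode((Integer) o); // guarded, inlinable target
        }
        return o.hashCode();                      // fallback for any other receiver
    }

    public static void main(String[] args) {
        System.out.println(intrinsicShape(Integer.valueOf(7)));   // prints 7
        System.out.println(inlineCacheShape(Integer.valueOf(7))); // prints 7
    }
}
```

Both methods return the same values; the difference the mail is about is only in how cheaply the non-Object receiver is dispatched.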
> The benchmarks clearly showcase the difference between these cases:
> o_o: Object o = new Object();
> o_i: Object o = new Integer();
> i_i: Integer o = new Integer();
>
> stat_* call System.identityHashCode(o)
> virt_* call o.hashCode()
>
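The six cases can be written out as plain Java along these lines. This is a sketch of the shapes only, not the actual benchmark source: the class name and the value 42 are assumptions, and no measurement harness is included.

```java
public class HashCodeCases {
    // Field names follow the mail's <declared type>_<runtime type> convention.
    Object  o_o = new Object();
    Object  o_i = Integer.valueOf(42);  // 42 is an arbitrary illustrative value
    Integer i_i = Integer.valueOf(42);

    // stat_*: static call to System.identityHashCode
    int stat_o_o() { return System.identityHashCode(o_o); }
    int stat_o_i() { return System.identityHashCode(o_i); }
    int stat_i_i() { return System.identityHashCode(i_i); }

    // virt_*: virtual call to hashCode() through the declared type
    int virt_o_o() { return o_o.hashCode(); }
    int virt_o_i() { return o_i.hashCode(); }
    int virt_i_i() { return i_i.hashCode(); }

    public static void main(String[] args) {
        HashCodeCases c = new HashCodeCases();
        System.out.println(c.virt_o_i()); // prints 42
        System.out.println(c.virt_i_i()); // prints 42
    }
}
```

virt_o_i is the interesting case: the declared type is Object, so the call site matches the Object.hashCode intrinsic, but the receiver is always an Integer.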
> Running on Linux x86_64/fastdebug:
> stat_i_i: 3.75 +- 0.09 ns/op
> stat_o_i: 3.70 +- 0.05 ns/op
> stat_o_o: 3.65 +- 0.04 ns/op
> virt_i_i: 1.58 +- 0.06 ns/op
> virt_o_i: 8.63 +- 0.04 ns/op // <--- !!!
> virt_o_o: 4.25 +- 0.03 ns/op
>
>
> Unfortunately, by that point the intrinsic has already emitted the call
> Node, and it seems too late to build an inline cache for it. So, I have
> two solutions, both arguably ugly:
>
> a) Special-case the hashCode intrinsic: use it only if the type profile
> says the receiver is exactly j.l.Object, and otherwise let the usual
> inlining code produce the inline cache. The sample webrev:
> http://cr.openjdk.java.net/~shade/8014447/webrev.00/
>
> Running on Linux x86_64/fastdebug:
> stat_i_i: 3.75 +- 0.07 ns/op
> stat_o_i: 3.72 +- 0.07 ns/op
> stat_o_o: 3.72 +- 0.09 ns/op
> virt_i_i: 1.53 +- 0.02 ns/op
> virt_o_i: 1.88 +- 0.02 ns/op (3.5x improvement)
> virt_o_o: 4.24 +- 0.03 ns/op
>
>
> b) Mark the hashCode intrinsic as low-priority, so that the compiler
> first tries to produce inline caches based on the type profile. If
> nothing claims the method, we retry the intrinsic. The sample webrev:
> http://cr.openjdk.java.net/~shade/8014447/webrev.01/
>
> Running on Linux x86_64/fastdebug:
> stat_i_i: 3.88 +- 0.04 ns/op
> stat_o_i: 3.89 +- 0.04 ns/op
> stat_o_o: 3.86 +- 0.04 ns/op
> virt_i_i: 1.56 +- 0.05 ns/op
> virt_o_i: 1.87 +- 0.02 ns/op (3.5x improvement)
> virt_o_o: 3.90 +- 0.04 ns/op
>
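The difference between (a) and (b) is mostly the order of the two decisions. A toy Java model, purely illustrative: none of these names exist in HotSpot, and whether profile-based inlining "claims" a call site is passed in as an input rather than modelled.

```java
public class IntrinsicPrioritySketch {
    enum Plan { INTRINSIC, USUAL_INLINING }

    // Option (a): the intrinsic itself consults the profile and
    // steps aside unless the receiver is exactly java.lang.Object.
    static Plan optionA(Class<?> profiledReceiver) {
        return profiledReceiver == Object.class ? Plan.INTRINSIC
                                                : Plan.USUAL_INLINING;
    }

    // Option (b): profile-based inlining goes first; the low-priority
    // intrinsic is retried only if nothing claimed the method.
    static Plan optionB(boolean claimedByProfile) {
        return claimedByProfile ? Plan.USUAL_INLINING : Plan.INTRINSIC;
    }

    public static void main(String[] args) {
        System.out.println(optionA(Object.class));  // prints INTRINSIC
        System.out.println(optionA(Integer.class)); // prints USUAL_INLINING
        System.out.println(optionB(true));          // prints USUAL_INLINING
        System.out.println(optionB(false));         // prints INTRINSIC
    }
}
```

In (a) the hashCode-specific knowledge lives in the intrinsic; in (b) it is reduced to a priority flag, and the generic inlining path does the rest.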
> Questions to those familiar with the codebase:
> 1. Which solution is better?
> 2. Is there a cleaner solution I'm overlooking?
> 3. "low_priority" -- is there a better name? ("late" and "deferred" are
> already taken, and neither fits exactly.)
>
> Thanks,
> -Aleksey.
>
More information about the hotspot-compiler-dev mailing list