Preliminary RFR (S): CR 8014447: Object.hashCode intrinsic breaks inline caches

Aleksey Shipilev aleksey.shipilev at oracle.com
Tue Sep 24 07:27:18 PDT 2013


Anyone not at JavaOne this week? :)

-Aleksey.

On 09/18/2013 09:14 PM, Aleksey Shipilev wrote:
> Hi,
> 
> This is a preliminary review for an issue in HotSpot intrinsic handling:
>  https://bugs.openjdk.java.net/browse/JDK-8014447
> 
> In short, if the compiler encounters code like this:
> 
>   class C {
>     Object o = new Integer(0); // any value; Integer has no no-arg ctor
>     int m() {
>       return o.hashCode();
>     }
>   }
> 
> ...then the Object.hashCode() intrinsic takes precedence. The current
> intrinsic emits a direct Java call to .hashCode() on the slow path,
> after a runtime check finds that the receiver is not exactly Object. In
> this example, that breaks the inline caches for Integer.
> 
> The benchmarks clearly showcase the difference between these cases:
>   o_o: Object o = new Object();
>   o_i: Object o = new Integer();
>   i_i: Integer o = new Integer();
> 
> The stat_* tests call System.identityHashCode(o).
> The virt_* tests call o.hashCode().
> 
> Running on Linux x86_64/fastdebug:
>   stat_i_i: 3.75 +- 0.09 ns/op
>   stat_o_i: 3.70 +- 0.05 ns/op
>   stat_o_o: 3.65 +- 0.04 ns/op
>   virt_i_i: 1.58 +- 0.06 ns/op
>   virt_o_i: 8.63 +- 0.04 ns/op // <--- !!!
>   virt_o_o: 4.25 +- 0.03 ns/op
> 
> 
> Unfortunately, the intrinsic already emits the call Node, and by then it
> seems too late to build an inline cache for it. So I have two solutions,
> both arguably ugly:
> 
>  a) Special-case the hashCode intrinsic: use it only if the type profile
> says the receiver is exactly j.l.Object; otherwise let the usual
> inlining code produce the inline cache. The sample webrev:
>    http://cr.openjdk.java.net/~shade/8014447/webrev.00/
> 
> Running on Linux x86_64/fastdebug:
>   stat_i_i: 3.75 +- 0.07 ns/op
>   stat_o_i: 3.72 +- 0.07 ns/op
>   stat_o_o: 3.72 +- 0.09 ns/op
>   virt_i_i: 1.53 +- 0.02 ns/op
>   virt_o_i: 1.88 +- 0.02 ns/op (3.5x improvement)
>   virt_o_o: 4.24 +- 0.03 ns/op
> 
> 
>  b) Mark the hashCode intrinsic as low-priority, so that the inline
> caches based on the type profile are tried first. If no one claims the
> method, we retry the intrinsic. The sample webrev:
>   http://cr.openjdk.java.net/~shade/8014447/webrev.01/
> 
> Running on Linux x86_64/fastdebug:
>   stat_i_i: 3.88 +- 0.04 ns/op
>   stat_o_i: 3.89 +- 0.04 ns/op
>   stat_o_o: 3.86 +- 0.04 ns/op
>   virt_i_i: 1.56 +- 0.05 ns/op
>   virt_o_i: 1.87 +- 0.02 ns/op (3.5x improvement)
>   virt_o_o: 3.90 +- 0.04 ns/op
> 
> Questions to those familiar with the codebase:
>  1. Which solution is better?
>  2. Is there a cleaner solution I'm overlooking?
>  3. "low_priority" -- is there a better name? ("late" and "deferred" are
> already taken, and neither fits exactly.)
> 
> Thanks,
> -Aleksey.
> 
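
For readers without the benchmark source at hand, the three receiver
shapes and two call styles named in the quoted message can be sketched
in plain Java as below. This is only an illustration under assumptions:
the class name HashCodeShapes, the value 42, and the guardedHashCode
helper are hypothetical and are not taken from the webrevs or from the
actual benchmark. guardedHashCode is a hand-written analogue of a
monomorphic type check in front of the call, not what the compiler
literally emits.

```java
// Hypothetical sketch; names mirror the labels in the quoted message,
// but none of this is taken from the actual benchmark or the webrevs.
class HashCodeShapes {
    Object  o_o = new Object();          // static type Object,  runtime type Object
    Object  o_i = Integer.valueOf(42);   // static type Object,  runtime type Integer
    Integer i_i = Integer.valueOf(42);   // static type Integer, runtime type Integer

    // stat_*: identity hash, never dispatches to an override
    int stat_o_o() { return System.identityHashCode(o_o); }
    int stat_o_i() { return System.identityHashCode(o_i); }
    int stat_i_i() { return System.identityHashCode(i_i); }

    // virt_*: virtual call; virt_o_i is the case flagged in the message
    int virt_o_o() { return o_o.hashCode(); }
    int virt_o_i() { return o_i.hashCode(); }
    int virt_i_i() { return i_i.hashCode(); }

    // Hand-written analogue of a monomorphic guard: test the receiver
    // class seen in the profile first, fall back to the virtual call.
    static int guardedHashCode(Object o) {
        if (o.getClass() == Integer.class) {
            // Integer.hashCode() is specified to return the int value
            return ((Integer) o).intValue();
        }
        return o.hashCode();
    }

    public static void main(String[] args) {
        HashCodeShapes s = new HashCodeShapes();
        System.out.println("virt_o_i = " + s.virt_o_i());
        System.out.println("guarded  = " + guardedHashCode(s.o_i));
        System.out.println("virt_o_o == stat_o_o: "
                + (s.virt_o_o() == s.stat_o_o()));
    }
}
```

This sketch only shows which call each label refers to. It is not a
measurement harness, so it says nothing about the timings quoted above.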


