Preliminary RFR (S): CR 8014447: Object.hashCode intrinsic breaks inline caches
Aleksey Shipilev
aleksey.shipilev at oracle.com
Wed Sep 18 10:14:42 PDT 2013
Hi,
This is the preliminary review for the issue in HS intrinsic handling:
https://bugs.openjdk.java.net/browse/JDK-8014447
In short, if the compiler encounters an expression like this:
class C {
    Object o = new Integer(42);
    int m() {
        return o.hashCode();
    }
}
...then the Object.hashCode() intrinsic takes precedence. The current
intrinsic emits a direct Java call to .hashCode() on the slow path,
after a runtime check that the receiver is not exactly Object. In this
example, that breaks the inline caches for Integer.
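To make this concrete, here is a rough Java rendering of what the
intrinsic effectively expands o.hashCode() into (a sketch only: the
real expansion happens in C2 IR, and the fast path reads the identity
hash from the object header rather than calling identityHashCode):

class HashCodeIntrinsicSketch {
    static int hash(Object o) {
        if (o.getClass() == Object.class) {
            // fast path: the receiver is exactly java.lang.Object,
            // so the identity hash is the answer
            return System.identityHashCode(o);
        }
        // slow path: the intrinsic emits a direct virtual call here,
        // bypassing the inline cache a normal invokevirtual would get;
        // this is what hurts the "Object o = new Integer(42)" case
        return o.hashCode();
    }
}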
The benchmarks clearly show the difference between these cases (a rough
sketch of the harness follows the list):
o_o: Object o = new Object();
o_i: Object o = new Integer(42);
i_i: Integer o = new Integer(42);
stat_* calls System.identityHashCode(o)
virt_* calls o.hashCode()
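For reference, a minimal JMH-style sketch of these six cases (this is
an assumption about the harness shape, not the actual benchmark source;
the class name and the constant 42 are made up):

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class HashCodeBench {

    // static type / runtime type combinations from the list above
    Object  o_o = new Object();
    Object  o_i = new Integer(42);
    Integer i_i = new Integer(42);

    @Benchmark public int stat_o_o() { return System.identityHashCode(o_o); }
    @Benchmark public int stat_o_i() { return System.identityHashCode(o_i); }
    @Benchmark public int stat_i_i() { return System.identityHashCode(i_i); }

    @Benchmark public int virt_o_o() { return o_o.hashCode(); }
    @Benchmark public int virt_o_i() { return o_i.hashCode(); }
    @Benchmark public int virt_i_i() { return i_i.hashCode(); }
}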
Running on Linux x86_64/fastdebug:
stat_i_i: 3.75 +- 0.09 ns/op
stat_o_i: 3.70 +- 0.05 ns/op
stat_o_o: 3.65 +- 0.04 ns/op
virt_i_i: 1.58 +- 0.06 ns/op
virt_o_i: 8.63 +- 0.04 ns/op // <--- !!!
virt_o_o: 4.25 +- 0.03 ns/op
Unfortunately, the intrinsic already emits the call Node, and at that
point it seems too late to build the inline cache for it. So, I have
two solutions, both arguably ugly:
a) Special-case the hashCode intrinsic: check whether the type profile
thinks the receiver is exactly j.l.Object, and otherwise let the usual
inlining code produce the inline cache. The sample webrev:
http://cr.openjdk.java.net/~shade/8014447/webrev.00/
Running on Linux x86_64/fastdebug:
stat_i_i: 3.75 +- 0.07 ns/op
stat_o_i: 3.72 +- 0.07 ns/op
stat_o_o: 3.72 +- 0.09 ns/op
virt_i_i: 1.53 +- 0.02 ns/op
virt_o_i: 1.88 +- 0.02 ns/op (3.5x improvement)
virt_o_o: 4.24 +- 0.03 ns/op
b) Mark the hashCode intrinsic as low-priority, asking the usual
inlining to produce the inline caches based on the type profile first.
If no one claims the method, we retry the intrinsic. The sample webrev:
http://cr.openjdk.java.net/~shade/8014447/webrev.01/
Running on Linux x86_64/fastdebug:
stat_i_i: 3.88 +- 0.04 ns/op
stat_o_i: 3.89 +- 0.04 ns/op
stat_o_o: 3.86 +- 0.04 ns/op
virt_i_i: 1.56 +- 0.05 ns/op
virt_o_i: 1.87 +- 0.02 ns/op (3.5x improvement)
virt_o_o: 3.90 +- 0.04 ns/op
Questions to those familiar with the codebase:
1. Which solution is better?
2. Is there a cleaner solution I'm overlooking?
3. "low_priority" -- is there a better name ("late" and "deferred" are
already taken, and they are not exactly fitting)?
Thanks,
-Aleksey.